Use this resource to configure VPC & subnets for new workspaces within AWS. It is essential to understand that this will require you to configure your provider separately for the multiple workspaces (mws) resources.

Please follow this complete runnable example with a new VPC and new workspace setup. Pay special attention to the fact that there are two different instances of the databricks provider - one for deploying workspaces (with host="https://accounts.cloud.databricks.com/") and another for the workspace you've created with the databricks_mws_workspaces resource. If you want to create both workspaces & clusters within the same Terraform module (essentially the same directory), you should use the provider aliasing feature of Terraform. We strongly recommend having one Terraform module that creates the workspace + PAT token and keeping everything else in separate modules.
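As a minimal sketch of that aliasing setup (the variable names, credentials and the databricks_mws_workspaces resource name are assumptions, not part of this page), the two provider instances could look like this:

```hcl
# Account-level provider used by the databricks_mws_* resources
# (authentication variables are assumptions - use whatever account-level
# credentials you have configured).
provider "databricks" {
  alias      = "mws"
  host       = "https://accounts.cloud.databricks.com"
  account_id = var.databricks_account_id
  username   = var.databricks_account_username
  password   = var.databricks_account_password
}

# Workspace-level provider pointing at the workspace created by
# databricks_mws_workspaces (assumed to be named "this").
provider "databricks" {
  alias = "created_workspace"
  host  = databricks_mws_workspaces.this.workspace_url
}
```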
Use this resource to configure VPC & subnet for new workspaces within GCP. It is essential to understand that this will require you to configure your provider separately for the multiple workspaces (mws) resources.

Please follow this complete runnable example with a new VPC and new workspace setup. Pay special attention to the fact that there are two different instances of the databricks provider - one for deploying workspaces (with host="https://accounts.gcp.databricks.com/") and another for the workspace you've created with the databricks_mws_workspaces resource. If you want to create both workspaces & clusters within the same Terraform module (essentially the same directory), you should use the provider aliasing feature of Terraform. We strongly recommend having one Terraform module that creates the workspace + PAT token and keeping everything else in separate modules.
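The GCP equivalent follows the same pattern, using the GCP accounts host and Google credentials (the service account variable name is an assumption):

```hcl
# Account-level provider for GCP; impersonates the given service account
# (variable name is an assumption).
provider "databricks" {
  alias                  = "mws"
  host                   = "https://accounts.gcp.databricks.com"
  account_id             = var.databricks_account_id
  google_service_account = var.databricks_google_service_account
}
```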
variable "databricks_account_id" {
description = "Account Id that could be found in the top right corner of https://accounts.cloud.databricks.com/"
}
data "aws_availability_zones" "available" {}
module "vpc" {
source = "terraform-aws-modules/vpc/aws"
version = "2.70.0"
name = local.prefix
cidr = var.cidr_block
secondary_cidr_blocks = [var.cidr_block_public]
azs = data.aws_availability_zones.available.names
tags = var.tags
enable_dns_hostnames = true
enable_nat_gateway = true
create_igw = true
public_subnets = [cidrsubnet(var.cidr_block_public, 6, 0)]
private_subnets = [cidrsubnet(var.cidr_block, 3, 1),
cidrsubnet(var.cidr_block, 3, 2)]
default_security_group_egress = [{
cidr_blocks = "0.0.0.0/0"
}]
default_security_group_ingress = [{
description = "Allow all internal TCP and UDP"
self = true
}]
}
resource "databricks_mws_networks" "this" {
provider = databricks.mws
account_id = var.databricks_account_id
network_name = "${local.prefix}-network"
security_group_ids = [module.vpc.default_security_group_id]
subnet_ids = module.vpc.private_subnets
vpc_id = module.vpc.vpc_id
}
In order to create a VPC that leverages AWS PrivateLink, you would need to add the vpc_endpoint_id attributes from databricks_mws_vpc_endpoint resources into the databricks_mws_networks resource. For example:
resource "databricks_mws_networks" "this" {
provider = databricks.mws
account_id = var.databricks_account_id
network_name = "${local.prefix}-network"
security_group_ids = [module.vpc.default_security_group_id]
subnet_ids = module.vpc.private_subnets
vpc_id = module.vpc.vpc_id
vpc_endpoints {
dataplane_relay = [databricks_mws_vpc_endpoint.relay.vpc_endpoint_id]
rest_api = [databricks_mws_vpc_endpoint.workspace.vpc_endpoint_id]
}
depends_on = [aws_vpc_endpoint.workspace, aws_vpc_endpoint.relay]
}
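The relay and workspace endpoint registrations referenced above are not shown on this page. A hedged sketch of what they could look like (the resource names, var.region, and the underlying aws_vpc_endpoint resources are assumptions):

```hcl
# Register the underlying AWS VPC endpoints with the Databricks account so
# they can be referenced from databricks_mws_networks (names assumed).
resource "databricks_mws_vpc_endpoint" "workspace" {
  provider            = databricks.mws
  account_id          = var.databricks_account_id
  aws_vpc_endpoint_id = aws_vpc_endpoint.workspace.id
  vpc_endpoint_name   = "${local.prefix}-workspace-vpc-endpoint"
  region              = var.region
}

resource "databricks_mws_vpc_endpoint" "relay" {
  provider            = databricks.mws
  account_id          = var.databricks_account_id
  aws_vpc_endpoint_id = aws_vpc_endpoint.relay.id
  vpc_endpoint_name   = "${local.prefix}-relay-vpc-endpoint"
  region              = var.region
}
```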
variable "databricks_account_id" {
description = "Account Id that could be found in the top right corner of https://accounts.cloud.databricks.com/"
}
resource "google_compute_network" "dbx_private_vpc" {
project = var.google_project
name = "tf-network-${random_string.suffix.result}"
auto_create_subnetworks = false
}
resource "google_compute_subnetwork" "network-with-private-secondary-ip-ranges" {
name = "test-dbx-${random_string.suffix.result}"
ip_cidr_range = "10.0.0.0/16"
region = "us-central1"
network = google_compute_network.dbx_private_vpc.id
secondary_ip_range {
range_name = "pods"
ip_cidr_range = "10.1.0.0/16"
}
secondary_ip_range {
range_name = "svc"
ip_cidr_range = "10.2.0.0/20"
}
private_ip_google_access = true
}
resource "google_compute_router" "router" {
name = "my-router-${random_string.suffix.result}"
region = google_compute_subnetwork.network-with-private-secondary-ip-ranges.region
network = google_compute_network.dbx_private_vpc.id
}
resource "google_compute_router_nat" "nat" {
name = "my-router-nat-${random_string.suffix.result}"
router = google_compute_router.router.name
region = google_compute_router.router.region
nat_ip_allocate_option = "AUTO_ONLY"
source_subnetwork_ip_ranges_to_nat = "ALL_SUBNETWORKS_ALL_IP_RANGES"
}
resource "databricks_mws_networks" "this" {
account_id = var.databricks_account_id
network_name = "test-demo-${random_string.suffix.result}"
gcp_network_info {
network_project_id = var.google_project
vpc_id = google_compute_network.dbx_private_vpc.name
subnet_id = google_compute_subnetwork.network_with_private_secondary_ip_ranges.name
subnet_region = google_compute_subnetwork.network_with_private_secondary_ip_ranges.region
pod_ip_range_name = "pods"
service_ip_range_name = "svc"
}
}
In order to create a VPC that leverages GCP Private Service Connect, you would need to add the vpc_endpoint_id attributes from databricks_mws_vpc_endpoint resources into the databricks_mws_networks resource. For example:
resource "databricks_mws_networks" "this" {
account_id = var.databricks_account_id
network_name = "test-demo-${random_string.suffix.result}"
gcp_network_info {
network_project_id = var.google_project
vpc_id = google_compute_network.dbx_private_vpc.name
subnet_id = google_compute_subnetwork.network_with_private_secondary_ip_ranges.name
subnet_region = google_compute_subnetwork.network_with_private_secondary_ip_ranges.region
pod_ip_range_name = "pods"
service_ip_range_name = "svc"
}
vpc_endpoints {
dataplane_relay = [databricks_mws_vpc_endpoint.relay.vpc_endpoint_id]
rest_api = [databricks_mws_vpc_endpoint.workspace.vpc_endpoint_id]
}
}
Due to specifics of platform APIs, changing any attribute of the network configuration would cause databricks_mws_networks to be re-created - deleted & added again, with a special case for running workspaces. Once a network configuration is attached to a running databricks_mws_workspaces resource, you cannot delete it, and terraform apply would result in an INVALID_STATE: Unable to delete, Network is being used by active workspace X error. In order to modify any attributes of a network, you have to perform three different terraform apply steps:

1. Create a new databricks_mws_networks resource.
2. Update the databricks_mws_workspaces resource to point to the new network_id.
3. Delete the old databricks_mws_networks resource.
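For reference, a minimal sketch of the workspace side of step 2 (resource names and the credentials/storage references are assumptions) - modifying the network amounts to changing which databricks_mws_networks resource the network_id argument points at:

```hcl
resource "databricks_mws_workspaces" "this" {
  provider       = databricks.mws
  account_id     = var.databricks_account_id
  workspace_name = local.prefix
  aws_region     = var.region

  credentials_id           = databricks_mws_credentials.this.credentials_id
  storage_configuration_id = databricks_mws_storage_configurations.this.storage_configuration_id

  # Step 2: switch this reference from the old databricks_mws_networks
  # resource to the newly created one.
  network_id = databricks_mws_networks.this.network_id
}
```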
The following arguments are available:

- account_id - Account Id that could be found in the top right corner of Accounts Console
- network_name - name under which this network is registered
- vpc_id - (AWS only) aws_vpc id
- subnet_ids - (AWS only) ids of aws_subnet
- security_group_ids - (AWS only) ids of aws_security_group
- vpc_endpoints - (Optional) mapping of databricks_mws_vpc_endpoint for PrivateLink or Private Service Connect connections
- gcp_network_info - (GCP only) a block consisting of Google Cloud specific information for this network, for example the VPC ID, subnet ID, and secondary IP ranges. It has the following fields:
  - network_project_id - The Google Cloud project ID of the VPC network.
  - vpc_id - The ID of the VPC associated with this network. VPC IDs can be used in multiple network configurations.
  - subnet_id - The ID of the subnet associated with this network.
  - subnet_region - The Google Cloud region of the workspace data plane. For example, us-east4.
  - pod_ip_range_name - The name of the secondary IP range for pods. A Databricks-managed GKE cluster uses this IP range for its pods. This secondary IP range can only be used by one workspace.
  - service_ip_range_name - The name of the secondary IP range for services. A Databricks-managed GKE cluster uses this IP range for its services. This secondary IP range can only be used by one workspace.

In addition to all arguments above, the following attributes are exported:
- id - Canonical unique identifier for the mws networks.
- network_id - (String) id of network to be used for the databricks_mws_workspaces resource.
- vpc_status - (String) VPC attachment status
- workspace_id - (Integer) id of associated workspace

The following resources are used in the same context: