provider "databricks" {
  host  = module.ai.databricks_host
  token = module.ai.databricks_token
}
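This resource allows you to set up workspaces in the E2 architecture on AWS, or workspaces on GCP. Follow the complete runnable example for AWS or GCP with a new VPC and new workspace setup.
To get a workspace running on AWS, you first have to register a few objects with your Databricks account: cross-account credentials (databricks_mws_credentials), the root storage bucket (databricks_mws_storage_configurations) and, for a customer-managed VPC, the network (databricks_mws_networks):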
variable "databricks_account_id" {
  description = "Account ID that can be found in the dropdown under the email address in the upper-right corner of https://accounts.cloud.databricks.com/"
}
// inputs expected from the surrounding configuration (IAM role, S3 bucket, VPC)
variable "prefix" {}
variable "region" {}
variable "crossaccount_arn" {}
variable "root_bucket" {}
variable "vpc_id" {}
variable "subnets_private" {
  type = list(string)
}
variable "security_group" {}
provider "databricks" {
  alias = "mws"
  host  = "https://accounts.cloud.databricks.com"
}
// register cross-account ARN
resource "databricks_mws_credentials" "this" {
  provider         = databricks.mws
  account_id       = var.databricks_account_id
  credentials_name = "${var.prefix}-creds"
  role_arn         = var.crossaccount_arn
}
// register root bucket
resource "databricks_mws_storage_configurations" "this" {
  provider                   = databricks.mws
  account_id                 = var.databricks_account_id
  storage_configuration_name = "${var.prefix}-storage"
  bucket_name                = var.root_bucket
}
// register VPC
resource "databricks_mws_networks" "this" {
  provider           = databricks.mws
  account_id         = var.databricks_account_id
  network_name       = "${var.prefix}-network"
  vpc_id             = var.vpc_id
  subnet_ids         = var.subnets_private
  security_group_ids = [var.security_group]
}
// create workspace in given VPC with DBFS on root bucket
resource "databricks_mws_workspaces" "this" {
  provider                 = databricks.mws
  account_id               = var.databricks_account_id
  workspace_name           = var.prefix
  aws_region               = var.region
  credentials_id           = databricks_mws_credentials.this.credentials_id
  storage_configuration_id = databricks_mws_storage_configurations.this.storage_configuration_id
  network_id               = databricks_mws_networks.this.network_id
  token {}
}
output "databricks_token" {
  value     = databricks_mws_workspaces.this.token[0].token_value
  sensitive = true
}
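The exported workspace_url and the managed token can be used to configure a second, workspace-level provider, which is how the provider block at the top of this page consumes module.ai.databricks_host and module.ai.databricks_token. The sketch below assumes it sits in the same configuration as the example above; the "workspace" alias and the databricks_host output name are hypothetical choices.

```hcl
// Hypothetical output name; a calling module would read it as module.<name>.databricks_host
output "databricks_host" {
  value = databricks_mws_workspaces.this.workspace_url
}
// Hypothetical alias: a workspace-level provider authenticated with the PAT
// that the token {} block in databricks_mws_workspaces.this manages
provider "databricks" {
  alias = "workspace"
  host  = databricks_mws_workspaces.this.workspace_url
  token = databricks_mws_workspaces.this.token[0].token_value
}
```

Because this provider depends on values that are only known after the workspace is created, any resource using provider = databricks.workspace is created after the workspace itself.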
By default, Databricks creates a VPC in your AWS account for each workspace and uses it to run the workspace's clusters. Optionally, you can use your own VPC for the workspace through the customer-managed VPC feature. Databricks recommends that you provide your VPC with databricks_mws_networks, so that you can configure it according to your organization's enterprise cloud standards while still conforming to Databricks requirements. You cannot migrate an existing workspace to your own VPC. See this page for how the two setups differ in the IAM policy actions they require. The example below lets Databricks manage the VPC, so it sets no network_id.
variable "databricks_account_id" {
  description = "Account Id that could be found in the top right corner of https://accounts.cloud.databricks.com/"
}
variable "tags" {
  type    = map(string)
  default = {}
}
resource "random_string" "naming" {
  special = false
  upper   = false
  length  = 6
}
locals {
  prefix = "dltp${random_string.naming.result}"
}
data "databricks_aws_assume_role_policy" "this" {
  external_id = var.databricks_account_id
}
resource "aws_iam_role" "cross_account_role" {
  name               = "${local.prefix}-crossaccount"
  assume_role_policy = data.databricks_aws_assume_role_policy.this.json
  tags               = var.tags
}
data "databricks_aws_crossaccount_policy" "this" {
}
resource "aws_iam_role_policy" "this" {
  name   = "${local.prefix}-policy"
  role   = aws_iam_role.cross_account_role.id
  policy = data.databricks_aws_crossaccount_policy.this.json
}
// databricks.mws is the account-level provider alias configured in the previous example
resource "databricks_mws_credentials" "this" {
  provider         = databricks.mws
  account_id       = var.databricks_account_id
  credentials_name = "${local.prefix}-creds"
  role_arn         = aws_iam_role.cross_account_role.arn
}
resource "aws_s3_bucket" "root_storage_bucket" {
  bucket        = "${local.prefix}-rootbucket"
  acl           = "private"
  force_destroy = true
  tags          = var.tags
}
resource "aws_s3_bucket_versioning" "root_versioning" {
  bucket = aws_s3_bucket.root_storage_bucket.id
  versioning_configuration {
    status = "Disabled"
  }
}
resource "aws_s3_bucket_server_side_encryption_configuration" "root_storage_bucket" {
  bucket = aws_s3_bucket.root_storage_bucket.bucket
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}
resource "aws_s3_bucket_public_access_block" "root_storage_bucket" {
  bucket                  = aws_s3_bucket.root_storage_bucket.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
  depends_on              = [aws_s3_bucket.root_storage_bucket]
}
data "databricks_aws_bucket_policy" "this" {
  bucket = aws_s3_bucket.root_storage_bucket.bucket
}
resource "aws_s3_bucket_policy" "root_bucket_policy" {
  bucket     = aws_s3_bucket.root_storage_bucket.id
  policy     = data.databricks_aws_bucket_policy.this.json
  depends_on = [aws_s3_bucket_public_access_block.root_storage_bucket]
}
resource "databricks_mws_storage_configurations" "this" {
  provider                   = databricks.mws
  account_id                 = var.databricks_account_id
  storage_configuration_name = "${local.prefix}-storage"
  bucket_name                = aws_s3_bucket.root_storage_bucket.bucket
}
resource "databricks_mws_workspaces" "this" {
  provider                 = databricks.mws
  account_id               = var.databricks_account_id
  workspace_name           = local.prefix
  aws_region               = "us-east-1"
  credentials_id           = databricks_mws_credentials.this.credentials_id
  storage_configuration_id = databricks_mws_storage_configurations.this.storage_configuration_id
  token {}
  # Optional Custom Tags
  custom_tags = {
    "SoldToCode" = "1234"
  }
}
output "databricks_token" {
  value     = databricks_mws_workspaces.this.token[0].token_value
  sensitive = true
}
To create a Databricks workspace that uses AWS PrivateLink, first read and understand the Enable Private Link documentation, then customize the example above with the relevant examples from databricks_mws_vpc_endpoint, databricks_mws_private_access_settings and databricks_mws_networks.
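PrivateLink requires a customer-managed VPC, so the sketch below builds on the first AWS example (its variables and the databricks.mws alias). It is a rough outline, not a complete setup: var.workspace_vpce_id and var.relay_vpce_id are hypothetical inputs holding the IDs of AWS interface VPC endpoints you create per the Enable Private Link documentation, and public_access_enabled = true is an illustrative choice.

```hcl
// Hypothetical inputs: IDs of AWS interface VPC endpoints for the REST API
// (workspace) and the secure cluster connectivity relay, created elsewhere
variable "workspace_vpce_id" {}
variable "relay_vpce_id" {}
resource "databricks_mws_vpc_endpoint" "workspace" {
  provider            = databricks.mws
  account_id          = var.databricks_account_id
  aws_vpc_endpoint_id = var.workspace_vpce_id
  vpc_endpoint_name   = "${var.prefix}-vpce-workspace"
  region              = var.region
}
resource "databricks_mws_vpc_endpoint" "relay" {
  provider            = databricks.mws
  account_id          = var.databricks_account_id
  aws_vpc_endpoint_id = var.relay_vpce_id
  vpc_endpoint_name   = "${var.prefix}-vpce-relay"
  region              = var.region
}
resource "databricks_mws_private_access_settings" "pas" {
  provider                     = databricks.mws
  account_id                   = var.databricks_account_id
  private_access_settings_name = "${var.prefix}-pas"
  region                       = var.region
  public_access_enabled        = true
}
// Replaces databricks_mws_networks.this from the first example: adds vpc_endpoints
resource "databricks_mws_networks" "this" {
  provider           = databricks.mws
  account_id         = var.databricks_account_id
  network_name       = "${var.prefix}-network"
  vpc_id             = var.vpc_id
  subnet_ids         = var.subnets_private
  security_group_ids = [var.security_group]
  vpc_endpoints {
    dataplane_relay = [databricks_mws_vpc_endpoint.relay.vpc_endpoint_id]
    rest_api        = [databricks_mws_vpc_endpoint.workspace.vpc_endpoint_id]
  }
}
// In databricks_mws_workspaces.this, additionally set:
//   private_access_settings_id = databricks_mws_private_access_settings.pas.private_access_settings_id
```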
To get a workspace running on GCP, you have to configure a network object:
variable "databricks_account_id" {
  description = "Account Id that could be found in the top right corner of https://accounts.gcp.databricks.com/"
}
variable "databricks_google_service_account" {}
variable "google_project" {}
// inputs expected from the surrounding configuration (VPC and subnet)
variable "prefix" {}
variable "vpc_id" {}
variable "subnet_id" {}
variable "subnet_region" {}
provider "databricks" {
  alias                  = "mws"
  host                   = "https://accounts.gcp.databricks.com"
  google_service_account = var.databricks_google_service_account
}
// register VPC
resource "databricks_mws_networks" "this" {
  provider     = databricks.mws
  account_id   = var.databricks_account_id
  network_name = "${var.prefix}-network"
  gcp_network_info {
    network_project_id    = var.google_project
    vpc_id                = var.vpc_id
    subnet_id             = var.subnet_id
    subnet_region         = var.subnet_region
    pod_ip_range_name     = "pods"
    service_ip_range_name = "svc"
  }
}
// create workspace in given VPC
resource "databricks_mws_workspaces" "this" {
  provider       = databricks.mws
  account_id     = var.databricks_account_id
  workspace_name = var.prefix
  location       = var.subnet_region
  cloud_resource_container {
    gcp {
      project_id = var.google_project
    }
  }
  network_id = databricks_mws_networks.this.network_id
  gke_config {
    connectivity_type = "PRIVATE_NODE_PUBLIC_MASTER"
    master_ip_range   = "10.3.0.0/28"
  }
  token {}
}
output "databricks_token" {
  value     = databricks_mws_workspaces.this.token[0].token_value
  sensitive = true
}
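The gcp_network_info block refers to two secondary IP ranges on the subnet, named pods and svc. A sketch of a VPC and subnet that provide them with the Google provider follows. The resource names and CIDR ranges are hypothetical, chosen only so that they do not overlap each other or the 10.3.0.0/28 master_ip_range. Outbound connectivity for private nodes (for example Cloud NAT) is not shown.

```hcl
// Hypothetical names and CIDR ranges
resource "google_compute_network" "dbx_private_vpc" {
  project                 = var.google_project
  name                    = "${var.prefix}-vpc"
  auto_create_subnetworks = false
}
resource "google_compute_subnetwork" "dbx_subnet" {
  project                  = var.google_project
  name                     = "${var.prefix}-subnet"
  region                   = var.subnet_region
  network                  = google_compute_network.dbx_private_vpc.id
  ip_cidr_range            = "10.0.0.0/16"
  private_ip_google_access = true
  // range names must match pod_ip_range_name and service_ip_range_name
  secondary_ip_range {
    range_name    = "pods"
    ip_cidr_range = "10.1.0.0/16"
  }
  secondary_ip_range {
    range_name    = "svc"
    ip_cidr_range = "10.2.0.0/20"
  }
}
// In gcp_network_info you could then use:
//   vpc_id    = google_compute_network.dbx_private_vpc.name
//   subnet_id = google_compute_subnetwork.dbx_subnet.name
```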
To create a Databricks workspace that uses GCP Private Service Connect, first read and understand the Enable Private Service Connect documentation, then customize the example above with the relevant examples from databricks_mws_vpc_endpoint, databricks_mws_private_access_settings and databricks_mws_networks.
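A rough sketch of those customizations follows, building on the GCP example above (its variables and the databricks.mws alias). It assumes the two PSC endpoints already exist in Google Cloud. The variables psc_workspace_endpoint_name and psc_relay_endpoint_name are hypothetical inputs holding their names. Setting public_access_enabled = true is an illustrative choice.

```hcl
// Hypothetical inputs: names of PSC endpoints created in Google Cloud beforehand
variable "psc_workspace_endpoint_name" {}
variable "psc_relay_endpoint_name" {}
resource "databricks_mws_vpc_endpoint" "workspace" {
  provider          = databricks.mws
  account_id        = var.databricks_account_id
  vpc_endpoint_name = "${var.prefix}-vpce-workspace"
  gcp_vpc_endpoint_info {
    project_id        = var.google_project
    psc_endpoint_name = var.psc_workspace_endpoint_name
    endpoint_region   = var.subnet_region
  }
}
resource "databricks_mws_vpc_endpoint" "relay" {
  provider          = databricks.mws
  account_id        = var.databricks_account_id
  vpc_endpoint_name = "${var.prefix}-vpce-relay"
  gcp_vpc_endpoint_info {
    project_id        = var.google_project
    psc_endpoint_name = var.psc_relay_endpoint_name
    endpoint_region   = var.subnet_region
  }
}
resource "databricks_mws_private_access_settings" "pas" {
  provider                     = databricks.mws
  account_id                   = var.databricks_account_id
  private_access_settings_name = "${var.prefix}-pas"
  region                       = var.subnet_region
  public_access_enabled        = true
}
// Replaces databricks_mws_networks.this from the example above: adds vpc_endpoints
resource "databricks_mws_networks" "this" {
  provider     = databricks.mws
  account_id   = var.databricks_account_id
  network_name = "${var.prefix}-network"
  gcp_network_info {
    network_project_id    = var.google_project
    vpc_id                = var.vpc_id
    subnet_id             = var.subnet_id
    subnet_region         = var.subnet_region
    pod_ip_range_name     = "pods"
    service_ip_range_name = "svc"
  }
  vpc_endpoints {
    dataplane_relay = [databricks_mws_vpc_endpoint.relay.vpc_endpoint_id]
    rest_api        = [databricks_mws_vpc_endpoint.workspace.vpc_endpoint_id]
  }
}
// In databricks_mws_workspaces.this, additionally set:
//   private_access_settings_id = databricks_mws_private_access_settings.pas.private_access_settings_id
```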
By default, Databricks creates a VPC in your GCP project for each workspace and uses it to run the workspace's clusters. Optionally, you can use your own VPC for the workspace through the customer-managed VPC feature. Databricks recommends that you provide your VPC with databricks_mws_networks, so that you can configure it according to your organization's enterprise cloud standards while still conforming to Databricks requirements. You cannot migrate an existing workspace to your own VPC. The example below lets Databricks manage the VPC, so it sets no network_id.
variable "databricks_account_id" {
  description = "Account Id that could be found in the top right corner of https://accounts.gcp.databricks.com/"
}
variable "prefix" {}
data "google_client_openid_userinfo" "me" {
}
data "google_client_config" "current" {
}
// databricks.mws is the account-level provider alias (host https://accounts.gcp.databricks.com) from the previous example
resource "databricks_mws_workspaces" "this" {
  provider       = databricks.mws
  account_id     = var.databricks_account_id
  workspace_name = var.prefix
  location       = data.google_client_config.current.region
  cloud_resource_container {
    gcp {
      project_id = data.google_client_config.current.project
    }
  }
  gke_config {
    connectivity_type = "PRIVATE_NODE_PUBLIC_MASTER"
    master_ip_range   = "10.3.0.0/28"
  }
  token {}
}
output "databricks_token" {
  value     = databricks_mws_workspaces.this.token[0].token_value
  sensitive = true
}
The following arguments are available:
account_id - Account Id that could be found in the top right corner of Accounts Console.
deployment_name - (Optional) Part of the URL, as in https://<prefix>-<deployment-name>.cloud.databricks.com. A deployment name cannot be used until a deployment name prefix is defined; please contact your Databricks representative. Once a new deployment prefix is added or updated, it affects only newly created workspaces.
workspace_name - Name of the workspace; it will appear in the UI.
network_id - (Optional) network_id from networks.
aws_region - (AWS only) Region of the VPC.
credentials_id - (AWS only) credentials_id from credentials.
storage_configuration_id - (AWS only) storage_configuration_id from storage configuration.
managed_services_customer_managed_key_id - (Optional) customer_managed_key_id from customer managed keys with use_cases set to MANAGED_SERVICES. This is used to encrypt the workspace's notebook and secret data in the control plane.
storage_customer_managed_key_id - (Optional) customer_managed_key_id from customer managed keys with use_cases set to STORAGE. This is used to encrypt the DBFS storage and cluster volumes.
location - (GCP only) Region of the subnet.
cloud_resource_container - (GCP only) A block that specifies GCP workspace configurations, consisting of the following block:
  gcp - A block that consists of the following field:
    project_id - The Google Cloud project ID, which the workspace uses to instantiate cloud resources for your workspace.
gke_config - (GCP only) A block that specifies GKE configuration for the Databricks workspace:
  connectivity_type - Specifies the network connectivity types for the GKE nodes and the GKE master network. Possible values are PRIVATE_NODE_PUBLIC_MASTER and PUBLIC_NODE_PUBLIC_MASTER.
  master_ip_range - The IP range from which to allocate GKE cluster master resources. This field is ignored if GKE private cluster is not enabled. It must be exactly a /28 range.
private_access_settings_id - (Optional) Canonical unique identifier of databricks_mws_private_access_settings in the Databricks account.
custom_tags - (Optional / AWS only) The custom tag key-value pairs attached to this workspace. These tags are applied to clusters automatically, in addition to any default_tags or custom_tags set at the cluster level. Note that it can take up to an hour for custom_tags to be applied, due to scheduling on the control plane. Once applied, custom tags can be modified but can never be completely removed.
pricing_tier - (Optional) The pricing tier of the workspace.
You can specify a token block in the body of the workspace resource, so that Terraform manages the refresh of the personal access token (PAT) for the deployment user. The other option is to create databricks_obo_token, though it requires the Premium or Enterprise plan as well as a more complex setup. The token block exposes token_value, which holds the sensitive PAT, and optionally accepts two arguments:
comment - (Optional) Comment that will appear on the "User Settings / Access Tokens" page in the workspace UI. Defaults to "Terraform PAT".
lifetime_seconds - (Optional) Token expiry lifetime. Defaults to 2592000 (30 days).
On AWS, the following arguments can be modified after the workspace is running:
network_id - Modifying networks on running workspaces requires three separate terraform apply steps.
credentials_id
storage_customer_managed_key_id
private_access_settings_id
custom_tags
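The customer-managed key arguments take an ID exported by databricks_mws_customer_managed_keys, and the token block accepts the two optional arguments described above. A sketch combining both on top of the first AWS example follows. It assumes an existing AWS KMS key and alias: aws_kms_key.managed_services and aws_kms_alias.managed_services are hypothetical, and the KMS key policy that grants Databricks access is not shown. The comment text and one-day lifetime are illustrative values.

```hcl
// Hypothetical KMS resources; the key policy allowing Databricks to use the key is not shown
resource "databricks_mws_customer_managed_keys" "managed_services" {
  provider   = databricks.mws
  account_id = var.databricks_account_id
  aws_key_info {
    key_arn   = aws_kms_key.managed_services.arn
    key_alias = aws_kms_alias.managed_services.name
  }
  use_cases = ["MANAGED_SERVICES"]
}
resource "databricks_mws_workspaces" "this" {
  provider                                 = databricks.mws
  account_id                               = var.databricks_account_id
  workspace_name                           = var.prefix
  aws_region                               = var.region
  credentials_id                           = databricks_mws_credentials.this.credentials_id
  storage_configuration_id                 = databricks_mws_storage_configurations.this.storage_configuration_id
  network_id                               = databricks_mws_networks.this.network_id
  managed_services_customer_managed_key_id = databricks_mws_customer_managed_keys.managed_services.customer_managed_key_id
  token {
    comment          = "Terraform PAT for automation" // illustrative comment text
    lifetime_seconds = 86400                          // 1 day instead of the 30-day default
  }
}
```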
In addition to all arguments above, the following attributes are exported:
id - (String) Canonical unique identifier for the workspace, of the format <account-id>/<workspace-id>.
workspace_id - (String) Workspace ID.
workspace_status_message - (String) Updates on the workspace status.
workspace_status - (String) Workspace status.
creation_time - (Integer) Time when the workspace was created.
workspace_url - (String) URL of the workspace.
custom_tags - (Map) Custom tags (if present) added to the workspace.
The timeouts block allows you to specify create, read and update timeouts. It usually takes 5-7 minutes to provision a Databricks E2 workspace, plus another couple of minutes for your local DNS caches to resolve. Run TF_LOG=DEBUG terraform apply whenever you observe timeout issues.
timeouts {
  create = "30m"
  read   = "10m"
  update = "20m"
}
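The timeouts block is nested inside the resource it applies to. A sketch that reuses the customer-managed VPC example from the top of this page follows; its variables and the databricks.mws alias are assumed to be defined as there, and the durations are the same ones shown above.

```hcl
resource "databricks_mws_workspaces" "this" {
  provider                 = databricks.mws
  account_id               = var.databricks_account_id
  workspace_name           = var.prefix
  aws_region               = var.region
  credentials_id           = databricks_mws_credentials.this.credentials_id
  storage_configuration_id = databricks_mws_storage_configurations.this.storage_configuration_id
  network_id               = databricks_mws_networks.this.network_id
  token {}
  // allow more time than the usual 5-7 minute provisioning before Terraform gives up
  timeouts {
    create = "30m"
    read   = "10m"
    update = "20m"
  }
}
```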
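You can reset local DNS caches before provisioning new workspaces with one of the following commands, depending on your operating system (the nscd command applies to Linux, the others to different macOS versions):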
sudo /etc/init.d/nscd restart
sudo killall -HUP mDNSResponder
sudo discoveryutil udnsflushcaches
sudo dscacheutil -flushcache
sudo lookupd -flushcache
The following resources are used in the same context: