databricks_mount Resource

This resource will mount your cloud storage on dbfs:/mnt/name. Right now it supports mounting AWS S3, Azure (Blob Storage, ADLS Gen1 & Gen2), and Google Cloud Storage. It is important to understand that this will start up the cluster if it is terminated. The read and refresh terraform commands require a running cluster and may take some time to validate the mount.

Note When cluster_id is not specified, it will create the smallest possible cluster in the default availability zone, with a name equal to or starting with terraform-mount, for the shortest possible amount of time. To avoid mount failures due to potential quota or capacity issues with this default cluster, we recommend specifying a cluster to use for mounting.
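For example, a mount can be pinned to an existing cluster via cluster_id. A minimal sketch, assuming a pre-existing databricks_cluster.shared that already has a suitable instance profile attached, and reusing the bucket from the S3 example below:

resource "databricks_mount" "pinned" {
  name = "experiments"
  # reuse an existing cluster (with an appropriate instance profile attached)
  # instead of the auto-created terraform-mount cluster
  cluster_id = databricks_cluster.shared.id

  s3 {
    bucket_name = aws_s3_bucket.this.bucket
  }
}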

Note CRUD operations on a databricks mount require a running cluster. Due to limitations of terraform and the databricks mounts APIs, if the cluster the mount was most recently created or updated with no longer exists AND the mount is destroyed as part of a terraform apply, we mark it as deleted without cleaning it up from the workspace.

This resource provides two ways of mounting a storage account:

  1. Use a storage-specific configuration block - this suits most cases, as it fills in most of the necessary details. Currently the configuration blocks described below are supported: s3, abfs, gs, adl, and wasb.
  2. Use generic arguments (uri and extra_configs) - you are responsible for providing all parameters required to mount the specific storage. This is the most flexible option.

Common arguments

Example mounting ADLS Gen2 using uri and extra_configs

locals {
  tenant_id    = "00000000-1111-2222-3333-444444444444"
  client_id    = "55555555-6666-7777-8888-999999999999"
  secret_scope = "some-kv"
  secret_key   = "some-sp-secret"
  container    = "test"
  storage_acc  = "lrs"
}

resource "databricks_mount" "this" {
  name = "tf-abfss"

  uri = "abfss://${local.container}@${local.storage_acc}.dfs.core.windows.net"
  extra_configs = {
    "fs.azure.account.auth.type" : "OAuth",
    "fs.azure.account.oauth.provider.type" : "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id" : local.client_id,
    "fs.azure.account.oauth2.client.secret" : "{{secrets/${local.secret_scope}/${local.secret_key}}}",
    "fs.azure.account.oauth2.client.endpoint" : "https://login.microsoftonline.com/${local.tenant_id}/oauth2/token",
    "fs.azure.createRemoteFileSystemDuringInitialization" : "false",
  }
}

Example mounting ADLS Gen2 with AAD passthrough

To mount ADLS Gen2 with Azure Active Directory (AAD) credentials passthrough, we need to execute the mount commands on a cluster configured with AAD credentials passthrough and provide the necessary configuration parameters (see documentation for more details).

provider "azurerm" {
  features {}
}

variable "resource_group" {
  type        = string
  description = "Resource group for Databricks Workspace"
}

variable "workspace_name" {
  type        = string
  description = "Name of the Databricks Workspace"
}

data "azurerm_databricks_workspace" "this" {
  name                = var.workspace_name
  resource_group_name = var.resource_group
}

# it works only with AAD token!
provider "databricks" {
  host = data.azurerm_databricks_workspace.this.workspace_url
}

data "databricks_node_type" "smallest" {
  local_disk = true
}

data "databricks_spark_version" "latest" {
}

resource "databricks_cluster" "shared_passthrough" {
  cluster_name            = "Shared Passthrough for mount"
  spark_version           = data.databricks_spark_version.latest.id
  node_type_id            = data.databricks_node_type.smallest.id
  autotermination_minutes = 10
  num_workers             = 1

  spark_conf = {
    "spark.databricks.cluster.profile" : "serverless",
    "spark.databricks.repl.allowedLanguages" : "python,sql",
    "spark.databricks.passthrough.enabled" : "true",
    "spark.databricks.pyspark.enableProcessIsolation" : "true"
  }

  custom_tags = {
    "ResourceClass" : "Serverless"
  }
}

variable "storage_acc" {
  type        = string
  description = "Name of the ADLS Gen2 storage container"
}

variable "container" {
  type        = string
  description = "Name of container inside storage account"
}

resource "databricks_mount" "passthrough" {
  name       = "passthrough-test"
  cluster_id = databricks_cluster.shared_passthrough.id

  uri = "abfss://${var.container}@${var.storage_acc}.dfs.core.windows.net"
  extra_configs = {
    "fs.azure.account.auth.type" : "CustomAccessToken",
    "fs.azure.account.custom.token.provider.class" : "{{sparkconf/spark.databricks.passthrough.adls.gen2.tokenProviderClassName}}",
  }
}

s3 block

This block allows specifying parameters for mounting an AWS S3 bucket. The following arguments are required inside the s3 block:

Example of mounting S3

// now you can do `%fs ls /mnt/experiments` in notebooks
resource "databricks_mount" "this" {
  name = "experiments"
  s3 {
    instance_profile = databricks_instance_profile.ds.id
    bucket_name      = aws_s3_bucket.this.bucket
  }
}

abfs block

This block allows specifying parameters for mounting an ADLS Gen2 storage account. The following arguments are required inside the abfs block:

Creating mount for ADLS Gen2 using abfs block

In this example, we're using Azure authentication, so we can omit some parameters (tenant_id, storage_account_name, and container_name) that will be detected automatically.

resource "databricks_secret_scope" "terraform" {
  name                     = "application"
  initial_manage_principal = "users"
}

resource "databricks_secret" "service_principal_key" {
  key          = "service_principal_key"
  string_value = var.ARM_CLIENT_SECRET
  scope        = databricks_secret_scope.terraform.name
}

resource "azurerm_storage_account" "this" {
  name                     = "${var.prefix}datalake"
  resource_group_name      = var.resource_group_name
  location                 = var.resource_group_location
  account_tier             = "Standard"
  account_replication_type = "GRS"
  account_kind             = "StorageV2"
  is_hns_enabled           = true
}

resource "azurerm_role_assignment" "this" {
  scope                = azurerm_storage_account.this.id
  role_definition_name = "Storage Blob Data Contributor"
  principal_id         = data.azurerm_client_config.current.object_id
}

resource "azurerm_storage_container" "this" {
  name                  = "marketing"
  storage_account_name  = azurerm_storage_account.this.name
  container_access_type = "private"
}

resource "databricks_mount" "marketing" {
  name        = "marketing"
  resource_id = azurerm_storage_container.this.resource_manager_id
  abfs {
    client_id              = data.azurerm_client_config.current.client_id
    client_secret_scope    = databricks_secret_scope.terraform.name
    client_secret_key      = databricks_secret.service_principal_key.key
    initialize_file_system = true
  }
}

gs block

This block allows specifying parameters for mounting a Google Cloud Storage bucket. The following arguments are required inside the gs block:

Example mounting Google Cloud Storage

resource "databricks_mount" "this_gs" {
  name = "gs-mount"
  gs {
    service_account = "acc@company.iam.gserviceaccount.com"
    bucket_name     = "mybucket"
  }
}
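
When the bucket is mounted through a dedicated service account, the cluster executing the mount generally needs that service account attached as well. A minimal sketch, assuming the gcp_attributes.google_service_account cluster setting and reusing the node type and Spark version data sources from the passthrough example above:

resource "databricks_cluster" "gcs_shared" {
  cluster_name            = "Shared cluster for GCS mounts"
  spark_version           = data.databricks_spark_version.latest.id
  node_type_id            = data.databricks_node_type.smallest.id
  autotermination_minutes = 10
  num_workers             = 1

  gcp_attributes {
    # service account that has access to the bucket being mounted
    google_service_account = "acc@company.iam.gserviceaccount.com"
  }
}

resource "databricks_mount" "gs_with_cluster" {
  name       = "gs-mount-pinned"
  cluster_id = databricks_cluster.gcs_shared.id
  gs {
    service_account = "acc@company.iam.gserviceaccount.com"
    bucket_name     = "mybucket"
  }
}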

adl block

This block allows specifying parameters for mounting an ADLS Gen1 storage resource. The following arguments are required inside the adl block:

Example mounting ADLS Gen1

resource "databricks_mount" "mount" {
  name = "{var.RANDOM}"
  adl {
    storage_resource_name = "{env.TEST_STORAGE_ACCOUNT_NAME}"
    tenant_id             = data.azurerm_client_config.current.tenant_id
    client_id             = data.azurerm_client_config.current.client_id
    client_secret_scope   = databricks_secret_scope.terraform.name
    client_secret_key     = databricks_secret.service_principal_key.key
    spark_conf_prefix     = "fs.adl"
  }
}

wasb block

This block allows specifying parameters for mounting an Azure Blob Storage container. The following arguments are required inside the wasb block:

Example mounting Azure Blob Storage

resource "azurerm_storage_account" "blobaccount" {
  name                     = "${var.prefix}blob"
  resource_group_name      = var.resource_group_name
  location                 = var.resource_group_location
  account_tier             = "Standard"
  account_replication_type = "LRS"
  account_kind             = "StorageV2"
}

resource "azurerm_storage_container" "marketing" {
  name                  = "marketing"
  storage_account_name  = azurerm_storage_account.blobaccount.name
  container_access_type = "private"
}

resource "databricks_secret_scope" "terraform" {
  name                     = "application"
  initial_manage_principal = "users"
}

resource "databricks_secret" "storage_key" {
  key          = "blob_storage_key"
  string_value = azurerm_storage_account.blobaccount.primary_access_key
  scope        = databricks_secret_scope.terraform.name
}

resource "databricks_mount" "marketing" {
  name = "marketing"
  wasb {
    container_name       = azurerm_storage_container.marketing.name
    storage_account_name = azurerm_storage_account.blobaccount.name
    auth_type            = "ACCESS_KEY"
    token_secret_scope   = databricks_secret_scope.terraform.name
    token_secret_key     = databricks_secret.storage_key.key
  }
}

Migration from other mount resources

Migration from a storage-specific mount resource is straightforward: change the resource type to databricks_mount and move the storage-specific settings into the corresponding configuration block, as sketched below.
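
For example, a configuration that used the legacy databricks_azure_blob_mount resource could be restated with a wasb block roughly as follows (a sketch; the legacy attribute names are recalled from older provider versions and may differ for your version):

# before - legacy, storage-specific resource (attribute names are illustrative):
# resource "databricks_azure_blob_mount" "marketing" {
#   mount_name           = "marketing"
#   container_name       = azurerm_storage_container.marketing.name
#   storage_account_name = azurerm_storage_account.blobaccount.name
#   auth_type            = "ACCESS_KEY"
#   token_secret_scope   = databricks_secret_scope.terraform.name
#   token_secret_key     = databricks_secret.storage_key.key
# }

# after - generic databricks_mount with the storage-specific settings in a wasb block:
resource "databricks_mount" "marketing" {
  name = "marketing"
  wasb {
    container_name       = azurerm_storage_container.marketing.name
    storage_account_name = azurerm_storage_account.blobaccount.name
    auth_type            = "ACCESS_KEY"
    token_secret_scope   = databricks_secret_scope.terraform.name
    token_secret_key     = databricks_secret.storage_key.key
  }
}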

Attribute Reference

In addition to all arguments above, the following attributes are exported:
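
As a sketch, assuming the exported source attribute (the URI of the mounted storage), an exported attribute can be referenced elsewhere in the configuration, e.g. as an output:

output "marketing_mount_source" {
  # `source` is assumed here to expose the URI backing the mount
  value = databricks_mount.marketing.source
}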

Import

Related Resources

The following resources are often used in the same context: