Generates `*.tf` files for Databricks resources together with an `import.sh` script that is used to import objects into the Terraform state. The exporter is available as part of the provider binary. The only way to authenticate is through environment variables. It's best used when you need to quickly export the Terraform configuration for an existing Databricks workspace. After generating the configuration, we strongly recommend manually reviewing all created files.
After downloading the latest released binary, unpack it and place it in the same folder. You may have already downloaded this binary: check the `.terraform` folder of any state directory where you've used the `databricks` provider. It could also be in your plugin cache at `~/.terraform.d/plugins/registry.terraform.io/databricks/databricks/*/*/terraform-provider-databricks`. Here's the tool in action:
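As a minimal sketch (the workspace URL and output directory below are illustrative), an interactive run only needs the authentication variables and the `exporter` command; the tool then prompts for what to export:

```sh
export DATABRICKS_HOST=https://my-workspace.cloud.databricks.com   # illustrative workspace URL
export DATABRICKS_TOKEN=...                                        # personal access token
./terraform-provider-databricks exporter -directory=output         # prompts interactively for what to list/import
```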
Exporter can also be used in a non-interactive mode:
export DATABRICKS_HOST=...
export DATABRICKS_TOKEN=...
./terraform-provider-databricks exporter -skip-interactive \
-services=groups,secrets,access,compute,users,jobs,storage \
-listing=jobs,compute \
-last-active-days=90 \
-debug
All arguments are optional, and they tune what code is being generated:

- `-directory` - Path to a directory where `*.tf` and `import.sh` files are written. By default, it's set to the current working directory.
- `-module` - Name of the module in the Terraform state that affects reference resolution and the prefixes of the generated commands in `import.sh`.
- `-last-active-days` - Items older than `-last-active-days` won't be imported. By default, the value is set to 3650 (10 years). Has an effect on the listing of `databricks_cluster` and `databricks_job` resources.
- `-services` - Comma-separated list of services to import. By default, all services are imported.
- `-listing` - Comma-separated list of services to be listed and further passed on for importing. The `-services` parameter controls which transitive dependencies will be processed. We recommend limiting with `-listing` more often than with `-services`.
- `-match` - Match resource names during the listing operation. This filter applies to all resources that are listed, so if you want to import all dependencies of just one cluster, specify `-match=autoscaling -listing=compute`. By default, it is empty, which matches everything.
- `-mounts` - List DBFS mount points. This is an extremely slow operation that won't run unless explicitly specified.
- `-generateProviderDeclaration` - Flag that toggles the generation of a `databricks.tf` file with the declaration of the Databricks Terraform provider, which is necessary for Terraform versions since Terraform 0.13 (disabled by default).
- `-prefix` - Optional prefix that will be added to the names of all exported resources. This is useful for exporting resources from multiple workspaces in order to merge them into a single one.
- `-skip-interactive` - Optionally run in non-interactive mode.
- `-includeUserDomains` - Optionally include the domain name in the generated resource names for the `databricks_user` resource.
- `-importAllUsers` - Optionally include all users and service principals, even if they are only part of the `users` group.
- `-exportDeletedUsersAssets` - Optionally include assets of deleted users and service principals.
- `-incremental` - Experimental option for incremental export of modified resources and merging with existing resources. Please note that only a limited set of resources (notebooks, SQL queries/dashboards/alerts, ...) provides information about the last modified date; all other resources will be re-exported again! Also, it's impossible to detect the deletion of many resource types (i.e., clusters, jobs, ...), so you must do a periodic full export if resources are deleted. For workspace objects (notebooks, workspace files, and directories), the exporter tries to detect deleted objects and remove them from the generated code (this requires the presence of the `ws_objects.json` file that is written on each export that pulls all workspace objects). For workspace objects, renames are handled as deletion of the existing resource and creation of a new one! Requires the `-updated-since` option if no `exporter-run-stats.json` file exists in the output directory (see the example after this list).
- `-updated-since` - Timestamp (in ISO8601 format as supported by the Go language) for exporting resources modified since the given timestamp, e.g., `2023-07-24T00:00:00Z`. If not specified, the exporter will try to load the last run timestamp from the `exporter-run-stats.json` file generated during the export and use it.
- `-notebooksFormat` - Optional format for exporting notebooks. Supported values are `SOURCE` (default), `DBC`, and `JUPYTER`. This option could be used to export notebooks with embedded dashboards.
- `-noformat` - Optionally turn off the execution of `terraform fmt` on the exported files (enabled by default).
- `-debug` - Turn on debug output.
- `-trace` - Turn on trace output (includes the debug level as well).
- `-native-import` - Turns on the generation of native import blocks (requires Terraform 1.5+). This option is recommended when you want to start managing an existing workspace.
- `-export-secrets` - Enables export of the secret values; they will be written into the `terraform.tfvars` file. Be very careful with this file!
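For instance, an incremental workflow built from the options above could look like the following sketch (the directory name, service selection, and timestamp are illustrative):

```sh
# initial full export of the selected services
./terraform-provider-databricks exporter -skip-interactive \
    -directory=output \
    -listing=jobs,compute,notebooks \
    -services=jobs,compute,notebooks,users,groups

# subsequent run: only pick up objects modified since the given timestamp
# (the timestamp can be omitted once exporter-run-stats.json exists in output/)
./terraform-provider-databricks exporter -skip-interactive \
    -directory=output \
    -incremental \
    -updated-since=2023-07-24T00:00:00Z
```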
Services are just logical groups of resources used for filtering and organization in the files written to `-directory`. All resources are globally sorted by their resource name, which allows you to use the generated files for compliance purposes. Nevertheless, managing the entire Databricks workspace with Terraform is the preferred way, except for notebooks and possibly libraries, which may have their own CI/CD processes. The following services are available (a combined example follows the list):
- `access` - `databricks_permissions`, `databricks_instance_profile`, and `databricks_ip_access_list`.
- `compute` - listing of `databricks_cluster`.
- `directories` - listing of `databricks_directory`.
- `dlt` - listing of `databricks_pipeline`.
- `groups` - listing of `databricks_group` with membership and data access.
- `jobs` - listing of `databricks_job`. Usually, there are more automated workflows than interactive clusters, so they get their own file in this tool's output. Please note that workflows deployed and maintained via Databricks Asset Bundles aren't exported!
- `mlflow-webhooks` - listing of `databricks_mlflow_webhook`.
- `model-serving` - listing of `databricks_model_serving`.
- `mounts` - listing works only in combination with the `-mounts` command-line option.
- `notebooks` - listing of `databricks_notebook` and `databricks_workspace_file`.
- `policies` - listing of `databricks_cluster_policy`.
- `pools` - listing of instance pools.
- `repos` - listing of `databricks_repo`.
- `secrets` - listing of `databricks_secret_scope` along with keys and ACLs.
- `sql-alerts` - listing of `databricks_sql_alert`.
- `sql-dashboards` - listing of `databricks_sql_dashboard` along with the associated `databricks_sql_widget` and `databricks_sql_visualization`.
- `sql-endpoints` - listing of `databricks_sql_endpoint` along with `databricks_sql_global_config`.
- `sql-queries` - listing of `databricks_sql_query`.
- `storage` - only `databricks_dbfs_file` and `databricks_file` objects referenced in other resources (libraries, init scripts, ...) will be downloaded locally and properly arranged into the Terraform state.
- `uc-artifact-allowlist` - exports `databricks_artifact_allowlist` resources for Unity Catalog allowlists attached to the current metastore.
- `uc-catalogs` - listing of `databricks_catalog` and `databricks_catalog_workspace_binding`.
- `uc-connections` - listing of `databricks_connection`. Please note that because the API doesn't return sensitive fields, such as passwords, tokens, etc., the generated `options` block could be incomplete!
- `uc-external-locations` - exports `databricks_external_location` resources.
- `uc-grants` - `databricks_grants`. Please note that during export, the list of grants is expanded to include the identity that performs the export! This is done to allow the creation of objects in cases where catalogs/schemas have a different owner than the current identity.
- `uc-metastores` - listing of `databricks_metastore` and `databricks_metastore_assignment` (account level only). Please note that when using a workspace-level configuration, only metastores from the workspace's region are listed!
- `uc-models` - `databricks_registered_model`.
- `uc-schemas` - `databricks_schema`.
- `uc-shares` - listing of `databricks_share` and `databricks_recipient`.
- `uc-storage-credentials` - exports `databricks_storage_credential` resources on the workspace or account level.
- `uc-system-schemas` - exports `databricks_system_schema` resources for the UC metastore of the current workspace.
- `uc-tables` - `databricks_sql_table` resources.
- `uc-volumes` - `databricks_volume`.
- `users` - `databricks_user` and `databricks_service_principal` are written to their own file, simply because of their number. If you use SCIM provisioning, migrating workspaces is the only use case for importing the `users` service.
- `workspace` - listing of `databricks_workspace_conf` and `databricks_global_init_script`.
For security reasons, `databricks_secret` cannot contain actual plaintext secrets. By default, the importer creates a variable in `vars.tf` with the same name as the secret, and you are expected to fill in the value of that secret afterwards. Alternatively, you can use the `-export-secrets` command-line option to generate the `terraform.tfvars` file with the secret values.
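If you do opt into exporting the secret values, the flag is simply added to the export command; treat the resulting file as sensitive (a sketch, with an illustrative output directory):

```sh
./terraform-provider-databricks exporter -skip-interactive \
    -listing=secrets \
    -export-secrets \
    -directory=output

# keep the generated secret values out of version control
echo "terraform.tfvars" >> output/.gitignore
```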
To speed up the export, the Terraform Exporter performs many operations, such as listing and the actual data export, in parallel using Goroutines. Built-in defaults control the parallelism, but it's also possible to tune some parameters using environment variables specific to the exporter:
- `EXPORTER_WS_LIST_PARALLELISM` (default: `5`) controls how many Goroutines are used to perform parallel listing of Databricks workspace objects (notebooks, directories, workspace files, ...).
- `EXPORTER_DIRECTORIES_CHANNEL_SIZE` (default: `100000`) controls the channel's capacity when listing workspace objects. Please ensure that this value is big enough (greater than the number of directories in the workspace; the default value should be fine for most cases); otherwise, there is a chance of deadlock.
- `EXPORTER_DEDICATED_RESOUSE_CHANNELS` - by default, only specific resources (`databricks_user`, `databricks_service_principal`, `databricks_group`) have dedicated channels; the rest are handled by a shared channel. This is done to prevent throttling by specific APIs. You can override this by providing a comma-separated list of resources as this environment variable.
- `EXPORTER_PARALLELISM_NNN` - number of Goroutines used to process resources of a specific type (replace `NNN` with the exact resource name; for example, `EXPORTER_PARALLELISM_databricks_notebook=10` sets the number of Goroutines for the `databricks_notebook` resource to `10`). There is a shared channel (named `default`) for handling resources that don't have dedicated channels; use `EXPORTER_PARALLELISM_default` to increase its size (the default size is `15`). Defaults for some resources are defined by the `goroutinesNumber` map in `exporter/context.go`, or are equal to `2` if no value is set. Don't increase the default values too much, to avoid REST API throttling!
- `EXPORTER_DEFAULT_HANDLER_CHANNEL_SIZE` - the size of the shared channel (default: `200000`); you may need to increase it if you have a huge workspace.
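For example, a run that raises the parallelism for notebook export while leaving the rest at the defaults could look like the sketch below (the values are illustrative; keep them modest to avoid REST API throttling):

```sh
# tune the exporter's parallelism via environment variables before the run
export EXPORTER_WS_LIST_PARALLELISM=10
export EXPORTER_PARALLELISM_databricks_notebook=10
export EXPORTER_PARALLELISM_default=20
export EXPORTER_DEFAULT_HANDLER_CHANNEL_SIZE=300000
./terraform-provider-databricks exporter -skip-interactive -listing=notebooks -directory=output
```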
Exporter aims to generate HCL code for most of the resources within the Databricks workspace.

Notes:
- The exporter doesn't generate standalone `databricks_library` resources. This is done to decrease the number of generated resources.