databricks_job Resource

The databricks_job resource allows you to manage Databricks Jobs to run non-interactive code in a databricks_cluster.

Example Usage

It is possible to create a Databricks job using task blocks. A single task is defined with the task block containing one of the *_task blocks, task_key, and additional arguments described below.

resource "databricks_job" "this" {
  name        = "Job with multiple tasks"
  description = "This job executes multiple tasks on a shared job cluster, which will be provisioned as part of execution, and terminated once all tasks are finished."

  job_cluster {
    job_cluster_key = "j"
    new_cluster {
      num_workers   = 2
      spark_version = data.databricks_spark_version.latest.id
      node_type_id  = data.databricks_node_type.smallest.id
    }
  }

  task {
    task_key = "a"

    new_cluster {
      num_workers   = 1
      spark_version = data.databricks_spark_version.latest.id
      node_type_id  = data.databricks_node_type.smallest.id
    }

    notebook_task {
      notebook_path = databricks_notebook.this.path
    }
  }

  task {
    task_key = "b"
    //this task will only run after task a
    depends_on {
      task_key = "a"
    }

    existing_cluster_id = databricks_cluster.shared.id

    spark_jar_task {
      main_class_name = "com.acme.data.Main"
    }
  }

  task {
    task_key = "c"

    job_cluster_key = "j"

    notebook_task {
      notebook_path = databricks_notebook.this.path
    }
  }
  //this task starts a Delta Live Tables pipeline update
  task {
    task_key = "d"

    pipeline_task {
      pipeline_id = databricks_pipeline.this.id
    }
  }
}

Argument Reference

The resource supports the following arguments:

task Configuration Block

This block describes individual tasks:

condition_task Configuration Block

The condition_task specifies a condition with an outcome that can be used to control the execution of dependent tasks.

This task does not require a cluster to execute and does not support retries or notifications.
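A minimal sketch of a condition task gating a downstream task; the left, op, and right attributes, the outcome attribute on depends_on, and the job parameter reference are assumptions for illustration:

task {
  task_key = "check"

  condition_task {
    left  = "{{job.parameters.environment}}" # placeholder dynamic value reference
    op    = "EQUAL_TO"
    right = "prod"
  }
}

task {
  task_key = "deploy"

  # only runs when the condition task evaluates to true
  depends_on {
    task_key = "check"
    outcome  = "true"
  }

  notebook_task {
    notebook_path = databricks_notebook.this.path
  }
}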

dbt_task Configuration Block

You also need to include a git_source block to configure the repository that contains the dbt project.
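A hedged sketch of a dbt task paired with a git_source block; the repository URL, branch, and dbt commands are placeholders:

resource "databricks_job" "dbt" {
  name = "dbt job"

  git_source {
    url      = "https://github.com/acme/dbt-project" # placeholder repository
    provider = "gitHub"
    branch   = "main"
  }

  task {
    task_key = "dbt"
    # ... cluster specification and the libraries required by the dbt runtime go here

    dbt_task {
      commands = ["dbt deps", "dbt build"]
    }
  }
}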

for_each_task Configuration Block

notebook_task Configuration Block

pipeline_task Configuration Block

python_wheel_task Configuration Block

run_job_task Configuration Block

spark_jar_task Configuration Block

spark_python_task Configuration Block

spark_submit_task Configuration Block

You can invoke Spark submit tasks only on new clusters. In the new_cluster specification, libraries and spark_conf are not supported. Instead, use --jars and --py-files to add Java and Python libraries and --conf to set the Spark configuration. By default, the Spark submit job uses all available memory (excluding reserved memory for Databricks services). You can set --driver-memory and --executor-memory to smaller values to leave some room for off-heap usage. Please use spark_jar_task, spark_python_task, or notebook_task wherever possible.
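For illustration, a hedged sketch of a Spark submit task that passes libraries and Spark configuration through parameters; the class name, memory setting, and DBFS paths are placeholders:

task {
  task_key = "submit"

  new_cluster {
    num_workers   = 2
    spark_version = data.databricks_spark_version.latest.id
    node_type_id  = data.databricks_node_type.smallest.id
  }

  spark_submit_task {
    parameters = [
      "--class", "com.acme.data.Main",
      "--conf", "spark.executor.memory=4g",
      "--jars", "dbfs:/libs/extra.jar",
      "dbfs:/jars/app.jar",
    ]
  }
}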

sql_task Configuration Block

One of query, dashboard, or alert needs to be provided.

Example

resource "databricks_job" "sql_aggregation_job" {
  name = "Example SQL Job"
  task {
    task_key = "run_agg_query"
    sql_task {
      warehouse_id = databricks_sql_endpoint.sql_job_warehouse.id
      query {
        query_id = databricks_sql_query.agg_query.id
      }
    }
  }
  task {
    task_key = "run_dashboard"
    sql_task {
      warehouse_id = databricks_sql_endpoint.sql_job_warehouse.id
      dashboard {
        dashboard_id = databricks_sql_dashboard.dash.id
        subscriptions {
          user_name = "user@domain.com"
        }
      }
    }
  }
  task {
    task_key = "run_alert"
    sql_task {
      warehouse_id = databricks_sql_endpoint.sql_job_warehouse.id
      alert {
        alert_id = databricks_sql_alert.alert.id
        subscriptions {
          user_name = "user@domain.com"
        }
      }
    }
  }
}

library Configuration Block

This block describes an optional library to be installed on the cluster that will execute the job. For multiple libraries, use multiple blocks. If the job specifies more than one task, these blocks need to be placed within the task block. Please consult the libraries section of the databricks_cluster resource for more information.

resource "databricks_job" "this" {
  library {
    pypi {
      package = "databricks-mosaic==0.3.14"
    }
  }
}

depends_on Configuration Block

This block describes upstream dependencies of a given task. For multiple upstream dependencies, use multiple blocks.
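For example, a task that waits for two upstream tasks (task keys are illustrative):

task {
  task_key = "join"

  depends_on {
    task_key = "a"
  }

  depends_on {
    task_key = "b"
  }

  notebook_task {
    notebook_path = databricks_notebook.this.path
  }
}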

tags Configuration Map

tags - (Optional) (Map) An optional map of the tags associated with the job. Specified tags will be used as cluster tags for job clusters.

Example

resource "databricks_job" "this" {
  # ...
  tags = {
    environment = "dev"
    owner       = "dream-team"
  }
}

run_as Configuration Block

The run_as block allows specifying the user or the service principal that the job runs as. If not specified, the job runs as the user or service principal that created the job. Only one of user_name or service_principal_name can be specified.

Example:

resource "databricks_job" "this" {
  # ...
  run_as {
    service_principal_name = "8d23ae77-912e-4a19-81e4-b9c3f5cc9349"
  }
}

job_cluster Configuration Block

Shared job cluster specification. Allows multiple tasks in the same job run to reuse the cluster.

schedule Configuration Block

continuous Configuration Block

queue Configuration Block

This block describes the queue settings of the job:
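A minimal sketch, assuming the block exposes a single enabled flag:

resource "databricks_job" "this" {
  # ...
  queue {
    enabled = true
  }
}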

trigger Configuration Block

git_source Configuration Block

This block is used to specify Git repository information & branch/tag/commit that will be used to pull source code from to execute a job. Supported options are:
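A hedged example; the repository URL and branch are placeholders:

git_source {
  url      = "https://github.com/acme/jobs-repo" # placeholder repository
  provider = "gitHub"
  branch   = "main"
}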

parameter Configuration Block

This block defines a job-level parameter for the job. You can define several job-level parameters for the job. Supported options are:

You can use this block only together with task blocks, not with the legacy tasks specification!
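A minimal sketch, assuming name and default are the block's attributes:

resource "databricks_job" "this" {
  # ...
  task {
    # ...
  }

  parameter {
    name    = "environment"
    default = "dev"
  }

  parameter {
    name    = "run_date"
    default = ""
  }
}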

email_notifications Configuration Block

This block can be configured on both job and task levels for corresponding effect.

The following parameter is only available for the job level configuration.
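A hedged sketch at the job level; on_failure and on_success are assumed to be attribute names taking lists of recipient addresses:

resource "databricks_job" "this" {
  # ...
  email_notifications {
    on_failure = ["team@domain.com"]
    on_success = ["team@domain.com"]
  }
}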

webhook_notifications Configuration Block

Each entry in the webhook_notifications block takes a list of webhook blocks. The field is documented below.

Note that the id is not to be confused with the name of the alert destination. The id can be retrieved through the API or from the URL of the Databricks UI: https://<workspace host>/sql/destinations/<notification id>?o=<workspace id>

Example

webhook_notifications {
  on_failure {
    id = "fb99f3dc-a0a0-11ed-a8fc-0242ac120002"
  }
}

webhook Configuration Block

notification_settings Configuration Block

This block controls notification settings for both email & webhook notifications. It can be configured on both job and task level for corresponding effect.

The following parameter is only available on task level.
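A hedged sketch of the job-level settings; the flag names are assumptions, and alert_on_last_attempt is assumed to be the task-level-only parameter referenced above:

resource "databricks_job" "this" {
  # ...
  notification_settings {
    no_alert_for_skipped_runs  = true
    no_alert_for_canceled_runs = true
  }

  task {
    # ...
    notification_settings {
      # assumed to be available only at the task level
      alert_on_last_attempt = true
    }
  }
}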

health Configuration Block

This block describes health conditions for a given job or an individual task. It consists of the following attributes:
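A hedged sketch of a duration-based health rule; the metric and operator names are assumptions:

health {
  rules {
    metric = "RUN_DURATION_SECONDS"
    op     = "GREATER_THAN"
    value  = 3600
  }
}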

Attribute Reference

In addition to all arguments above, the following attributes are exported:

Access Control

By default, all users can create and modify jobs unless an administrator enables jobs access control. With jobs access control, individual permissions determine a user’s abilities.
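Job permissions are typically managed with the databricks_permissions resource; a hedged sketch where the group names and permission levels are illustrative:

resource "databricks_permissions" "job_usage" {
  job_id = databricks_job.this.id

  access_control {
    group_name       = "data-engineers" # placeholder group
    permission_level = "CAN_MANAGE_RUN"
  }

  access_control {
    group_name       = "analysts" # placeholder group
    permission_level = "CAN_VIEW"
  }
}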

Single-task syntax (deprecated)

This syntax uses Jobs API 2.0 to create a job with a single task. Only a subset of arguments above is supported (name, libraries, email_notifications, webhook_notifications, timeout_seconds, max_retries, min_retry_interval_millis, retry_on_timeout, schedule, max_concurrent_runs), and only a single block of notebook_task, spark_jar_task, spark_python_task, spark_submit_task and pipeline_task can be specified.

The job cluster is specified using either of the arguments below:

data "databricks_current_user" "me" {}
data "databricks_spark_version" "latest" {}
data "databricks_node_type" "smallest" {
  local_disk = true
}

resource "databricks_notebook" "this" {
  path     = "${data.databricks_current_user.me.home}/Terraform"
  language = "PYTHON"
  content_base64 = base64encode(<<-EOT
    # created from ${abspath(path.module)}
    display(spark.range(10))
    EOT
  )
}

resource "databricks_job" "this" {
  name = "Terraform Demo (${data.databricks_current_user.me.alphanumeric})"

  new_cluster {
    num_workers   = 1
    spark_version = data.databricks_spark_version.latest.id
    node_type_id  = data.databricks_node_type.smallest.id
  }

  notebook_task {
    notebook_path = databricks_notebook.this.path
  }
}

output "notebook_url" {
  value = databricks_notebook.this.url
}

output "job_url" {
  value = databricks_job.this.url
}

Timeouts

The timeouts block allows you to specify create and update timeouts if you have an always_running job. Please run TF_LOG=DEBUG terraform apply whenever you observe timeout issues.

timeouts {
  create = "20m"
  update = "20m"
}

Import

The resource job can be imported using the id of the job:

terraform import databricks_job.this <job-id>

The following resources are often used in the same context: