airflow.contrib.hooks.databricks_hook

Databricks hook.

Module Contents

airflow.contrib.hooks.databricks_hook.RESTART_CLUSTER_ENDPOINT = ['POST', 'api/2.0/clusters/restart'][source]
airflow.contrib.hooks.databricks_hook.START_CLUSTER_ENDPOINT = ['POST', 'api/2.0/clusters/start'][source]
airflow.contrib.hooks.databricks_hook.TERMINATE_CLUSTER_ENDPOINT = ['POST', 'api/2.0/clusters/delete'][source]
airflow.contrib.hooks.databricks_hook.RUN_NOW_ENDPOINT = ['POST', 'api/2.0/jobs/run-now'][source]
airflow.contrib.hooks.databricks_hook.SUBMIT_RUN_ENDPOINT = ['POST', 'api/2.0/jobs/runs/submit'][source]
airflow.contrib.hooks.databricks_hook.GET_RUN_ENDPOINT = ['GET', 'api/2.0/jobs/runs/get'][source]
airflow.contrib.hooks.databricks_hook.CANCEL_RUN_ENDPOINT = ['POST', 'api/2.0/jobs/runs/cancel'][source]
airflow.contrib.hooks.databricks_hook.USER_AGENT_HEADER[source]
class airflow.contrib.hooks.databricks_hook.RunState(life_cycle_state, result_state, state_message)[source]

Utility class for the run state concept of Databricks runs.

is_terminal[source]

True if the current state is a terminal state.

is_successful[source]

True if the result state is SUCCESS

__eq__(self, other)[source]
__repr__(self)[source]
class airflow.contrib.hooks.databricks_hook.DatabricksHook(databricks_conn_id='databricks_default', timeout_seconds=180, retry_limit=3, retry_delay=1.0)[source]

Bases: airflow.hooks.base_hook.BaseHook

Interact with Databricks.

Parameters
  • databricks_conn_id (str) – The name of the databricks connection to use.

  • timeout_seconds (int) – The amount of time in seconds the requests library will wait before timing-out.

  • retry_limit (int) – The number of times to retry the connection in case of service outages.

  • retry_delay (float) – The number of seconds to wait between retries (it might be a floating point number).

static _parse_host(host)[source]

The purpose of this function is to be robust to improper connections settings provided by users, specifically in the host field.

For example – when users supply https://xx.cloud.databricks.com as the host, we must strip out the protocol to get the host.:

h = DatabricksHook()
assert h._parse_host('https://xx.cloud.databricks.com') ==                 'xx.cloud.databricks.com'

In the case where users supply the correct xx.cloud.databricks.com as the host, this function is a no-op.:

assert h._parse_host('xx.cloud.databricks.com') == 'xx.cloud.databricks.com'
_do_api_call(self, endpoint_info, json)[source]

Utility function to perform an API call with retries

Parameters
  • endpoint_info (tuple[string, string]) – Tuple of method and endpoint

  • json (dict) – Parameters for this API call.

Returns

If the api call returns a OK status code, this function returns the response in JSON. Otherwise, we throw an AirflowException.

Return type

dict

_log_request_error(self, attempt_num, error)[source]
run_now(self, json)[source]

Utility function to call the api/2.0/jobs/run-now endpoint.

Parameters

json (dict) – The data used in the body of the request to the run-now endpoint.

Returns

the run_id as a string

Return type

str

submit_run(self, json)[source]

Utility function to call the api/2.0/jobs/runs/submit endpoint.

Parameters

json (dict) – The data used in the body of the request to the submit endpoint.

Returns

the run_id as a string

Return type

str

get_run_page_url(self, run_id:str)[source]

Retrieves run_page_url.

Parameters

run_id – id of the run

Returns

URL of the run page

get_run_state(self, run_id:str)[source]

Retrieves run state of the run.

Parameters

run_id – id of the run

Returns

state of the run

cancel_run(self, run_id:str)[source]

Cancels the run.

Parameters

run_id – id of the run

restart_cluster(self, json:dict)[source]

Restarts the cluster.

Parameters

json – json dictionary containing cluster specification.

start_cluster(self, json:dict)[source]

Starts the cluster.

Parameters

json – json dictionary containing cluster specification.

terminate_cluster(self, json:dict)[source]

Terminates the cluster.

Parameters

json – json dictionary containing cluster specification.

airflow.contrib.hooks.databricks_hook._retryable_error(exception)[source]
airflow.contrib.hooks.databricks_hook.RUN_LIFE_CYCLE_STATES = ['PENDING', 'RUNNING', 'TERMINATING', 'TERMINATED', 'SKIPPED', 'INTERNAL_ERROR'][source]
class airflow.contrib.hooks.databricks_hook._TokenAuth(token)[source]

Bases: requests.auth.AuthBase

Helper class for requests Auth field. AuthBase requires you to implement the __call__ magic function.

__call__(self, r)[source]