airflow.contrib.hooks.databricks_hook
Databricks hook.
Module Contents
- airflow.contrib.hooks.databricks_hook.RESTART_CLUSTER_ENDPOINT = ['POST', 'api/2.0/clusters/restart']
- airflow.contrib.hooks.databricks_hook.START_CLUSTER_ENDPOINT = ['POST', 'api/2.0/clusters/start']
- airflow.contrib.hooks.databricks_hook.TERMINATE_CLUSTER_ENDPOINT = ['POST', 'api/2.0/clusters/delete']
- airflow.contrib.hooks.databricks_hook.SUBMIT_RUN_ENDPOINT = ['POST', 'api/2.0/jobs/runs/submit']
- airflow.contrib.hooks.databricks_hook.CANCEL_RUN_ENDPOINT = ['POST', 'api/2.0/jobs/runs/cancel']
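Each endpoint constant above is a (method, path) pair. The sketch below shows one way such a pair might be combined with a workspace host to form a request target. The helper name build_request_target and the assumption of an https:// scheme are illustrative only; they are not part of the hook's documented API.

```python
from typing import Sequence, Tuple

# Values copied from the constants documented above.
RESTART_CLUSTER_ENDPOINT = ['POST', 'api/2.0/clusters/restart']
SUBMIT_RUN_ENDPOINT = ['POST', 'api/2.0/jobs/runs/submit']


def build_request_target(host: str, endpoint_info: Sequence[str]) -> Tuple[str, str]:
    """Hypothetical helper: turn a (method, path) pair into (method, url)."""
    method, path = endpoint_info
    # Assumes HTTPS and a bare host name (no scheme, no trailing slash).
    return method, 'https://{host}/{path}'.format(host=host, path=path)


method, url = build_request_target('xx.cloud.databricks.com', SUBMIT_RUN_ENDPOINT)
print(method, url)  # POST https://xx.cloud.databricks.com/api/2.0/jobs/runs/submit
```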
- class airflow.contrib.hooks.databricks_hook.RunState(life_cycle_state, result_state, state_message)
  Utility class for the run state concept of Databricks runs.
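A minimal sketch of such a utility class, assuming it only needs to hold the three constructor arguments and compare by value. The frozen dataclass, and treating result_state as optional until the run finishes, are illustrative choices rather than the hook's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RunState:
    """Immutable snapshot of a Databricks run's state (illustrative sketch)."""

    life_cycle_state: str        # e.g. 'PENDING', 'RUNNING', 'TERMINATED'
    result_state: Optional[str]  # assumed None until the run has finished
    state_message: str           # human-readable detail from the service


before = RunState('RUNNING', None, '')
after = RunState('TERMINATED', 'SUCCESS', '')
print(before == RunState('RUNNING', None, ''))  # True
print(before == after)                           # False
```

Making the class frozen means a state object can be safely compared against a previously polled one without worrying that it was mutated in between.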
- class airflow.contrib.hooks.databricks_hook.DatabricksHook(databricks_conn_id='databricks_default', timeout_seconds=180, retry_limit=3, retry_delay=1.0)
  Bases: airflow.hooks.base_hook.BaseHook
  Interact with Databricks.
  - Parameters
    databricks_conn_id (str) – The name of the Databricks connection to use.
    timeout_seconds (int) – The number of seconds the requests library waits before timing out.
    retry_limit (int) – The number of times to retry the connection in case of service outages.
    retry_delay (float) – The number of seconds to wait between retries (may be fractional).
- static _parse_host(host)
  Makes the hook robust to improper connection settings supplied by users, specifically in the host field.
  For example, when users supply https://xx.cloud.databricks.com as the host, the protocol must be stripped to get the host:
      h = DatabricksHook()
      assert h._parse_host('https://xx.cloud.databricks.com') == 'xx.cloud.databricks.com'
  When users supply the correct xx.cloud.databricks.com as the host, this function is a no-op:
      assert h._parse_host('xx.cloud.databricks.com') == 'xx.cloud.databricks.com'
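A standalone sketch of the behaviour described above, written as a free function named parse_host (a hypothetical name) on top of urllib.parse.urlparse from the standard library. It reproduces the two documented cases but is not necessarily how the hook itself implements the method.

```python
from urllib.parse import urlparse


def parse_host(host: str) -> str:
    """Return the bare host name, stripping a scheme such as 'https://' if present."""
    # urlparse only populates .hostname when the input contains a scheme
    # (e.g. 'https://...'); a bare 'xx.cloud.databricks.com' lands in .path instead.
    hostname = urlparse(host).hostname
    return hostname if hostname else host


print(parse_host('https://xx.cloud.databricks.com'))  # xx.cloud.databricks.com
print(parse_host('xx.cloud.databricks.com'))          # xx.cloud.databricks.com
```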
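- _do_api_call(self, endpoint_info, json)
  Utility function to perform an API call with retries. endpoint_info is a (method, path) pair such as the *_ENDPOINT constants above, and json holds the parameters for the call.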
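A minimal sketch of a retry loop of the kind _do_api_call describes, assuming retry_limit counts the total number of attempts and retry_delay is a fixed pause between them. The send callable and the TransientError exception are hypothetical stand-ins for the real HTTP request and for whatever failures the hook treats as retryable; timeout_seconds would be passed to the HTTP call itself and is omitted here.

```python
import time
from typing import Any, Callable, Dict


class TransientError(Exception):
    """Hypothetical stand-in for a retryable failure (connection error, 5xx, ...)."""


def call_with_retries(send: Callable[[], Dict[str, Any]],
                      retry_limit: int = 3,
                      retry_delay: float = 1.0) -> Dict[str, Any]:
    """Call `send`, retrying on TransientError.

    `retry_limit` is treated as the total number of attempts and
    `retry_delay` as the pause, in seconds, between attempts.
    """
    if retry_limit < 1:
        raise ValueError('retry_limit must be >= 1')
    attempt = 1
    while True:
        try:
            return send()
        except TransientError:
            if attempt >= retry_limit:
                raise  # give up and surface the last error
            attempt += 1
            time.sleep(retry_delay)


# Usage: a fake call that fails twice before succeeding.
attempts = []


def flaky_send() -> Dict[str, Any]:
    attempts.append(1)
    if len(attempts) < 3:
        raise TransientError('simulated outage')
    return {'status': 'ok'}


print(call_with_retries(flaky_send, retry_limit=3, retry_delay=0.0))  # {'status': 'ok'}
```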
- get_run_page_url(self, run_id: str)
  Retrieves the run_page_url of a run.
  - Parameters
    run_id – ID of the run
  - Returns
    URL of the run page
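A small sketch of the lookup this method performs, assuming the hook reads the run_page_url field from the JSON that the Databricks Jobs API returns for a run (the api/2.0/jobs/runs/get call, which is not among the constants listed on this page). The function name and the sample response, including its URL, are made-up placeholders.

```python
from typing import Any, Dict


def extract_run_page_url(run_info: Dict[str, Any]) -> str:
    """Hypothetical helper: pull the run page URL out of a run-details response."""
    try:
        return run_info['run_page_url']
    except KeyError:
        # Fail with a clearer message than a bare KeyError.
        raise ValueError('response has no run_page_url field') from None


# Placeholder response trimmed to the fields used here.
sample = {'run_id': 42, 'run_page_url': 'https://example.invalid/run/42'}
print(extract_run_page_url(sample))  # https://example.invalid/run/42
```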
- get_run_state(self, run_id: str)
  Retrieves the state of a run.
  - Parameters
    run_id – ID of the run
  - Returns
    state of the run (a RunState)
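A sketch of turning the state object from a Databricks run-details response into a RunState-style value, assuming the response carries a nested state dictionary with life_cycle_state, result_state and state_message keys. Defaulting a missing result_state to None (for runs that have not finished) and a missing state_message to an empty string are assumptions made for this example; parse_run_state is a hypothetical name.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class RunState:
    life_cycle_state: str
    result_state: Optional[str]
    state_message: str


def parse_run_state(run_info: Dict[str, Any]) -> RunState:
    """Hypothetical helper: build a RunState from a run-details response."""
    state = run_info['state']
    return RunState(
        life_cycle_state=state['life_cycle_state'],
        # Assumed absent while the run is still in progress.
        result_state=state.get('result_state'),
        state_message=state.get('state_message', ''),
    )


running = parse_run_state({'state': {'life_cycle_state': 'RUNNING', 'state_message': ''}})
print(running.life_cycle_state, running.result_state)  # RUNNING None
```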
- restart_cluster(self, json: dict)
  Restarts the cluster.
  - Parameters
    json – JSON dictionary containing the cluster specification.
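A sketch of how restart_cluster might forward its payload, assuming it pairs RESTART_CLUSTER_ENDPOINT with the given dictionary and hands both to an API-call function such as _do_api_call. Here that function is passed in as a parameter (a hypothetical design choice) so the example runs without a Databricks workspace. The cluster ID is a made-up placeholder; a cluster_id key is what the Databricks Clusters API expects in a restart request.

```python
from typing import Any, Callable, Dict, List

RESTART_CLUSTER_ENDPOINT = ['POST', 'api/2.0/clusters/restart']

# Signature of the injected API-call function: (endpoint_info, json) -> response.
ApiCall = Callable[[List[str], Dict[str, Any]], Any]


def restart_cluster(api_call: ApiCall, json: Dict[str, Any]) -> None:
    """Sketch: forward the cluster payload to the restart endpoint."""
    api_call(RESTART_CLUSTER_ENDPOINT, json)


# Record calls instead of contacting Databricks.
recorded = []
restart_cluster(lambda endpoint, payload: recorded.append((endpoint, payload)),
                {'cluster_id': '1234-567890-abcde123'})
print(recorded[0][0])  # ['POST', 'api/2.0/clusters/restart']
```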
- airflow.contrib.hooks.databricks_hook.RUN_LIFE_CYCLE_STATES = ['PENDING', 'RUNNING', 'TERMINATING', 'TERMINATED', 'SKIPPED', 'INTERNAL_ERROR']
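A sketch of how the life cycle states above could be used to decide when polling a run can stop. Which states count as terminal is an assumption here (TERMINATED, SKIPPED and INTERNAL_ERROR, the states after which a run does not progress); is_terminal is a hypothetical name and not necessarily the hook's own API.

```python
RUN_LIFE_CYCLE_STATES = ['PENDING', 'RUNNING', 'TERMINATING',
                         'TERMINATED', 'SKIPPED', 'INTERNAL_ERROR']

# Assumption: a run in one of these states will not progress further.
TERMINAL_STATES = frozenset({'TERMINATED', 'SKIPPED', 'INTERNAL_ERROR'})


def is_terminal(life_cycle_state: str) -> bool:
    """Return True if a run in this state has finished, one way or another."""
    if life_cycle_state not in RUN_LIFE_CYCLE_STATES:
        # Reject states outside the documented list rather than guessing.
        raise ValueError('Unexpected life cycle state: {}'.format(life_cycle_state))
    return life_cycle_state in TERMINAL_STATES


print([s for s in RUN_LIFE_CYCLE_STATES if is_terminal(s)])
# ['TERMINATED', 'SKIPPED', 'INTERNAL_ERROR']
```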
- class airflow.contrib.hooks.databricks_hook._TokenAuth(token)
  Bases: requests.auth.AuthBase
  Helper class for the auth argument of requests. Subclasses of AuthBase must implement the __call__ magic method.
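A minimal sketch of a token-based auth helper in this style, assuming the token is sent as a bearer token in the Authorization header (the scheme Databricks personal access tokens use). The class name TokenAuth and the token value are hypothetical. To keep the example runnable without requests installed, it falls back to object as the base class; a SimpleNamespace stands in for a prepared request.

```python
from types import SimpleNamespace

try:
    from requests.auth import AuthBase
except ImportError:  # keep the sketch runnable without requests installed
    AuthBase = object


class TokenAuth(AuthBase):
    """Attach a bearer token to every outgoing request."""

    def __init__(self, token: str) -> None:
        self.token = token

    def __call__(self, r):
        # requests calls the auth object with the prepared request and
        # expects the (possibly modified) request back.
        r.headers['Authorization'] = 'Bearer ' + self.token
        return r


fake_request = SimpleNamespace(headers={})
TokenAuth('dapi-example-token')(fake_request)
print(fake_request.headers['Authorization'])  # Bearer dapi-example-token
```

With requests installed, such an object would be passed as the auth argument, for example requests.post(url, json=payload, auth=TokenAuth(token)).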