airflow.contrib.hooks.gcp_dataproc_hook
This module contains a Google Cloud Dataproc hook.
Module Contents
class airflow.contrib.hooks.gcp_dataproc_hook.DataprocJobStatus
Helper class with Dataproc job statuses.
class airflow.contrib.hooks.gcp_dataproc_hook._DataProcJob(dataproc_api: Any, project_id: str, job: Dict, region: str = 'global', job_error_states: Iterable[str] = None, num_retries: int = None)
Bases: airflow.utils.log.logging_mixin.LoggingMixin
wait_for_done(self)
Waits for the Dataproc job to complete.
- Returns
True if the job is done.
- Return type
bool
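The waiting behaviour described for wait_for_done can be sketched as a simple polling loop. This is a minimal illustration, not the Airflow implementation: the `fetch_job` callable, the `poll_interval` parameter, and the choice to return False for a cancelled job are assumptions made for the example. The state names and the `status.state` field follow the public Dataproc job resource.

```python
import time
from typing import Callable, Dict, Iterable, Optional

# State names taken from the Dataproc JobStatus enum.
DONE = "DONE"
ERROR = "ERROR"
CANCELLED = "CANCELLED"


def wait_for_done(
    fetch_job: Callable[[], Dict],  # hypothetical: returns the latest job resource
    job_error_states: Optional[Iterable[str]] = None,
    poll_interval: float = 5.0,  # hypothetical parameter, seconds between polls
) -> bool:
    """Poll until the job reaches a terminal state.

    Returns True if the job is DONE, False if it was cancelled, and raises
    if the job enters one of the configured error states.
    """
    error_states = set(job_error_states) if job_error_states is not None else {ERROR}
    while True:
        state = fetch_job()["status"]["state"]
        if state in error_states:
            raise RuntimeError("Dataproc job ended in state {}".format(state))
        if state == DONE:
            return True
        if state == CANCELLED:
            return False
        time.sleep(poll_interval)


# Usage with a fake job source that walks through three states.
_states = iter(["PENDING", "RUNNING", "DONE"])
finished = wait_for_done(lambda: {"status": {"state": next(_states)}}, poll_interval=0)
```

Passing the fetch function in as an argument keeps the loop testable without a live Dataproc API client.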
class airflow.contrib.hooks.gcp_dataproc_hook._DataProcJobBuilder(project_id: str, task_id: str, cluster_name: str, job_type: str, properties: Dict[str, str])
add_labels(self, labels)
Set labels for Dataproc job.
- Parameters
labels (dict) – Labels for the job query.
add_variables(self, variables: List[str])
Set variables for Dataproc job.
- Parameters
variables (List[str]) – Variables for the job query.
add_args(self, args: List[str])
Set args for Dataproc job.
- Parameters
args (List[str]) – Args for the job query.
add_query(self, query: List[str])
Set query URIs for Dataproc job.
- Parameters
query (List[str]) – URIs for the job queries.
add_query_uri(self, query_uri: str)
Set query URI for Dataproc job.
- Parameters
query_uri (str) – URI for the job query.
add_jar_file_uris(self, jars: List[str])
Set JAR URIs for Dataproc job.
- Parameters
jars (List[str]) – List of JAR URIs.
add_archive_uris(self, archives: List[str])
Set archive URIs for Dataproc job.
- Parameters
archives (List[str]) – List of archive URIs.
add_file_uris(self, files: List[str])
Set file URIs for Dataproc job.
- Parameters
files (List[str]) – List of file URIs.
add_python_file_uris(self, pyfiles: List[str])
Set Python file URIs for Dataproc job.
- Parameters
pyfiles (List[str]) – List of Python file URIs.
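To illustrate how the builder methods above might assemble a job definition, here is a minimal sketch. The `JobBuilderSketch` class and its `build` method are hypothetical stand-ins, not the Airflow `_DataProcJobBuilder` code; the dictionary keys (`reference`, `placement`, `labels`, `args`, `jarFileUris`, `pythonFileUris`) are assumed from the public Dataproc job resource, and using `task_id` directly as the job ID is a simplification.

```python
from typing import Dict, List


class JobBuilderSketch:
    """Hypothetical stand-in showing how a Dataproc job dict could be built up."""

    def __init__(self, project_id: str, task_id: str, cluster_name: str,
                 job_type: str, properties: Dict[str, str]) -> None:
        self.job_type = job_type
        self.job = {
            "job": {
                # Simplification: the task ID is used directly as the job ID.
                "reference": {"projectId": project_id, "jobId": task_id},
                "placement": {"clusterName": cluster_name},
                job_type: {},
            }
        }  # type: Dict
        if properties:
            self.job["job"][job_type]["properties"] = properties

    def add_labels(self, labels: Dict[str, str]) -> None:
        # Labels live on the job itself, not on the job-type section.
        if labels:
            self.job["job"].setdefault("labels", {}).update(labels)

    def add_args(self, args: List[str]) -> None:
        if args:
            self.job["job"][self.job_type]["args"] = args

    def add_jar_file_uris(self, jars: List[str]) -> None:
        if jars:
            self.job["job"][self.job_type]["jarFileUris"] = jars

    def add_python_file_uris(self, pyfiles: List[str]) -> None:
        if pyfiles:
            self.job["job"][self.job_type]["pythonFileUris"] = pyfiles

    def build(self) -> Dict:
        # Hypothetical accessor returning the assembled request body.
        return self.job


builder = JobBuilderSketch(
    project_id="my-project",
    task_id="example_task",
    cluster_name="my-cluster",
    job_type="pysparkJob",
    properties={"spark.executor.memory": "2g"},
)
builder.add_labels({"team": "data"})
builder.add_args(["--date", "2020-01-01"])
builder.add_python_file_uris(["gs://my-bucket/helpers.py"])
job = builder.build()
```

Each setter ignores empty input, so optional settings can be passed through without extra checks at the call site.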
class airflow.contrib.hooks.gcp_dataproc_hook._DataProcOperation(dataproc_api: Any, operation: Dict, num_retries: int)
Bases: airflow.utils.log.logging_mixin.LoggingMixin
Continuously polls Dataproc Operation until it completes.
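Polling an operation until it completes could look roughly like the sketch below. It assumes the operation is a dictionary shaped like a Google long-running operation, with a `done` flag and an optional `error` entry; the `get_operation` callable, `poll_interval`, and `max_polls` are hypothetical names introduced for the example and are not part of the class documented here.

```python
import time
from typing import Callable, Dict


def wait_for_operation(
    get_operation: Callable[[], Dict],  # hypothetical: fetches the current operation
    poll_interval: float = 10.0,        # hypothetical: seconds between polls
    max_polls: int = 100,               # hypothetical: upper bound on attempts
) -> Dict:
    """Poll until the operation reports done; raise if it finished with an error."""
    for _ in range(max_polls):
        operation = get_operation()
        if operation.get("done"):
            if "error" in operation:
                raise RuntimeError("Operation failed: {}".format(operation["error"]))
            return operation
        time.sleep(poll_interval)
    raise TimeoutError("Operation did not complete after {} polls".format(max_polls))


# Usage with canned responses standing in for API calls.
_responses = iter([
    {"name": "operations/abc", "done": False},
    {"name": "operations/abc", "done": True, "response": {}},
])
result = wait_for_operation(lambda: next(_responses), poll_interval=0)
```

Bounding the number of polls is a design choice of this sketch; an unbounded loop matches the "continuously polls" wording more literally but can hang a worker indefinitely.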
class airflow.contrib.hooks.gcp_dataproc_hook.DataProcHook(gcp_conn_id: str = 'google_cloud_default', delegate_to: str = None, api_version: str = 'v1beta2')
Bases: airflow.contrib.hooks.gcp_api_base_hook.GoogleCloudBaseHook
Hook for Google Cloud Dataproc APIs.
All the methods in the hook where project_id is used must be called with keyword arguments rather than positional.
- Parameters
get_cluster(self, project_id: str, region: str, cluster_name: str)
Returns Google Cloud Dataproc cluster.
submit(self, project_id: str, job: Dict, region: str = 'global', job_error_states: Iterable[str] = None)
Submits Google Cloud Dataproc job.
create_job_template(self, task_id: str, cluster_name: str, job_type: str, properties: Dict[str, str])
Creates Google Cloud Dataproc job template.
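The requirement that hook methods taking project_id be called with keyword arguments can be illustrated with a small decorator. This is a sketch of one way such a rule might be enforced, not the mechanism Airflow uses; `keyword_only` and `HookSketch` are hypothetical names, and the method body returns a plain dictionary instead of calling the Dataproc API.

```python
import functools
from typing import Any, Callable, Dict


def keyword_only(func: Callable[..., Any]) -> Callable[..., Any]:
    """Reject positional arguments (other than self) at call time."""

    @functools.wraps(func)
    def wrapper(self: Any, *args: Any, **kwargs: Any) -> Any:
        if args:
            raise TypeError(
                "{}() must be called with keyword arguments only".format(func.__name__)
            )
        return func(self, **kwargs)

    return wrapper


class HookSketch:
    """Hypothetical stand-in for a hook; makes no API calls."""

    @keyword_only
    def get_cluster(self, project_id: str, region: str, cluster_name: str) -> Dict[str, str]:
        return {"projectId": project_id, "region": region, "clusterName": cluster_name}


hook = HookSketch()
cluster = hook.get_cluster(project_id="my-project", region="global", cluster_name="my-cluster")
```

With this sketch, `hook.get_cluster("my-project", "global", "my-cluster")` raises `TypeError`, which surfaces a positional call immediately instead of letting arguments bind to the wrong parameters.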