airflow.gcp.hooks.dataflow
This module contains a Google Dataflow Hook.
Module Contents
- class airflow.gcp.hooks.dataflow.DataflowJobStatus
Helper class with Dataflow job statuses.
- class airflow.gcp.hooks.dataflow._DataflowJob(dataflow:Any, project_number:str, name:str, location:str, poll_sleep:int=10, job_id:str=None, num_retries:int=0, multiple_jobs:bool=False)
Bases: airflow.utils.log.logging_mixin.LoggingMixin
- is_job_running(self)
Helper method to check if the job is still running in Dataflow.
- Returns
True if job is running.
- Return type
bool
- _get_dataflow_jobs(self)
Helper method to get the list of jobs that start with the job name or ID.
- Returns
List of jobs, including their IDs.
- Return type
list
- check_dataflow_job_state(self, job)
Helper method to check the state of all jobs in Dataflow for this task. If a job has failed, an exception is raised.
- Returns
True if job is done.
- Return type
bool
- Raise
Exception
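The contract described above (return True when the job is done, raise when it has failed) can be sketched without the Airflow classes. This is a minimal illustration, not the hook's actual implementation: the state strings are `currentState` values from the Dataflow REST API, while `check_job_state`, the dict layout of `job`, and the decision to treat a cancelled job as an error are assumptions made for the example.

```python
# Sketch only: state names come from the Dataflow REST API; everything else is hypothetical.
JOB_STATE_DONE = "JOB_STATE_DONE"
JOB_STATE_FAILED = "JOB_STATE_FAILED"
JOB_STATE_CANCELLED = "JOB_STATE_CANCELLED"


def check_job_state(job: dict) -> bool:
    """Return True if the job is done, False if it is still in progress.

    Raises an Exception if the job failed (or, as an assumption here, was cancelled).
    """
    state = job.get("currentState")
    name = job.get("name", "<unnamed>")
    if state == JOB_STATE_DONE:
        return True
    if state == JOB_STATE_FAILED:
        raise Exception(f"Dataflow job {name} has failed.")
    if state == JOB_STATE_CANCELLED:
        # Assumption: a cancelled job is reported as an error rather than polled forever.
        raise Exception(f"Dataflow job {name} was cancelled.")
    # Running, pending, or any other state: the caller should keep polling.
    return False
```

A caller would typically invoke such a check in a loop, sleeping for `poll_sleep` seconds between calls until it returns True.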
- class airflow.gcp.hooks.dataflow._Dataflow(cmd)
Bases: airflow.utils.log.logging_mixin.LoggingMixin
- class airflow.gcp.hooks.dataflow.DataFlowHook(gcp_conn_id:str='google_cloud_default', delegate_to:str=None, poll_sleep:int=10)
Bases: airflow.contrib.hooks.gcp_api_base_hook.GoogleCloudBaseHook
Hook for Google Dataflow.
All the methods in the hook where project_id is used must be called with keyword arguments rather than positional.
- _start_dataflow(self, variables:Dict, name:str, command_prefix:List[str], label_formatter:Callable[[Dict], List[str]], multiple_jobs:bool=False)
- start_java_dataflow(self, job_name:str, variables:Dict, jar:str, job_class:str=None, append_job_name:bool=True, multiple_jobs:bool=False)
Starts a Dataflow Java job.
- Parameters
job_name (str) – The name of the job.
variables (dict) – Variables passed to the job.
jar (str) – Name of the JAR for the job.
job_class (str) – Name of the Java class for the job.
append_job_name (bool) – True if a unique suffix has to be appended to the job name.
multiple_jobs (bool) – True to check for multiple jobs in Dataflow.
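A call that follows the signature above, using keyword arguments throughout, might look like the sketch below. To keep it runnable without Airflow or GCP credentials, `_RecordingHook` is a hypothetical stand-in that only records its arguments; with Airflow installed, a `DataFlowHook` instance would be passed to `launch_wordcount` instead. The job name, the keys in `variables`, the JAR path, and the class name are illustrative assumptions.

```python
from typing import Any, Dict, List, Optional


class _RecordingHook:
    """Hypothetical stand-in for DataFlowHook: records the call instead of launching a job."""

    def __init__(self) -> None:
        self.calls: List[Dict[str, Any]] = []

    def start_java_dataflow(
        self,
        job_name: str,
        variables: Dict[str, Any],
        jar: str,
        job_class: Optional[str] = None,
        append_job_name: bool = True,
        multiple_jobs: bool = False,
    ) -> None:
        self.calls.append(
            {
                "job_name": job_name,
                "variables": variables,
                "jar": jar,
                "job_class": job_class,
                "append_job_name": append_job_name,
                "multiple_jobs": multiple_jobs,
            }
        )


def launch_wordcount(hook: Any) -> None:
    """Start a Java job through any object exposing start_java_dataflow."""
    hook.start_java_dataflow(
        job_name="wordcount",  # illustrative name
        variables={  # illustrative pipeline options
            "project": "my-project",
            "stagingLocation": "gs://my-bucket/staging",
        },
        jar="/path/to/wordcount.jar",  # illustrative path
        job_class="org.example.WordCount",  # illustrative class
        append_job_name=True,
        multiple_jobs=False,
    )


hook = _RecordingHook()
launch_wordcount(hook)
```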
- start_template_dataflow(self, job_name:str, variables:Dict, parameters:Dict, dataflow_template:str, append_job_name=True)
Starts a Dataflow template job.
- start_python_dataflow(self, job_name:str, variables:Dict, dataflow:str, py_options:List[str], append_job_name:bool=True)
Starts a Dataflow Python job.
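All three `start_*` methods accept `append_job_name`, which is documented as appending a unique suffix to the job name. One way such a suffix could be built is sketched below. `build_job_name` and the eight-character UUID fragment are assumptions for illustration, not the hook's confirmed behavior.

```python
import uuid


def build_job_name(job_name: str, append_job_name: bool = True) -> str:
    """Return job_name, optionally followed by a short random suffix (hypothetical helper)."""
    if append_job_name:
        # Eight hex characters from a random UUID; the length is an arbitrary choice.
        return f"{job_name}-{uuid.uuid4().hex[:8]}"
    return job_name
```

A suffix of this kind lets the same logical job be submitted repeatedly without name collisions, at the cost of a less predictable job name.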