airflow.gcp.hooks.dataflow

This module contains a Google Dataflow Hook.

Module Contents

airflow.gcp.hooks.dataflow.DEFAULT_DATAFLOW_LOCATION = 'us-central1'
class airflow.gcp.hooks.dataflow.DataflowJobStatus

Helper class with Dataflow job statuses.

JOB_STATE_DONE = 'JOB_STATE_DONE'
JOB_STATE_RUNNING = 'JOB_STATE_RUNNING'
JOB_TYPE_STREAMING = 'JOB_TYPE_STREAMING'
JOB_STATE_FAILED = 'JOB_STATE_FAILED'
JOB_STATE_CANCELLED = 'JOB_STATE_CANCELLED'
JOB_STATE_PENDING = 'JOB_STATE_PENDING'
FAILED_END_STATES
SUCCEEDED_END_STATES
END_STATES
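The three set-valued attributes group the individual states. The following is a minimal sketch of how such groups can be used to classify a polled state. It assumes, without confirmation from this page, that FAILED_END_STATES holds the failed and cancelled states and SUCCEEDED_END_STATES holds the done state. `JobStatus` and `classify_state` are hypothetical stand-ins, not part of the module.

```python
class JobStatus:
    """Hypothetical stand-in mirroring DataflowJobStatus (set contents are assumed)."""
    JOB_STATE_DONE = "JOB_STATE_DONE"
    JOB_STATE_RUNNING = "JOB_STATE_RUNNING"
    JOB_STATE_FAILED = "JOB_STATE_FAILED"
    JOB_STATE_CANCELLED = "JOB_STATE_CANCELLED"
    JOB_STATE_PENDING = "JOB_STATE_PENDING"
    # Assumption: failure means failed or cancelled; success means done.
    FAILED_END_STATES = {JOB_STATE_FAILED, JOB_STATE_CANCELLED}
    SUCCEEDED_END_STATES = {JOB_STATE_DONE}
    END_STATES = SUCCEEDED_END_STATES | FAILED_END_STATES


def classify_state(state: str) -> str:
    """Hypothetical helper: return 'succeeded', 'failed' or 'in progress'."""
    if state in JobStatus.SUCCEEDED_END_STATES:
        return "succeeded"
    if state in JobStatus.FAILED_END_STATES:
        return "failed"
    return "in progress"  # pending, running or any other non-terminal state


print(classify_state("JOB_STATE_RUNNING"))  # in progress
print(classify_state("JOB_STATE_CANCELLED"))  # failed
```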
class airflow.gcp.hooks.dataflow._DataflowJob(dataflow: Any, project_number: str, name: str, location: str, poll_sleep: int = 10, job_id: str = None, num_retries: int = 0, multiple_jobs: bool = False)

Bases: airflow.utils.log.logging_mixin.LoggingMixin

is_job_running(self)

Helper method to check whether the job is still running in Dataflow.

Returns

True if job is running.

Return type

bool

_get_dataflow_jobs(self)

Helper method to get the list of jobs whose names start with the job name, or that match the job ID.

Returns

list of jobs, including their IDs

Return type

list
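A minimal sketch of the matching rule described for `_get_dataflow_jobs`, operating on an in-memory list rather than calling the API. It assumes each job is a dict with `id` and `name` keys, as in the Job resources returned by the Dataflow REST API's `projects.locations.jobs.list`. `match_jobs` is a hypothetical helper.

```python
from typing import Dict, List, Optional


def match_jobs(jobs: List[Dict[str, str]], name: str,
               job_id: Optional[str] = None) -> List[Dict[str, str]]:
    """Hypothetical helper: select by exact ID if given, otherwise by name prefix."""
    if job_id:
        return [job for job in jobs if job.get("id") == job_id]
    return [job for job in jobs if job.get("name", "").startswith(name)]


jobs = [
    {"id": "2019-11-01_a", "name": "wordcount-1a2b3c4d"},
    {"id": "2019-11-01_b", "name": "wordcount-9f8e7d6c"},
    {"id": "2019-11-01_c", "name": "other-job"},
]
print([job["id"] for job in match_jobs(jobs, "wordcount")])
# ['2019-11-01_a', '2019-11-01_b']
```

A name prefix is useful here because job names may carry a random suffix, so several runs of the same task can share a prefix.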

_get_jobs(self)

Helper method to get all jobs by name

Returns

jobs

Return type

list

check_dataflow_job_state(self, job)

Helper method to check the state of a Dataflow job belonging to this task. Raises an exception if the job has failed.

Returns

True if job is done.

Return type

bool

Raises

Exception

wait_for_done(self)

Helper method to wait for result of submitted job.

Returns

True if job is done.

Return type

bool

Raises

Exception
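Waiting for the result amounts to polling the job state until it reaches a terminal state. A simplified sketch follows, assuming a hypothetical `fetch_state` callable that returns the current state string. In the hook, the state would come from the Dataflow API, and the delay corresponds to `poll_sleep`. The grouping of end states below is an assumption.

```python
import time
from typing import Callable

FAILED_END_STATES = {"JOB_STATE_FAILED", "JOB_STATE_CANCELLED"}  # assumed grouping
SUCCEEDED_END_STATES = {"JOB_STATE_DONE"}  # assumed grouping


def wait_until_done(fetch_state: Callable[[], str], poll_sleep: float = 10) -> bool:
    """Hypothetical helper: poll until the job ends; raise if it ends unsuccessfully."""
    while True:
        state = fetch_state()
        if state in SUCCEEDED_END_STATES:
            return True
        if state in FAILED_END_STATES:
            raise Exception("Dataflow job ended in state {}".format(state))
        time.sleep(poll_sleep)  # still pending or running: wait, then poll again


states = iter(["JOB_STATE_PENDING", "JOB_STATE_RUNNING", "JOB_STATE_DONE"])
print(wait_until_done(lambda: next(states), poll_sleep=0))  # True
```

This sketch ignores streaming jobs. Those never reach JOB_STATE_DONE on their own, so a real implementation has to treat them differently.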

get(self)

Returns the Dataflow jobs.

Returns

list of jobs

Return type

list

class airflow.gcp.hooks.dataflow._Dataflow(cmd)

Bases: airflow.utils.log.logging_mixin.LoggingMixin

_line(self, fd)
static _extract_job(line: bytes)

Extracts job_id.

Parameters

line (bytes) – Line of output containing the URL from which the job_id has to be extracted

Returns

job_id or None if no match

Return type

Optional[str]
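A sketch of extracting a job ID from a line of launcher output. It assumes the ID is the path segment immediately after `/jobs/` in a Dataflow console URL of the `jobsDetail/locations/<region>/jobs/<id>` form. Other Beam versions may print a different URL shape, so the pattern is illustrative rather than the hook's own. `extract_job_id` is a hypothetical helper.

```python
import re
from typing import Optional

# Assumed URL shape: console.cloud.google.com/dataflow/.../jobs/<job_id>?...
JOB_ID_PATTERN = re.compile(rb"console\.cloud\.google\.com/dataflow\S*/jobs/([A-Za-z0-9_\-]+)")


def extract_job_id(line: bytes) -> Optional[str]:
    """Hypothetical helper: return the job ID found in the line, or None."""
    match = JOB_ID_PATTERN.search(line)
    return match.group(1).decode() if match else None


line = (b"To access the Dataflow monitoring console, please navigate to "
        b"https://console.cloud.google.com/dataflow/jobsDetail/locations/"
        b"us-central1/jobs/2019-11-01_01_02_03-1234567890?project=my-project")
print(extract_job_id(line))  # 2019-11-01_01_02_03-1234567890
```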

wait_for_done(self)

Waits for Dataflow job to complete.

Returns

Job id

Return type

Optional[str]
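The following is a simplified sketch of waiting for a launcher process while scanning its output for a job ID. To keep the example short, it merges stderr into stdout instead of reading the two streams separately. It also assumes a bare `/jobs/<id>` pattern. `run_and_capture_job_id` is a hypothetical helper, and the hook's actual implementation may differ.

```python
import re
import subprocess
import sys
from typing import List, Optional

JOB_ID_PATTERN = re.compile(rb"/jobs/([A-Za-z0-9_\-]+)")  # assumed URL shape


def run_and_capture_job_id(cmd: List[str]) -> Optional[str]:
    """Hypothetical helper: run cmd, return the first job ID seen, fail on non-zero exit."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    job_id = None
    for raw_line in proc.stdout:  # bytes lines, until the process closes its output
        match = JOB_ID_PATTERN.search(raw_line)
        if job_id is None and match:
            job_id = match.group(1).decode()
    proc.stdout.close()
    if proc.wait() != 0:
        raise Exception("Dataflow process exited with code {}".format(proc.returncode))
    return job_id


print(run_and_capture_job_id(
    [sys.executable, "-c", "print('https://example.com/jobs/abc-123?x=1')"]))  # abc-123
```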

class airflow.gcp.hooks.dataflow.DataFlowHook(gcp_conn_id: str = 'google_cloud_default', delegate_to: str = None, poll_sleep: int = 10)

Bases: airflow.contrib.hooks.gcp_api_base_hook.GoogleCloudBaseHook

Hook for Google Dataflow.

All the methods in the hook where project_id is used must be called with keyword arguments rather than positional.
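One way to enforce the keyword-argument rule is a decorator that rejects positional arguments. The sketch below illustrates that idea only. `require_keyword_args` and `ExampleHook` are hypothetical, and this is not presented as Airflow's own mechanism.

```python
import functools
from typing import Any, Callable


def require_keyword_args(func: Callable) -> Callable:
    """Hypothetical decorator: reject positional arguments (other than self)."""
    @functools.wraps(func)
    def wrapper(self: Any, *args: Any, **kwargs: Any) -> Any:
        if args:
            raise TypeError("{} must be called with keyword arguments".format(func.__name__))
        return func(self, **kwargs)
    return wrapper


class ExampleHook:
    """Hypothetical hook used only to demonstrate the decorator."""

    @require_keyword_args
    def describe_job(self, job_name: str, project_id: str = "my-project") -> str:
        return "{}/{}".format(project_id, job_name)


hook = ExampleHook()
print(hook.describe_job(job_name="wordcount"))  # my-project/wordcount
```

In plain Python, a bare `*` in the signature (`def f(self, *, project_id=None)`) gives the same guarantee at the language level. A decorator is handy when the check must be added to many existing methods.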

get_conn(self)

Returns a Google Cloud Dataflow service object.

_start_dataflow(self, variables: Dict, name: str, command_prefix: List[str], label_formatter: Callable[[Dict], List[str]], multiple_jobs: bool = False)
static _set_variables(variables: Dict)
start_java_dataflow(self, job_name: str, variables: Dict, jar: str, job_class: str = None, append_job_name: bool = True, multiple_jobs: bool = False)

Starts a Dataflow Java job.

Parameters
  • job_name (str) – The name of the job.

  • variables (dict) – Variables passed to the job.

  • jar (str) – Name of the jar for the job.

  • job_class (str) – Name of the Java class for the job.

  • append_job_name (bool) – True if a unique suffix has to be appended to the job name.

  • multiple_jobs (bool) – True to check for multiple jobs in Dataflow.
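A sketch of how `jar` and `job_class` could map to a `java` command prefix. It assumes the main class is passed after `-cp` when `job_class` is given, and that the jar's manifest is used (`java -jar`) otherwise. Both are standard Java launcher options, but treat the mapping as an assumption rather than the hook's confirmed behaviour. `java_command_prefix` is hypothetical.

```python
from typing import List, Optional


def java_command_prefix(jar: str, job_class: Optional[str] = None) -> List[str]:
    """Hypothetical helper: build the start of a java command for a pipeline jar."""
    if job_class:
        # Explicit main class: put the jar on the classpath and name the class.
        return ["java", "-cp", jar, job_class]
    # No class given: rely on the Main-Class entry in the jar's manifest.
    return ["java", "-jar", jar]


print(java_command_prefix("pipeline.jar", "com.example.WordCount"))
# ['java', '-cp', 'pipeline.jar', 'com.example.WordCount']
```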

start_template_dataflow(self, job_name: str, variables: Dict, parameters: Dict, dataflow_template: str, append_job_name=True)

Starts a Dataflow template job.

Parameters
  • job_name (str) – The name of the job.

  • variables (dict) – Variables passed to the job.

  • parameters (dict) – Parameters for the template.

  • dataflow_template (str) – GCS path to the template.

  • append_job_name (bool) – True if a unique suffix has to be appended to the job name.
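Template jobs are launched through the Dataflow REST API method `projects.locations.templates.launch`. That method takes the template's GCS path plus a `LaunchTemplateParameters` body with `jobName`, `parameters` and `environment` fields. A sketch of assembling that body from the arguments above follows. It assumes only a few runtime-environment keys are copied from `variables`; the key subset and `build_launch_body` are illustrative.

```python
from typing import Any, Dict

# Illustrative subset of RuntimeEnvironment fields accepted by the API (assumed selection).
RUNTIME_ENV_KEYS = {"zone", "tempLocation", "maxWorkers", "machineType", "serviceAccountEmail"}


def build_launch_body(job_name: str, variables: Dict[str, Any],
                      parameters: Dict[str, str]) -> Dict[str, Any]:
    """Hypothetical helper: return a LaunchTemplateParameters-style request body."""
    environment = {key: value for key, value in variables.items() if key in RUNTIME_ENV_KEYS}
    return {"jobName": job_name, "parameters": parameters, "environment": environment}


body = build_launch_body(
    "wordcount-1a2b3c4d",
    variables={"project": "my-project", "tempLocation": "gs://my-bucket/tmp"},
    parameters={"inputFile": "gs://my-bucket/input.txt"},
)
print(body["environment"])  # {'tempLocation': 'gs://my-bucket/tmp'}
```

`project` is left out of the environment because it belongs in the request path, not the body.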

start_python_dataflow(self, job_name: str, variables: Dict, dataflow: str, py_options: List[str], append_job_name: bool = True)

Starts a Dataflow Python job.

Parameters
  • job_name (str) – The name of the job.

  • variables (dict) – Variables passed to the job.

  • dataflow (str) – Name of the Dataflow process.

  • py_options (list) – Additional options.

  • append_job_name (bool) – True if a unique suffix has to be appended to the job name.
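A sketch of how `dataflow` and `py_options` might combine into a launch command. It assumes the interpreter comes first, followed by `py_options` and then the pipeline file. The interpreter name is an assumption because this page does not state it. `python_command_prefix` is hypothetical.

```python
from typing import List


def python_command_prefix(dataflow: str, py_options: List[str],
                          interpreter: str = "python3") -> List[str]:
    """Hypothetical helper: start of the command line that runs a Beam Python file."""
    # Interpreter options (for example -u for unbuffered output) precede the file path.
    return [interpreter] + list(py_options) + [dataflow]


print(python_command_prefix("wordcount.py", ["-u"]))
# ['python3', '-u', 'wordcount.py']
```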

static _build_dataflow_job_name(job_name: str, append_job_name: bool = True)
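A sketch of a job-name builder in the spirit of `_build_dataflow_job_name`. It makes three assumptions: underscores are replaced by hyphens; the result is checked against Dataflow's naming rule (a lowercase letter first, then lowercase letters, digits or hyphens, with no trailing hyphen); and an 8-character random suffix is added when `append_job_name` is true. `build_job_name` is hypothetical.

```python
import re
import uuid

# Assumed form of Dataflow's job-name rule (length limit not enforced here).
JOB_NAME_RE = re.compile(r"^[a-z]([-a-z0-9]*[a-z0-9])?$")


def build_job_name(job_name: str, append_job_name: bool = True) -> str:
    """Hypothetical helper: normalise a name and optionally add a random suffix."""
    base = str(job_name).replace("_", "-")
    if not JOB_NAME_RE.match(base):
        raise ValueError("Invalid Dataflow job name: {!r}".format(base))
    if append_job_name:
        return "{}-{}".format(base, uuid.uuid4().hex[:8])
    return base


print(build_job_name("word_count", append_job_name=False))  # word-count
```

The random suffix lets the same task be launched repeatedly without name clashes between concurrently existing jobs.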
static _build_cmd(variables: Dict, label_formatter: Callable)
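A sketch of turning a `variables` dict into pipeline options, with labels delegated to a `label_formatter` as the signatures above suggest. The `--runner=DataflowRunner` default, the bare-flag handling of empty values, and the JSON label format are assumptions. `build_cmd` and `json_labels` are hypothetical.

```python
import json
from typing import Any, Callable, Dict, List, Optional


def build_cmd(variables: Optional[Dict[str, Any]],
              label_formatter: Callable[[Dict[str, str]], List[str]]) -> List[str]:
    """Hypothetical helper: turn a variables dict into --key=value pipeline options."""
    command = ["--runner=DataflowRunner"]  # assumed default runner option
    for key, value in (variables or {}).items():
        if key == "labels":
            command += label_formatter(value)  # labels need an SDK-specific format
        elif not value:
            command.append("--" + key)  # None or empty value: pass as a bare flag
        else:
            command.append("--{}={}".format(key, value))
    return command


def json_labels(labels: Dict[str, str]) -> List[str]:
    """Hypothetical formatter: pass all labels as one compact JSON option."""
    return ["--labels=" + json.dumps(labels, separators=(",", ":"))]


print(build_cmd({"project": "my-project", "streaming": "", "labels": {"team": "data"}}, json_labels))
# ['--runner=DataflowRunner', '--project=my-project', '--streaming', '--labels={"team":"data"}']
```

Passing the formatter in as a callable keeps the option builder shared between SDKs while letting each one encode labels its own way.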
_start_template_dataflow(self, name: str, variables: Dict[str, Any], parameters: Dict, dataflow_template: str)
is_job_dataflow_running(self, name: str, variables: Dict)

Helper method to check whether the job is still running in Dataflow.

Returns

True if job is running.

Return type

bool
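A sketch of the running check applied to an already-fetched list of jobs. It assumes each job dict carries `name` and `currentState` keys, as in Dataflow REST Job resources, and that "running" means the JOB_STATE_RUNNING state. How the hook derives the name and location from `name` and `variables` is not shown here. `any_job_running` is hypothetical.

```python
from typing import Dict, Iterable

RUNNING_STATE = "JOB_STATE_RUNNING"


def any_job_running(jobs: Iterable[Dict[str, str]], name_prefix: str) -> bool:
    """Hypothetical helper: True if a job whose name starts with name_prefix is running."""
    return any(
        job.get("name", "").startswith(name_prefix) and job.get("currentState") == RUNNING_STATE
        for job in jobs
    )


jobs = [
    {"name": "wordcount-1a2b3c4d", "currentState": "JOB_STATE_DONE"},
    {"name": "wordcount-9f8e7d6c", "currentState": "JOB_STATE_RUNNING"},
]
print(any_job_running(jobs, "wordcount"))  # True
```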