airflow.contrib.operators.sqoop_operator

This module contains a Sqoop 1 operator.

Module Contents

class airflow.contrib.operators.sqoop_operator.SqoopOperator(conn_id='sqoop_default', cmd_type='import', table=None, query=None, target_dir=None, append=None, file_type='text', columns=None, num_mappers=None, split_by=None, where=None, export_dir=None, input_null_string=None, input_null_non_string=None, staging_table=None, clear_staging_table=False, enclosed_by=None, escaped_by=None, input_fields_terminated_by=None, input_lines_terminated_by=None, input_optionally_enclosed_by=None, batch=False, direct=False, driver=None, verbose=False, relaxed_isolation=False, properties=None, hcatalog_database=None, hcatalog_table=None, create_hcatalog_table=False, extra_import_options=None, extra_export_options=None, *args, **kwargs)[source]

Bases: airflow.models.BaseOperator

Execute a Sqoop job. Documentation for Apache Sqoop can be found here: https://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html

Parameters
  • conn_id – str, the Airflow connection ID of the Sqoop connection to use

  • cmd_type – str, the command to execute: “export” or “import”

  • table – Table to read

  • query – Import the result of an arbitrary SQL query. Instead of using the table, columns and where arguments, you can specify a SQL statement with the query argument. Must also specify a destination directory with target_dir.

  • target_dir – HDFS destination directory where the data from the RDBMS will be written

  • append – Append data to an existing dataset in HDFS

  • file_type – “avro”, “sequence”, or “text”. Imports data in the specified format; defaults to text.

  • columns – <col,col,col> Columns to import from table

  • num_mappers – Use n mapper tasks to import/export in parallel

  • split_by – Column of the table used to split work units

  • where – WHERE clause to use during import

  • export_dir – HDFS Hive database directory to export to the RDBMS

  • input_null_string – The string to be interpreted as null for string columns

  • input_null_non_string – The string to be interpreted as null for non-string columns

  • staging_table – The table in which data will be staged before being inserted into the destination table

  • clear_staging_table – Indicate that any data present in the staging table can be deleted

  • enclosed_by – Sets a required field enclosing character

  • escaped_by – Sets the escape character

  • input_fields_terminated_by – Sets the input field separator

  • input_lines_terminated_by – Sets the input end-of-line character

  • input_optionally_enclosed_by – Sets a field enclosing character

  • batch – Use batch mode for underlying statement execution

  • direct – Use direct export fast path

  • driver – Manually specify JDBC driver class to use

  • verbose – Switch to more verbose logging for debug purposes

  • relaxed_isolation – Use the read uncommitted isolation level

  • hcatalog_database – Specifies the database name for the HCatalog table

  • hcatalog_table – The name of the HCatalog table to use

  • create_hcatalog_table – Whether to have Sqoop create the HCatalog table passed in

  • properties – Additional JVM properties passed to Sqoop

  • extra_import_options – Extra import options to pass as a dict. If a key doesn’t have a value, pass an empty string for it. Do not include the -- prefix on Sqoop option names (see the import sketch after this parameter list).

  • extra_export_options – Extra export options to pass as a dict. If a key doesn’t have a value, pass an empty string for it. Do not include the -- prefix on Sqoop option names (see the export sketch at the end of this page).
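
The following is a minimal sketch of an import task, assuming an Airflow 1.x deployment with the contrib Sqoop operator available and a “sqoop_default” connection configured. The DAG ID, source table, HDFS path, split column and extra options are illustrative assumptions, not values prescribed by this operator.

from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.sqoop_operator import SqoopOperator

with DAG(
    dag_id='sqoop_import_example',             # hypothetical DAG id
    start_date=datetime(2020, 1, 1),
    schedule_interval=None,
) as dag:
    import_orders = SqoopOperator(
        task_id='import_orders',
        conn_id='sqoop_default',               # default Sqoop connection id
        cmd_type='import',
        table='orders',                        # assumed source table
        target_dir='/data/raw/orders',         # assumed HDFS destination directory
        split_by='order_id',                   # assumed split column
        num_mappers=4,
        file_type='avro',
        extra_import_options={
            'compress': '',                    # valueless options take an empty string
            'fetch-size': '1000',              # note: no leading '--' on option names
        },
    )

Fields such as table, where and target_dir appear in template_fields (listed below), so they may also contain Jinja expressions, for example a target_dir ending in {{ ds }}, which are rendered at runtime.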

template_fields = ['conn_id', 'cmd_type', 'table', 'query', 'target_dir', 'file_type', 'columns', 'split_by', 'where', 'export_dir', 'input_null_string', 'input_null_non_string', 'staging_table', 'enclosed_by', 'escaped_by', 'input_fields_terminated_by', 'input_lines_terminated_by', 'input_optionally_enclosed_by', 'properties', 'extra_import_options', 'driver', 'extra_export_options', 'hcatalog_database', 'hcatalog_table'][source]
ui_color = '#7D8CA4'[source]
execute(self, context)[source]

Execute the Sqoop job

on_kill(self)[source]
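
As a counterpart to the import sketch above, here is a hedged sketch of an export task under the same assumptions (Airflow 1.x contrib operators, a “sqoop_default” connection); the RDBMS table, staging table, HDFS directory and extra options are illustrative only.

from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.sqoop_operator import SqoopOperator

with DAG(
    dag_id='sqoop_export_example',             # hypothetical DAG id
    start_date=datetime(2020, 1, 1),
    schedule_interval=None,
) as dag:
    export_orders = SqoopOperator(
        task_id='export_orders',
        conn_id='sqoop_default',
        cmd_type='export',
        table='orders_summary',                # assumed destination table in the RDBMS
        export_dir='/data/processed/orders',   # assumed HDFS directory to export from
        staging_table='orders_summary_staging',    # assumed staging table
        clear_staging_table=True,              # data in the staging table may be deleted
        batch=True,                            # batch mode for the underlying statements
        extra_export_options={
            'update-key': 'order_id',          # assumed update key column
            'update-mode': 'allowinsert',      # option keys again without the '--' prefix
        },
    )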