airflow.contrib.operators.sqoop_operator

This module contains a Sqoop 1 operator.

Module Contents
-
class airflow.contrib.operators.sqoop_operator.SqoopOperator(conn_id='sqoop_default', cmd_type='import', table=None, query=None, target_dir=None, append=None, file_type='text', columns=None, num_mappers=None, split_by=None, where=None, export_dir=None, input_null_string=None, input_null_non_string=None, staging_table=None, clear_staging_table=False, enclosed_by=None, escaped_by=None, input_fields_terminated_by=None, input_lines_terminated_by=None, input_optionally_enclosed_by=None, batch=False, direct=False, driver=None, verbose=False, relaxed_isolation=False, properties=None, hcatalog_database=None, hcatalog_table=None, create_hcatalog_table=False, extra_import_options=None, extra_export_options=None, *args, **kwargs)

Bases: airflow.models.BaseOperator
Execute a Sqoop job. Documentation for Apache Sqoop can be found here: https://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html. A usage sketch follows the parameter list below.
- Parameters
conn_id – Reference to the Sqoop connection to use (defaults to sqoop_default)
cmd_type – The Sqoop command to execute, either “export” or “import”
table – Table to read
query – Import result of arbitrary SQL query. Instead of using the table, columns and where arguments, you can specify a SQL statement with the query argument. Must also specify a destination directory with target_dir.
target_dir – HDFS destination directory where the data from the RDBMS will be written
append – Append data to an existing dataset in HDFS
file_type – “avro”, “sequence”, or “text”. Imports data in the specified format. Defaults to text.
columns – <col,col,col> Columns to import from table
num_mappers – Use n mapper tasks to import/export in parallel
split_by – Column of the table used to split work units
where – WHERE clause to use during import
export_dir – HDFS Hive database directory to export to the RDBMS
input_null_string – The string to be interpreted as null for string columns
input_null_non_string – The string to be interpreted as null for non-string columns
staging_table – The table in which data will be staged before being inserted into the destination table
clear_staging_table – Indicate that any data present in the staging table can be deleted
enclosed_by – Sets a required field enclosing character
escaped_by – Sets the escape character
input_fields_terminated_by – Sets the input field separator
input_lines_terminated_by – Sets the input end-of-line character
input_optionally_enclosed_by – Sets a field enclosing character
batch – Use batch mode for underlying statement execution
direct – Use direct export fast path
driver – Manually specify JDBC driver class to use
verbose – Switch to more verbose logging for debug purposes
relaxed_isolation – Use the read-uncommitted isolation level
hcatalog_database – Specifies the database name for the HCatalog table
hcatalog_table – Specifies the name of the HCatalog table to use
create_hcatalog_table – Whether Sqoop should create the HCatalog table if it does not already exist
properties – Additional JVM properties passed to Sqoop
extra_import_options – Extra import options to pass as a dict. If a key does not take a value, pass an empty string for it. Do not include the -- prefix on the Sqoop option names.
extra_export_options – Extra export options to pass as a dict. If a key does not take a value, pass an empty string for it. Do not include the -- prefix on the Sqoop option names.
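The sketch below shows both command types in one DAG. It is a minimal, illustrative example, not a reference implementation: the connection ID sqoop_default is the operator default, while the table names, HDFS paths, and extra options are hypothetical and would need to match your own database and cluster.

```python
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.sqoop_operator import SqoopOperator

dag = DAG(
    dag_id='sqoop_example',          # hypothetical DAG name
    start_date=datetime(2019, 1, 1),
    schedule_interval='@daily',
)

# Import a table from the RDBMS into HDFS as Avro files.
sqoop_import = SqoopOperator(
    task_id='sqoop_import_orders',
    conn_id='sqoop_default',
    cmd_type='import',
    table='orders',                              # hypothetical source table
    target_dir='/user/hive/warehouse/orders',    # hypothetical HDFS path
    file_type='avro',
    num_mappers=4,
    split_by='order_id',
    extra_import_options={
        'compress': '',          # option without a value: pass an empty string
        'fetch-size': '1000',    # note: no leading -- on the option names
    },
    dag=dag,
)

# Export an HDFS directory back into an RDBMS table.
sqoop_export = SqoopOperator(
    task_id='sqoop_export_results',
    conn_id='sqoop_default',
    cmd_type='export',
    table='results',                             # hypothetical destination table
    export_dir='/user/hive/warehouse/results',   # hypothetical HDFS path
    input_fields_terminated_by=',',
    extra_export_options={
        'update-mode': 'allowinsert',            # illustrative sqoop export option
    },
    dag=dag,
)
```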
-
template_fields
= ['conn_id', 'cmd_type', 'table', 'query', 'target_dir', 'file_type', 'columns', 'split_by', 'where', 'export_dir', 'input_null_string', 'input_null_non_string', 'staging_table', 'enclosed_by', 'escaped_by', 'input_fields_terminated_by', 'input_lines_terminated_by', 'input_optionally_enclosed_by', 'properties', 'extra_import_options', 'driver', 'extra_export_options', 'hcatalog_database', 'hcatalog_table']
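Because the fields above are templated, Jinja expressions in them are rendered before the Sqoop command is built. A short sketch, continuing the DAG from the example above with a hypothetical events table partitioned by execution date:

```python
# Templated fields such as `where` and `target_dir` are rendered per run,
# so the execution date macros drive which rows are imported and where they land.
daily_import = SqoopOperator(
    task_id='sqoop_daily_import',
    conn_id='sqoop_default',
    cmd_type='import',
    table='events',                                  # hypothetical source table
    where="event_date = '{{ ds }}'",                 # rendered to the run's date
    target_dir='/data/events/dt={{ ds_nodash }}',    # hypothetical partitioned path
    num_mappers=2,
    split_by='event_id',
    dag=dag,
)
```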