You can set an option by:
Passing it on the command line as a switch (like --some-option)
Passing it as a keyword argument to the runner constructor, if you are creating the runner programmatically
Putting it in one of the included config files under a runner name, like this:
runners:
  local:
    python_bin: python2.6  # only used in local runner
  emr:
    python_bin: python2.5  # only used in Elastic MapReduce runner
See Config file format and location for information on where to put config files.
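As a rough illustration of how per-runner sections might be resolved, the sketch below treats the config as a plain dictionary and picks out the options under one runner name. The helper `options_for_runner` is hypothetical and is not part of mrjob. It also assumes that values passed explicitly (for example as keyword arguments) take precedence over values from the config file.

```python
# Hypothetical stand-in for a parsed config file; mirrors the YAML example.
CONF = {
    'runners': {
        'local': {'python_bin': 'python2.6'},
        'emr': {'python_bin': 'python2.5'},
    }
}


def options_for_runner(conf, runner_name, **overrides):
    """Return the options listed under one runner name.

    Assumption: explicitly passed overrides win over config-file values.
    """
    # Copy so the caller's config dictionary is never modified.
    opts = dict(conf.get('runners', {}).get(runner_name, {}))
    opts.update(overrides)
    return opts


print(options_for_runner(CONF, 'local'))
print(options_for_runner(CONF, 'emr', python_bin='python2.7'))
```

A runner name with no section in the config simply yields an empty dictionary here, so only explicit overrides would apply.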
Some options don’t make sense to set in a config file. These can only be specified when calling the constructor of MRJobRunner, as command-line options, or sometimes by overriding an attribute or method of your MRJob subclass.
Config | Command line | Default | Type |
---|---|---|---|
conf_paths | -c, --conf-path, --no-conf | see find_mrjob_conf() | path list |
no_output | --no-output | False | boolean |
output_dir | --output-dir | (automatic) | string |
partitioner | --partitioner | None | string |
Option | Method | Default |
---|---|---|
extra_args | add_passthrough_option() | [] |
file_upload_args | add_file_option() | [] |
hadoop_input_format | hadoop_input_format() | None |
hadoop_output_format | hadoop_output_format() | None |
These options can be passed to any runner without an error, though some runners may ignore some options. See the text after the table for specifics.
LocalMRJobRunner takes no additional options, but it ignores hadoop_input_format, hadoop_output_format, hadoop_streaming_jar, and jobconf.
InlineMRJobRunner works like LocalMRJobRunner, only it also ignores bootstrap_mrjob, cmdenv, python_bin, setup_cmds, setup_scripts, steps_python_bin, upload_archives, and upload_files.
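One way to picture "accepted without an error, but ignored" is a filter over the option dictionary. The sketch below uses the option names listed above for the two runners. The helper `effective_options` and the two sets are hypothetical and only model the behavior; they are not how mrjob implements it.

```python
# Option names come from the text above; the helper itself is hypothetical.
LOCAL_IGNORED = frozenset([
    'hadoop_input_format', 'hadoop_output_format',
    'hadoop_streaming_jar', 'jobconf',
])

# The inline runner ignores everything the local runner does, plus these.
INLINE_IGNORED = LOCAL_IGNORED | frozenset([
    'bootstrap_mrjob', 'cmdenv', 'python_bin', 'setup_cmds',
    'setup_scripts', 'steps_python_bin', 'upload_archives', 'upload_files',
])


def effective_options(opts, ignored):
    """Drop ignored options silently instead of raising an error."""
    return {name: value for name, value in opts.items()
            if name not in ignored}


opts = {
    'python_bin': 'python2.6',
    'jobconf': {'mapred.reduce.tasks': '1'},
    'no_output': True,
}
print(sorted(effective_options(opts, LOCAL_IGNORED)))   # ['no_output', 'python_bin']
print(sorted(effective_options(opts, INLINE_IGNORED)))  # ['no_output']
```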
Config | Command line | Default | Type |
---|---|---|---|
check_input_paths | --check-input-paths, --no-check-input-paths | True | boolean |
hadoop_bin | --hadoop-bin | hadoop_home plus bin/hadoop | command |
hadoop_home | --hadoop-home | HADOOP_HOME | path |
hdfs_scratch_dir | --hdfs-scratch-dir | tmp/ | path |
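The defaults for hadoop_bin and hadoop_home in the table chain together: hadoop_bin falls back to hadoop_home plus bin/hadoop, and hadoop_home falls back to the HADOOP_HOME environment variable. A minimal sketch of that chain follows. The function `resolve_hadoop_bin` is hypothetical, it returns a plain path string as a simplification, and raising ValueError when nothing is set is an assumption rather than documented behavior.

```python
import os


def resolve_hadoop_bin(hadoop_bin=None, hadoop_home=None, environ=None):
    """Pick the hadoop binary following the defaults in the table above."""
    environ = os.environ if environ is None else environ

    # An explicit hadoop_bin always wins.
    if hadoop_bin is not None:
        return hadoop_bin

    # Otherwise fall back to hadoop_home, then to $HADOOP_HOME.
    if hadoop_home is None:
        hadoop_home = environ.get('HADOOP_HOME')
    if hadoop_home is None:
        # Assumption: the sketch fails loudly when nothing is configured.
        raise ValueError('set hadoop_bin, hadoop_home, or $HADOOP_HOME')

    return os.path.join(hadoop_home, 'bin', 'hadoop')


# Pass a fake environment so the example does not depend on the real one.
print(resolve_hadoop_bin(environ={'HADOOP_HOME': '/usr/lib/hadoop'}))
print(resolve_hadoop_bin(hadoop_home='/opt/hadoop', environ={}))
```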