mrjob.hadoop - run on your Hadoop cluster

class mrjob.hadoop.HadoopJobRunner(**kwargs)

Runs an MRJob on your Hadoop cluster. Invoked when you run your job with -r hadoop.

Input and support files can be either local or on HDFS; use hdfs://... URLs to refer to files on HDFS.

HadoopJobRunner.__init__(**kwargs)

HadoopJobRunner takes the same arguments as MRJobRunner, plus some additional options whose defaults can be set in mrjob.conf.

Utilities

mrjob.hadoop.hadoop_log_dir(hadoop_home=None)

Return the path where Hadoop stores logs.

Parameters: hadoop_home – putative value of HADOOP_HOME, or None to default to the actual value of that environment variable. This is only used if HADOOP_LOG_DIR is not defined.

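
The lookup order described above can be sketched in plain Python. This is an illustration only, not mrjob's own code: the function name resolve_hadoop_log_dir and the environ parameter are hypothetical, and the "logs" subdirectory under HADOOP_HOME is an assumption based on Hadoop's usual layout.

```python
import os


def resolve_hadoop_log_dir(hadoop_home=None, environ=None):
    """Illustrative stand-in for hadoop_log_dir(); not mrjob's own code."""
    # `environ` is a hypothetical parameter, added so the lookup can be
    # exercised without touching the real process environment.
    env = os.environ if environ is None else environ

    # HADOOP_LOG_DIR, when defined, takes priority over everything else.
    if 'HADOOP_LOG_DIR' in env:
        return env['HADOOP_LOG_DIR']

    # Otherwise fall back to HADOOP_HOME: the caller's putative value,
    # or the real environment variable when None was passed.
    if hadoop_home is None:
        hadoop_home = env['HADOOP_HOME']  # raises KeyError if unset

    # Assumption: logs live in a "logs" directory under HADOOP_HOME.
    return os.path.join(hadoop_home, 'logs')
```

Under these assumptions, passing hadoop_home only matters when HADOOP_LOG_DIR is absent, which matches the note on the parameter above.
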
mrjob.hadoop.find_hadoop_streaming_jar(path)

Return the path of the Hadoop streaming jar inside the given directory tree, or None if we can’t find it.
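
A search of this kind might look roughly like the sketch below. It is not the library's implementation: the name find_streaming_jar is hypothetical, and the filename pattern is an assumption chosen to match common jar names such as hadoop-streaming-1.0.3.jar.

```python
import os
import re

# Assumed naming pattern; the pattern mrjob really uses may differ.
STREAMING_JAR_RE = re.compile(r'^hadoop.*streaming.*\.jar$')


def find_streaming_jar(root):
    """Return the first matching jar found under root, or None."""
    # os.walk visits root and every directory beneath it.
    for dirpath, _dirnames, filenames in os.walk(root):
        # Sort so the choice within one directory is deterministic.
        for filename in sorted(filenames):
            if STREAMING_JAR_RE.match(filename):
                return os.path.join(dirpath, filename)
    # Nothing in the tree matched.
    return None
```

Returning None instead of raising lets the caller decide whether a missing jar is an error or a cue to try another directory.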

mrjob.hadoop.fully_qualify_hdfs_path(path)

If path isn’t an hdfs:// URL, turn it into one.
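
As a rough illustration of this kind of normalization, here is a minimal sketch. The function name qualify_hdfs_path and the user parameter are hypothetical, and resolving relative paths against /user/<name> (the usual HDFS home directory) is an assumption, not something this page confirms.

```python
import getpass


def qualify_hdfs_path(path, user=None):
    """Illustrative only; not mrjob's fully_qualify_hdfs_path()."""
    # Already an hdfs:// URL: return it unchanged.
    if path.startswith('hdfs://'):
        return path

    # Absolute path: add the scheme with an empty authority, leaving
    # the choice of namenode to the cluster configuration.
    if path.startswith('/'):
        return 'hdfs://' + path

    # Assumption: relative paths resolve against /user/<name>.
    if user is None:
        user = getpass.getuser()
    return 'hdfs:///user/%s/%s' % (user, path)
```

For example, under these assumptions qualify_hdfs_path('/tmp/out') yields 'hdfs:///tmp/out', while a path that already starts with hdfs:// passes through untouched.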
