mrjob.step - represent Job Steps

class mrjob.step.JarStep(*args, **kwargs)

Represents running a custom Jar as a step.

INPUT = '<input>'

If this is passed as one of the step’s arguments, it’ll be replaced with the step’s input paths. If there are multiple paths, they’ll be joined with commas.

OUTPUT = '<output>'

If this is passed as one of the step’s arguments, it’ll be replaced with the step’s output path.
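
The placeholder substitution described for INPUT and OUTPUT can be sketched in plain Python. This is an illustrative stand-in, not mrjob's own implementation: the helper name interpolate_jar_args and the example paths are hypothetical, and it assumes a placeholder is only replaced when it is an entire argument.

```python
# Placeholder strings, matching the documented values of JarStep.INPUT
# and JarStep.OUTPUT.
INPUT = '<input>'
OUTPUT = '<output>'


def interpolate_jar_args(args, input_paths, output_path):
    """Hypothetical helper: return a copy of args with placeholders filled in.

    Multiple input paths are joined with commas.
    """
    joined_input = ','.join(input_paths)
    result = []
    for arg in args:
        if arg == INPUT:
            result.append(joined_input)
        elif arg == OUTPUT:
            result.append(output_path)
        else:
            # Any other argument is passed through unchanged.
            result.append(arg)
    return result


# Hypothetical arguments and paths, for illustration only.
step_args = ['-verbose', INPUT, OUTPUT]
filled = interpolate_jar_args(
    step_args,
    input_paths=['hdfs:///in/part-00000', 'hdfs:///in/part-00001'],
    output_path='hdfs:///out',
)
print(filled)
```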

description(step_num)

Returns a dictionary representation of this step:

{
    'type': 'jar',
    'jar': path of the jar,
    'main_class': string, name of the main class,
    'args': list of strings, args to the main class,
}

See Format of --steps for examples.
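
As a rough illustration of the dictionary shape above, the following sketch builds a jar step description by hand and checks it against the documented keys. The jar path, class name, and the looks_like_jar_step helper are all hypothetical; the check only mirrors the fields listed here and is not part of mrjob.

```python
# A hand-written jar step description with made-up values.
jar_step_description = {
    'type': 'jar',
    'jar': '/path/to/wordcount.jar',         # hypothetical jar path
    'main_class': 'com.example.WordCount',   # hypothetical class name
    'args': ['<input>', '<output>'],
}


def looks_like_jar_step(desc):
    """Hypothetical check that desc has the documented jar-step fields."""
    if desc.get('type') != 'jar':
        return False
    if not isinstance(desc.get('jar'), str):
        return False
    if not isinstance(desc.get('main_class'), str):
        return False
    args = desc.get('args')
    # 'args' is documented as a list of strings.
    return isinstance(args, list) and all(isinstance(a, str) for a in args)


print(looks_like_jar_step(jar_step_description))
```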

mrjob.step.MRJobStep

alias of MRStep

class mrjob.step.MRStep(**kwargs)

Represents steps handled by the script containing your job.

description(step_num)

Returns a dictionary representation of this step:

{
    'type': 'streaming',
    'mapper': { ... },
    'combiner': { ... },
    'reducer': { ... },
    'jobconf': dictionary of Hadoop configuration properties
}

jobconf is optional, and at least one of mapper, combiner, and reducer must be included.

mapper, combiner, and reducer are either handled by the script containing your job definition:

{
    'type': 'script',
    'pre_filter': (optional) cmd to pass input through, as a string
}

or they simply run a command:

{
    'type': 'command',
    'command': command to run, as a string
}

See Format of --steps for examples.
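
To make the nesting concrete, here is a minimal sketch of a streaming step description that mixes a script mapper with a command reducer, along with a small structural check. The example commands, the jobconf property, and the helper names check_substep and check_streaming_step are hypothetical; the check assumes at least one of mapper, combiner, or reducer must be present and is not mrjob's own validation.

```python
SUBSTEP_KEYS = ('mapper', 'combiner', 'reducer')


def check_substep(sub):
    """Hypothetical check for one mapper/combiner/reducer entry."""
    if sub.get('type') == 'script':
        # pre_filter is optional, but must be a string when given.
        pre_filter = sub.get('pre_filter')
        return pre_filter is None or isinstance(pre_filter, str)
    if sub.get('type') == 'command':
        return isinstance(sub.get('command'), str)
    return False


def check_streaming_step(desc):
    """Hypothetical check that desc matches the documented streaming shape."""
    if desc.get('type') != 'streaming':
        return False
    present = [key for key in SUBSTEP_KEYS if key in desc]
    if not present:
        return False
    if not all(check_substep(desc[key]) for key in present):
        return False
    # jobconf is optional; when present it should be a dictionary.
    return isinstance(desc.get('jobconf', {}), dict)


# Made-up example values, for illustration only.
example_step = {
    'type': 'streaming',
    'mapper': {'type': 'script', 'pre_filter': 'grep -v bad'},
    'reducer': {'type': 'command', 'command': 'uniq -c'},
    'jobconf': {'mapreduce.job.reduces': '1'},
}
print(check_streaming_step(example_step))
```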
