Represents a running a custom Jar as a step.
If this is passed as one of the step’s arguments, it’ll be replaced with the step’s input paths (if there are multiple paths, they’ll be joined with commas)
If this is passed as one of the step’s arguments, it’ll be replaced with the step’s output path
Returns a dictionary representation of this step:
{
'type': 'jar',
'jar': path of the jar,
'main_class': string, name of the main class,
'args': list of strings, args to the main class,
}
See Format of –steps for examples.
Represents steps handled by the script containing your job.
Returns a dictionary representation of this step:
{
'type': 'streaming',
'mapper': { ... },
'combiner': { ... },
'reducer': { ... },
'jobconf': dictionary of Hadoop configuration properties
}
jobconf is optional, and only one of mapper, combiner, and reducer need be included.
mapper, combiner, and reducer are either handled by the script containing your job definition:
{
'type': 'script',
'pre_filter': (optional) cmd to pass input through, as a string
}
or they simply run a command:
{
'type': 'command',
'command': command to run, as a string
}
See Format of –steps for examples.