mrjob.protocol - input and output

Protocols deserialize the input of each task from raw bytes and serialize its output back to raw bytes, which Hadoop then distributes to the next task or writes as final output. For more information, see Protocols and Writing custom protocols.

class mrjob.protocol.JSONProtocol

Encode (key, value) as two JSONs separated by a tab.

Note that JSON has some limitations: dictionary keys must be strings, and there is no distinction between lists and tuples (a tuple is decoded as a list).
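
The encoding and both limitations can be illustrated with a minimal sketch that uses only the standard json module. The helper names encode_json_pair and decode_json_pair are hypothetical and are not part of mrjob; this is not the library's implementation.

```python
import json

def encode_json_pair(key, value):
    """Serialize key and value as two JSON documents joined by a tab."""
    # json.dumps escapes tabs inside strings, so the only literal tab
    # in the result is the separator.
    return json.dumps(key) + '\t' + json.dumps(value)

def decode_json_pair(line):
    """Split on the first tab and parse each half as JSON."""
    raw_key, raw_value = line.split('\t', 1)
    return json.loads(raw_key), json.loads(raw_value)

line = encode_json_pair(('a', 1), {2: 'two'})
key, value = decode_json_pair(line)
# The tuple key comes back as a list, and the integer
# dictionary key comes back as a string.
print(key, value)
```

If a job depends on tuples or non-string dictionary keys surviving a round trip, one of the pickle-based or repr-based protocols may be a better fit.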

class mrjob.protocol.JSONValueProtocol

Encode value as a JSON and discard key (key is read in as None).
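
As a rough sketch of the value-only behavior (again with hypothetical helper names, not mrjob's own code), writing drops the key and reading always yields None in its place:

```python
import json

def encode_json_value(key, value):
    """Ignore the key and emit only the JSON-encoded value."""
    return json.dumps(value)

def decode_json_value(line):
    """Parse the whole line as one JSON value; the key is always None."""
    return None, json.loads(line)

line = encode_json_value('ignored', {'count': 3})
pair = decode_json_value(line)
print(line, pair)
```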

class mrjob.protocol.PickleProtocol

Encode (key, value) as two string-escaped pickles separated by a tab.

We string-escape the pickles to avoid having to deal with stray \t and \n characters, which would confuse Hadoop Streaming.

Ugly, but should work for any type that can be pickled.
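
The following sketch shows why escaping matters. It is written for Python 3, where the string-escape codec is not available, so it substitutes a latin-1 plus unicode_escape round trip as a stand-in; that substitution and the helper names are assumptions made for illustration, not mrjob's actual code.

```python
import pickle

def escape(data):
    """Backslash-escape bytes so no literal tab or newline remains."""
    # Assumption: this stands in for Python 2's 'string_escape' codec.
    # Each byte is mapped to a code point, then everything outside
    # printable ASCII (including \t and \n) is backslash-escaped.
    return data.decode('latin-1').encode('unicode_escape')

def unescape(data):
    """Reverse escape()."""
    return data.decode('unicode_escape').encode('latin-1')

def encode_pickle_pair(key, value):
    """Join two escaped pickles with a single literal tab."""
    return escape(pickle.dumps(key)) + b'\t' + escape(pickle.dumps(value))

def decode_pickle_pair(line):
    """Split on the tab separator and unpickle each half."""
    raw_key, raw_value = line.split(b'\t', 1)
    return pickle.loads(unescape(raw_key)), pickle.loads(unescape(raw_value))

# A set containing tab and newline characters, and a tuple:
# types that JSON cannot represent faithfully.
original = ({'tab\there', 'line\nbreak'}, (1, 2.5, None))
line = encode_pickle_pair(*original)
restored = decode_pickle_pair(line)
```

Unlike the JSON sketch, the set and the tuple come back with their original types.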

class mrjob.protocol.PickleValueProtocol

Encode value as a string-escaped pickle and discard key (key is read in as None).

class mrjob.protocol.RawProtocol

Encode (key, value) as key and value separated by a tab (key and value should be bytestrings).

If key or value is None, don’t include a tab. When decoding a line with no tab in it, value will be None.

When reading from a line with multiple tabs, we break on the first one.

Your key should probably not be None or have tab characters in it, but we don’t check.
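
The rules above can be sketched in a few lines. The function names raw_read and raw_write are hypothetical; the sketch only mirrors the behavior described here and is not taken from the library.

```python
def raw_read(line):
    """Split a bytestring on the first tab; value is None if there is no tab."""
    key, tab, value = line.partition(b'\t')
    return (key, value) if tab else (key, None)

def raw_write(key, value):
    """Join whichever of key and value is not None with a tab."""
    return b'\t'.join(part for part in (key, value) if part is not None)

# Multiple tabs: only the first one separates key from value.
multi = raw_read(b'a\tb\tc')

# No tab: the whole line becomes the key.
no_tab = raw_read(b'no tab here')

# A None key does not survive a round trip: the value is read back as the key.
lost_key = raw_read(raw_write(None, b'v'))

# Neither does a key containing a tab: it is split in the wrong place.
split_key = raw_read(raw_write(b'a\tb', b'c'))
```

The last two cases show why a None key or a key containing tabs is risky: nothing fails, but the pair that is read back differs from the pair that was written.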

class mrjob.protocol.RawValueProtocol

Read in a line as (None, line). Write out (key, value) as value. value must be a str.

The default way for a job to read its initial input.
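
A minimal sketch of this behavior, with hypothetical function names, shows that input lines are passed through whole, even when they contain tabs:

```python
def raw_value_read(line):
    """Treat the entire line as the value; the key is always None."""
    return None, line

def raw_value_write(key, value):
    """Discard the key and emit the value unchanged."""
    return value

input_lines = ['first line', 'second\tline']
pairs = [raw_value_read(line) for line in input_lines]
written = raw_value_write('ignored', 'second\tline')
```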

class mrjob.protocol.ReprProtocol

Encode (key, value) as two reprs separated by a tab.

This only works for basic types (we use mrjob.util.safeeval()).
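
To illustrate the restriction to basic types, here is a sketch that uses the standard library's ast.literal_eval as a stand-in for mrjob.util.safeeval(). That substitution is an assumption made so the example is self-contained; the two functions may not accept exactly the same inputs, and the helper names are hypothetical.

```python
import ast

def encode_repr_pair(key, value):
    """Serialize key and value as two reprs joined by a tab."""
    # repr() escapes tabs inside strings, so the separator is unambiguous.
    return repr(key) + '\t' + repr(value)

def decode_repr_pair(line):
    """Split on the first tab and safely evaluate each half."""
    raw_key, raw_value = line.split('\t', 1)
    # Assumption: ast.literal_eval stands in for mrjob.util.safeeval().
    return ast.literal_eval(raw_key), ast.literal_eval(raw_value)

# Basic types round-trip, and tuples and integer keys are preserved.
line = encode_repr_pair(('a', 1), {2: [None, True]})
pair = decode_repr_pair(line)

# The repr of an arbitrary object is not a literal, so it cannot be decoded.
try:
    ast.literal_eval(repr(object()))
    custom_ok = True
except (ValueError, SyntaxError):
    custom_ok = False
```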

class mrjob.protocol.ReprValueProtocol

Encode value as a repr and discard key (key is read in as None).

This only works for basic types (we use mrjob.util.safeeval()).
