App Engine Python SDK
v1.6.9 rev.445
The Python runtime is available as an experimental Preview feature.
Public Member Functions
def validate
def init_job
def finalize_job
def from_json
def to_json
def create
def write
def finalize
def get_filenames

def to_json_str
def from_json_str
Abstract base class for output writers.

Output writers process all mapper handler output that is not an operation.

OutputWriter's lifecycle is the following:
0) validate() is called to validate the mapper specification.
1) init_job() is called to initialize any job-level state.
2) create() is called, which should create a new instance of the output writer for a given shard.
3) from_json()/to_json() are used to persist the writer's state across multiple slices.
4) write() is called to write data.
5) finalize() is called when shard processing is done.
6) finalize_job() is called when the job is completed.
7) get_filenames() is called to get the output file names.
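The shard-level part of this lifecycle can be sketched without the SDK. The following is a minimal illustration, not the SDK's implementation: `InMemoryWriter` and `run_shard` are hypothetical names, the mapper specification is a plain dict stand-in, and the job-level steps (init_job, finalize, finalize_job, get_filenames) are left out. It only shows how a writer might be rebuilt from its JSON state at the start of each slice.

```python
import json


class InMemoryWriter(object):
    """Hypothetical writer that mirrors the OutputWriter method names.

    It does not subclass the real SDK class, so it runs without the SDK.
    """

    def __init__(self, shard_number, lines=None):
        self.shard_number = shard_number
        self.lines = list(lines or [])

    @classmethod
    def validate(cls, mapper_spec):
        # Step 0: the spec here is a plain dict stand-in, not model.MapperSpec.
        if not isinstance(mapper_spec.get("params", {}), dict):
            raise ValueError("params must be a dict")

    @classmethod
    def create(cls, mr_spec, shard_number, shard_attempt):
        # Step 2: one fresh writer per shard.
        return cls(shard_number)

    def to_json(self):
        # Step 3: the state that has to survive between slices.
        return {"shard_number": self.shard_number, "lines": self.lines}

    @classmethod
    def from_json(cls, state):
        # Step 3: rebuild the writer from the saved state.
        return cls(state["shard_number"], state["lines"])

    def write(self, data):
        # Step 4: collect whatever the handler yielded.
        self.lines.append(data)


def run_shard(mapper_spec, slices):
    """Drive one shard through validate, create, and several slices."""
    InMemoryWriter.validate(mapper_spec)
    writer = InMemoryWriter.create(None, shard_number=0, shard_attempt=0)
    saved = json.dumps(writer.to_json())
    for records in slices:
        # Each slice starts from the serialized state of the previous one.
        writer = InMemoryWriter.from_json(json.loads(saved))
        for record in records:
            writer.write(record)
        saved = json.dumps(writer.to_json())
    return InMemoryWriter.from_json(json.loads(saved)).lines
```

Calling `run_shard({"params": {}}, [["a", "b"], ["c"]])` walks two slices and returns every record written across both.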
def google.appengine.ext.mapreduce.output_writers.OutputWriter.create(cls, mr_spec, shard_number, shard_attempt, _writer_state=None)
Create new writer for a shard.

Args:
  mr_spec: an instance of model.MapreduceSpec describing current job.
  shard_number: int shard number.
  shard_attempt: int shard attempt.
  _writer_state: deprecated. This is for old writers that share file across shards. For new writers, each shard must have its own dedicated outputs. Output state should be contained in the output writer instance. The serialized output writer instance will be saved by mapreduce across slices.
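As a rough illustration of "each shard must have its own dedicated outputs", a create() implementation might derive a distinct output name from the job, the shard number, and the attempt. Everything in this sketch is an assumption: `ShardWriter` is a hypothetical class, the naming scheme is invented for the example, and `mr_spec` is a plain dict stand-in rather than a model.MapreduceSpec.

```python
class ShardWriter(object):
    """Hypothetical writer that owns exactly one output name per shard."""

    def __init__(self, filename):
        self.filename = filename

    @classmethod
    def create(cls, mr_spec, shard_number, shard_attempt, _writer_state=None):
        # _writer_state is accepted for signature compatibility and ignored,
        # since it only serves the deprecated shared-file writers.
        job_id = mr_spec["mapreduce_id"]  # dict stand-in for the real spec
        # Invented naming scheme: job, zero-padded shard, attempt.
        filename = "%s/shard-%05d-attempt-%d" % (
            job_id, shard_number, shard_attempt)
        return cls(filename)
```

Including the attempt number in the name is one way to keep a retried shard from colliding with the output of an earlier failed attempt.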
def google.appengine.ext.mapreduce.output_writers.OutputWriter.finalize(self, ctx, shard_state)
Finalize writer shard-level state.

This should only be called when shard_state.result_status shows success. After finalizing the outputs, it should save per-shard output file info into shard_state.writer_state so that other operations can find the outputs.

Args:
  ctx: an instance of context.Context.
  shard_state: shard state. ShardState.writer_state can be modified.
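The contract described here, finalize the output and then record where it lives in shard_state.writer_state, might look like the sketch below. `FakeShardState` and `ClosingWriter` are hypothetical stand-ins that carry only the attributes named in the text; the `{"filename": ...}` layout of writer_state is an assumption for the example, not a format defined by the SDK.

```python
class FakeShardState(object):
    """Stand-in exposing only the attributes mentioned in the docs."""

    def __init__(self):
        self.result_status = "success"
        self.writer_state = None


class ClosingWriter(object):
    """Hypothetical writer with a single output name."""

    def __init__(self, filename):
        self.filename = filename
        self.closed = False

    def finalize(self, ctx, shard_state):
        # Seal the output first, then publish its location so that later
        # steps can find it through the shard state.
        self.closed = True
        shard_state.writer_state = {"filename": self.filename}
```

The context argument is unused in this sketch, so `None` can be passed for `ctx` when trying it out.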
def google.appengine.ext.mapreduce.output_writers.OutputWriter.finalize_job(cls, mapreduce_state)
Finalize job-level writer state.

This method exists only to support the deprecated feature of sharing output files among many shards. New output writers should not do anything in this method.

This method should only be called when mapreduce_state.result_status shows success. After finalizing the outputs, it should save the information about files shared among shards into mapreduce_state.writer_state so that other operations can find the outputs.

Args:
  mapreduce_state: an instance of model.MapreduceState describing current job. MapreduceState.writer_state can be modified during finalization.
def google.appengine.ext.mapreduce.output_writers.OutputWriter.from_json(cls, state)
Creates an instance of the OutputWriter for the given json state.

Args:
  state: The OutputWriter state as a dict-like object.

Returns:
  An instance of the OutputWriter configured using the values in the json state.
def google.appengine.ext.mapreduce.output_writers.OutputWriter.get_filenames(cls, mapreduce_state)
Obtain output filenames from mapreduce state.

This method should only be called when a MapReduce job has finished. Implementors of this method should not assume any other methods of this class have been called. In the case of no input data, no other method except validate would have been called.

Args:
  mapreduce_state: an instance of model.MapreduceState

Returns:
  List of filenames this mapreduce successfully wrote to. The list can be empty if no output file was successfully written.
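Because no other method may have run, an implementation has to tolerate missing state. The sketch below assumes a hypothetical `shard_writer_states` attribute holding whatever each shard saved at finalize time; the real model.MapreduceState is not known to expose this attribute, and `FakeMapreduceState` and `ListingWriter` are stand-ins invented for the example.

```python
class FakeMapreduceState(object):
    """Stand-in; shard_writer_states is a hypothetical attribute."""

    def __init__(self, shard_writer_states=None):
        self.shard_writer_states = shard_writer_states


class ListingWriter(object):
    """Hypothetical writer that only implements get_filenames."""

    @classmethod
    def get_filenames(cls, mapreduce_state):
        filenames = []
        # Treat "nothing recorded" the same as "no shards ran".
        for state in mapreduce_state.shard_writer_states or []:
            # Skip shards that never finalized or recorded no file.
            if state and "filename" in state:
                filenames.append(state["filename"])
        return filenames
```

With no recorded shard state at all, the method returns an empty list rather than raising, which matches the "no input data" case described above.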
def google.appengine.ext.mapreduce.output_writers.OutputWriter.init_job(cls, mapreduce_state)
Initialize job-level writer state.

This method exists only to support the deprecated feature of sharing output files among many shards. New output writers should not do anything in this method.

Args:
  mapreduce_state: an instance of model.MapreduceState describing current job. MapreduceState.writer_state can be modified during initialization to save the information about the files shared by many shards.
def google.appengine.ext.mapreduce.output_writers.OutputWriter.to_json(self)
Returns writer state to serialize in json.

Returns:
  A json-izable version of the OutputWriter state.
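"json-izable" means the returned value must survive a pass through a JSON encoder, so in-memory types such as sets need converting first. The sketch below uses a hypothetical `DedupWriter` that is not part of the SDK; it pairs to_json() with a matching from_json() and checks the round trip with the standard `json` module.

```python
import json


class DedupWriter(object):
    """Hypothetical writer that remembers which keys it has seen."""

    def __init__(self, seen=None):
        self.seen = set(seen or [])

    def to_json(self):
        # A set is not JSON-serializable, so emit a sorted list instead.
        return {"seen": sorted(self.seen)}

    @classmethod
    def from_json(cls, state):
        # Rebuild the set from the list stored in the state.
        return cls(state["seen"])


def round_trip(writer):
    """Serialize to a JSON string and back, as happens between slices."""
    return DedupWriter.from_json(json.loads(json.dumps(writer.to_json())))
```

Sorting the list is optional; it simply makes the serialized state deterministic, which is convenient when comparing states in tests.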
def google.appengine.ext.mapreduce.output_writers.OutputWriter.validate(cls, mapper_spec)
Validates mapper specification.

Output writer parameters are expected to be passed as the "output_writer" subdictionary of mapper_spec.params. To stay compatible with the previous API, an output writer is advised to check mapper_spec.params and issue a warning if the "output_writer" subdictionary is not present. The _get_params helper method can be used to simplify implementation.

Args:
  mapper_spec: an instance of model.MapperSpec to validate.
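The compatibility check described above could be approximated as follows. This is not the SDK's `_get_params` helper, whose exact behavior is not shown here; `get_writer_params` and `FakeMapperSpec` are hypothetical names, and raising `ValueError` is an assumption made for the example.

```python
import logging


class FakeMapperSpec(object):
    """Stand-in for model.MapperSpec exposing only .params."""

    def __init__(self, params):
        self.params = params


def get_writer_params(mapper_spec):
    """Return writer parameters, falling back to the old flat layout."""
    if "output_writer" not in mapper_spec.params:
        # Old-style specs put writer parameters at the top level.
        logging.warning(
            "Writer parameters should be passed in the 'output_writer' "
            "subdictionary of mapper_spec.params.")
        return mapper_spec.params
    params = mapper_spec.params["output_writer"]
    if not isinstance(params, dict):
        raise ValueError("'output_writer' must be a dict of parameters")
    return params
```

A validate() classmethod could call such a helper and then check for whatever keys the specific writer requires.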
def google.appengine.ext.mapreduce.output_writers.OutputWriter.write(self, data)
Write data.

Args:
  data: actual data yielded from handler. Type is writer-specific.