App Engine Python SDK  v1.6.9 rev.445
The Python runtime is available as an experimental Preview feature.
google.appengine.ext.mapreduce.output_writers.OutputWriter Class Reference
Inheritance diagram for google.appengine.ext.mapreduce.output_writers.OutputWriter:
Inherits from:
  google.appengine.ext.mapreduce.json_util.JsonMixin
Classes shown in the diagram as deriving from OutputWriter, directly or indirectly:
  google.appengine.ext.mapreduce.output_writers._GoogleCloudStorageOutputWriter
  google.appengine.ext.mapreduce.output_writers.FileOutputWriterBase
  google.appengine.ext.mapreduce.output_writers._GoogleCloudStorageRecordOutputWriter
  google.appengine.ext.mapreduce.output_writers.BlobstoreOutputWriterBase
  google.appengine.ext.mapreduce.output_writers.FileOutputWriter
  google.appengine.ext.mapreduce.output_writers.FileRecordsOutputWriter
  google.appengine.ext.mapreduce.output_writers._GoogleCloudStorageKeyValueOutputWriter
  google.appengine.ext.mapreduce.output_writers.BlobstoreOutputWriter
  google.appengine.ext.mapreduce.output_writers.BlobstoreRecordsOutputWriter
  google.appengine.ext.mapreduce.output_writers.KeyValueBlobstoreOutputWriter
  google.appengine.ext.mapreduce.shuffler._HashingBlobstoreOutputWriter
  google.appengine.ext.mapreduce.output_writers.KeyValueFileOutputWriter

Public Member Functions

def validate
def init_job
def finalize_job
def from_json
def to_json
def create
def write
def finalize
def get_filenames

- Public Member Functions inherited from google.appengine.ext.mapreduce.json_util.JsonMixin
def to_json_str
def from_json_str

Detailed Description

Abstract base class for output writers.

Output writers process all mapper handler output that is not
an operation.

OutputWriter's lifecycle is as follows:
  0) validate() is called to validate the mapper specification.
  1) init_job() is called to initialize any job-level state.
  2) create() is called to create a new output writer instance for a
     given shard.
  3) from_json()/to_json() are used to persist the writer's state across
     multiple slices.
  4) write() is called to write data.
  5) finalize() is called when shard processing is done.
  6) finalize_job() is called when the job is completed.
  7) get_filenames() is called to get the output file names.
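
The per-shard part of the lifecycle above can be sketched without the framework. This is a minimal illustration under stated assumptions: `ListOutputWriter`, `run_shard`, and the plain dict used for parameters are hypothetical names, not part of the SDK, and the method signatures are simplified. The job-level steps (init_job, finalize_job, get_filenames) are omitted, and the real framework spreads these calls across separate requests rather than one loop.

```python
import json


class ListOutputWriter(object):
    """Hypothetical stand-in that follows the OutputWriter call order."""

    def __init__(self, shard_number, records=None):
        self.shard_number = shard_number
        self.records = list(records or [])

    @classmethod
    def validate(cls, params):
        # Step 0: reject a bad configuration before any work starts.
        if "output_writer" not in params:
            raise ValueError("missing 'output_writer' parameters")

    @classmethod
    def create(cls, shard_number):
        # Step 2: one writer instance per shard.
        return cls(shard_number)

    def to_json(self):
        # Step 3: the state that must survive between slices.
        return {"shard_number": self.shard_number, "records": self.records}

    @classmethod
    def from_json(cls, state):
        return cls(state["shard_number"], state["records"])

    def write(self, data):
        # Step 4: called once per value yielded by the handler.
        self.records.append(data)

    def finalize(self):
        # Step 5: this sketch simply returns a per-shard summary.
        return {"shard": self.shard_number, "count": len(self.records)}


def run_shard(params, shard_number, slices):
    """Drives one shard; each inner list in `slices` simulates one slice."""
    ListOutputWriter.validate(params)
    writer = ListOutputWriter.create(shard_number)
    for slice_data in slices:
        for item in slice_data:
            writer.write(item)
        # Between slices the writer is serialized and then rebuilt.
        serialized = json.dumps(writer.to_json())
        writer = ListOutputWriter.from_json(json.loads(serialized))
    return writer.finalize()


result = run_shard({"output_writer": {}}, 3, [["a", "b"], ["c"]])
```

Rebuilding the writer after every slice makes the point of step 3 concrete: anything not returned by to_json() is lost before the next slice runs.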

Member Function Documentation

def google.appengine.ext.mapreduce.output_writers.OutputWriter.create(cls, mr_spec, shard_number, shard_attempt, _writer_state=None)
Create new writer for a shard.

Args:
  mr_spec: an instance of model.MapreduceSpec describing current job.
  shard_number: int shard number.
  shard_attempt: int shard attempt.
  _writer_state: deprecated. This is for old writers that share files
    across shards. For new writers, each shard must have its own
    dedicated outputs. Output state should be contained in
    the output writer instance. The serialized output writer
    instance will be saved by mapreduce across slices.
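
Since each shard must own its outputs, one way a create() implementation might keep them apart is to derive the output name from the job, the shard number, and the attempt. The following is a sketch only: `SketchWriter`, the `FakeSpec` stand-in with its `mapreduce_id` attribute, and the naming scheme are assumptions made for illustration, not the SDK's behavior.

```python
import collections

# Hypothetical stand-in for model.MapreduceSpec; only the attribute
# this sketch reads is modeled.
FakeSpec = collections.namedtuple("FakeSpec", ["mapreduce_id"])


class SketchWriter(object):
    def __init__(self, filename):
        self.filename = filename

    @classmethod
    def create(cls, mr_spec, shard_number, shard_attempt, _writer_state=None):
        # _writer_state is deprecated, so this sketch ignores it.
        filename = "%s/output-%d-attempt-%d" % (
            mr_spec.mapreduce_id, shard_number, shard_attempt)
        return cls(filename)


writer = SketchWriter.create(FakeSpec("job42"), shard_number=0, shard_attempt=1)
retry = SketchWriter.create(FakeSpec("job42"), shard_number=0, shard_attempt=2)
```

Including the attempt in the name means a retried shard does not write into the output of a failed earlier attempt.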

def google.appengine.ext.mapreduce.output_writers.OutputWriter.finalize(self, ctx, shard_state)
Finalize writer shard-level state.

This should only be called when shard_state.result_status shows success.
After finalizing the outputs, it should save per-shard output file info
into shard_state.writer_state so that other operations can find the
outputs.

Args:
  ctx: an instance of context.Context.
  shard_state: shard state. ShardState.writer_state can be modified.
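
A minimal sketch of the pattern described above: finish the outputs, then record where they are in shard_state.writer_state. `FakeShardState`, `SketchWriter`, and the "filename" key are hypothetical and chosen for illustration; the layout of writer_state is writer-specific.

```python
class FakeShardState(object):
    """Hypothetical stand-in for the shard state object."""

    def __init__(self):
        self.writer_state = None


class SketchWriter(object):
    def __init__(self, filename):
        self.filename = filename
        self.closed = False

    def finalize(self, ctx, shard_state):
        # Finish the output first, then publish its location so that
        # later operations can find it.
        self.closed = True
        shard_state.writer_state = {"filename": self.filename}


shard_state = FakeShardState()
writer = SketchWriter("out-0")
writer.finalize(None, shard_state)  # ctx is unused in this sketch
```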

def google.appengine.ext.mapreduce.output_writers.OutputWriter.finalize_job(cls, mapreduce_state)
Finalize job-level writer state.

This method exists only to support a deprecated feature: output files
shared by many shards. New output writers should not do anything
in this method.

This method should only be called when mapreduce_state.result_status shows
success. After finalizing the outputs, it should save the info for shard
shared files into mapreduce_state.writer_state so that other operations
can find the outputs.

Args:
  mapreduce_state: an instance of model.MapreduceState describing current
  job. MapreduceState.writer_state can be modified during finalization.

def google.appengine.ext.mapreduce.output_writers.OutputWriter.from_json(cls, state)
Creates an instance of the OutputWriter for the given json state.

Args:
  state: The OutputWriter state as a dict-like object.

Returns:
  An instance of the OutputWriter configured using the values in state.

def google.appengine.ext.mapreduce.output_writers.OutputWriter.get_filenames(cls, mapreduce_state)
Obtain output filenames from mapreduce state.

This method should only be called when a MapReduce job has finished.
Implementors of this method should not assume that any other methods of
this class have been called. If there is no input data, no method other
than validate will have been called.

Args:
  mapreduce_state: an instance of model.MapreduceState

Returns:
  List of filenames this mapreduce successfully wrote to. The list can be
  empty if no output file was successfully written.
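
Because get_filenames() may run when nothing else has, an implementation should tolerate missing state. The sketch below assumes, purely for illustration, that filenames were stored under a hypothetical "filenames" key in writer_state. `FakeMapreduceState` and `SketchWriter` are stand-ins, and a real writer may collect names from a different place.

```python
class FakeMapreduceState(object):
    """Hypothetical stand-in for model.MapreduceState."""

    def __init__(self, writer_state=None):
        self.writer_state = writer_state


class SketchWriter(object):
    @classmethod
    def get_filenames(cls, mapreduce_state):
        # writer_state may never have been populated, for example when
        # the job had no input data, so treat None as "no outputs".
        state = mapreduce_state.writer_state or {}
        return list(state.get("filenames", []))


no_input = SketchWriter.get_filenames(FakeMapreduceState())
finished = SketchWriter.get_filenames(
    FakeMapreduceState({"filenames": ["out-0", "out-1"]}))
```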

def google.appengine.ext.mapreduce.output_writers.OutputWriter.init_job(cls, mapreduce_state)
Initialize job-level writer state.

This method exists only to support a deprecated feature: output files
shared by many shards. New output writers should not do anything
in this method.

Args:
  mapreduce_state: an instance of model.MapreduceState describing current
  job. MapreduceState.writer_state can be modified during initialization
  to save the information about the files shared by many shards.

def google.appengine.ext.mapreduce.output_writers.OutputWriter.to_json(self)
Returns writer state to serialize in json.

Returns:
  A json-izable version of the OutputWriter state.
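
"Json-izable" means the returned value must contain only types the JSON encoder accepts. A small sketch of the consequence, using a hypothetical `DedupWriter` that tracks seen keys in a set: the set has to be converted on the way out and rebuilt in from_json().

```python
import json


class DedupWriter(object):
    """Hypothetical writer that remembers which keys it has seen."""

    def __init__(self, seen=None):
        self.seen = set(seen or [])

    def write(self, data):
        self.seen.add(data)

    def to_json(self):
        # A set is not JSON-serializable, so emit a sorted list instead.
        return {"seen": sorted(self.seen)}

    @classmethod
    def from_json(cls, state):
        # Rebuild the set from the list stored by to_json().
        return cls(state["seen"])


writer = DedupWriter()
for key in ["b", "a", "b"]:
    writer.write(key)

serialized = json.dumps(writer.to_json())
restored = DedupWriter.from_json(json.loads(serialized))
```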

def google.appengine.ext.mapreduce.output_writers.OutputWriter.validate(cls, mapper_spec)
Validates mapper specification.

Output writer parameters are expected to be passed as the "output_writer"
subdictionary of mapper_spec.params. To remain compatible with the previous
API, an output writer is advised to check mapper_spec.params and issue
a warning if the "output_writer" subdictionary is not present.
The _get_params helper method can be used to simplify the implementation.

Args:
  mapper_spec: an instance of model.MapperSpec to validate.
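
The compatibility check described above could look roughly like the sketch below. It does not use the _get_params helper, whose signature is not shown here. `FakeMapperSpec`, `SketchWriter`, the "output_name" parameter, and the use of ValueError are all assumptions made for illustration.

```python
import logging


class FakeMapperSpec(object):
    """Hypothetical stand-in for model.MapperSpec; only params is modeled."""

    def __init__(self, params):
        self.params = params


class SketchWriter(object):
    REQUIRED_PARAM = "output_name"  # hypothetical parameter for this sketch

    @classmethod
    def validate(cls, mapper_spec):
        params = mapper_spec.params
        if "output_writer" in params:
            writer_params = params["output_writer"]
        else:
            # Older callers passed writer settings at the top level.
            logging.warning("'output_writer' subdictionary is missing; "
                            "falling back to top-level params")
            writer_params = params
        if cls.REQUIRED_PARAM not in writer_params:
            raise ValueError("%s is required" % cls.REQUIRED_PARAM)


# New-style parameters pass silently.
SketchWriter.validate(FakeMapperSpec({"output_writer": {"output_name": "out"}}))
# Old-style parameters still pass, but a warning is logged.
SketchWriter.validate(FakeMapperSpec({"output_name": "out"}))
```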

def google.appengine.ext.mapreduce.output_writers.OutputWriter.write(self, data)
Write data.

Args:
  data: actual data yielded from handler. Type is writer-specific.
