App Engine Python SDK  v1.6.9 rev.445
The Python runtime is available as an experimental Preview feature.
google.appengine.ext.mapreduce.output_writers._GoogleCloudStorageOutputWriter Class Reference
Class hierarchy for google.appengine.ext.mapreduce.output_writers._GoogleCloudStorageOutputWriter:
  Base classes: google.appengine.ext.mapreduce.output_writers.OutputWriter, google.appengine.ext.mapreduce.json_util.JsonMixin
  Derived classes: google.appengine.ext.mapreduce.output_writers._GoogleCloudStorageRecordOutputWriter, google.appengine.ext.mapreduce.output_writers._GoogleCloudStorageKeyValueOutputWriter

Public Member Functions

def __init__
 
def validate
 
def create
 
def get_filenames
 
def from_json
 
def to_json
 
def write
 
def finalize
 
- Public Member Functions inherited from google.appengine.ext.mapreduce.output_writers.OutputWriter
def validate
 
def init_job
 
def finalize_job
 
def from_json
 
def to_json
 
def create
 
def write
 
def finalize
 
def get_filenames
 
- Public Member Functions inherited from google.appengine.ext.mapreduce.json_util.JsonMixin
def to_json_str
 
def from_json_str
 

Static Public Attributes

string BUCKET_NAME_PARAM = "bucket_name"
 
string ACL_PARAM = "acl"
 
string NAMING_FORMAT_PARAM = "naming_format"
 
string CONTENT_TYPE_PARAM = "content_type"
 
string DEFAULT_NAMING_FORMAT = "$name/$id/output-$num"
 

Detailed Description

Output writer to Google Cloud Storage using the cloudstorage library.

This class is expected to be subclassed with a writer that applies formatting
to user-level records.
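The subclassing pattern described above can be sketched as a base writer that passes raw data through to its streaming buffer, plus a subclass that formats each record before delegating. This is a minimal sketch under stated assumptions: GcsWriterSketch and LineRecordWriterSketch are hypothetical names, io.BytesIO stands in for the cloudstorage streaming buffer, and newline-delimited output is an illustrative format, not the one used by the SDK's record writers.

```python
import io


class GcsWriterSketch(object):
    """Hypothetical base writer: writes raw data to a streaming buffer."""

    def __init__(self, streaming_buffer, writer_spec=None):
        self._streaming_buffer = streaming_buffer
        self._writer_spec = writer_spec

    def write(self, data):
        # The base class applies no formatting; data goes straight through.
        self._streaming_buffer.write(data)


class LineRecordWriterSketch(GcsWriterSketch):
    """Hypothetical subclass that emits each record as one line."""

    def write(self, data):
        # Apply record-level formatting, then delegate to the base writer.
        super(LineRecordWriterSketch, self).write(data + b"\n")


buf = io.BytesIO()  # stand-in for the cloudstorage streaming buffer
writer = LineRecordWriterSketch(buf)
for record in [b"alpha", b"beta"]:
    writer.write(record)
# buf now holds b"alpha\nbeta\n"
```

Keeping the buffer handling in the base class means a subclass only has to decide how a single record is encoded.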

Required configuration in the mapper_spec.output_writer dictionary:
  BUCKET_NAME_PARAM: name of the bucket to use, with no extra delimiters or
    suffixes such as directories. Directories/prefixes can be specified as
    part of the NAMING_FORMAT_PARAM.

Optional configuration in the mapper_spec.output_writer dictionary:
  ACL_PARAM: ACL to apply to new files; if omitted, the bucket default is used.
  NAMING_FORMAT_PARAM: prefix format string for the new files. No leading
    slash is required; expected formats look like "directory/basename...",
    and any leading slash is treated as part of the file name. The format
    should use the following substitutions:
      $name - the name of the job
      $id - the id assigned to the job
      $num - the shard number
    If there is more than one shard, $num must be used. The writer may append
    an arbitrary suffix.
  CONTENT_TYPE_PARAM: MIME type to apply to the files. If not provided, Google
    Cloud Storage will apply its default.
  _NO_DUPLICATE: if True, slice recovery logic is used to ensure the output
    files have no duplicates. Each shard should have only one final output
    file in the user-specified location, but slice recovery may produce many
    smaller files (named "seg"). These segs live in a tmp directory and should
    be combined and renamed to the final location; in the current
    implementation, they are not combined.
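The $name, $id and $num placeholders in NAMING_FORMAT_PARAM follow the same "$identifier" syntax as Python's string.Template. The sketch below expands DEFAULT_NAMING_FORMAT for two shards, assuming the writer performs an equivalent substitution; expand_filename is a hypothetical helper, and the job name and id are made-up values.

```python
import string

DEFAULT_NAMING_FORMAT = "$name/$id/output-$num"


def expand_filename(naming_format, name, job_id, num):
    """Hypothetical helper: substitute $name, $id and $num into the format."""
    template = string.Template(naming_format)
    # substitute() raises KeyError if the format uses an unknown placeholder.
    return template.substitute(name=name, id=job_id, num=num)


for shard in range(2):
    print(expand_filename(DEFAULT_NAMING_FORMAT, "wordcount", "158", shard))
# wordcount/158/output-0
# wordcount/158/output-1
```

Because $num differs per shard, each shard gets a distinct prefix; a format without $num would give every shard the same name, which is why it is mandatory when there is more than one shard.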

Constructor & Destructor Documentation

def google.appengine.ext.mapreduce.output_writers._GoogleCloudStorageOutputWriter.__init__(self, streaming_buffer, writer_spec=None)
Initialize a GoogleCloudStorageOutputWriter instance.

Args:
  streaming_buffer: an instance of a writable buffer from cloudstorage_api.

  writer_spec: the specification for the writer.

Member Function Documentation

def google.appengine.ext.mapreduce.output_writers._GoogleCloudStorageOutputWriter.create(cls, mr_spec, shard_number, shard_attempt, _writer_state=None)
Inherits the documentation of OutputWriter.create.
def google.appengine.ext.mapreduce.output_writers._GoogleCloudStorageOutputWriter.validate(cls, mapper_spec)
Validate mapper specification.

Args:
  mapper_spec: an instance of model.MapperSpec.

Raises:
  BadWriterParamsError: if the specification is invalid for any reason, such
    as a missing or invalid bucket name.
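The checks validate is documented to perform can be approximated on a plain output_writer dictionary. This is a hypothetical sketch, not the SDK's actual logic: validate_writer_params and the local BadWriterParamsError class are illustrative stand-ins, the real method receives a model.MapperSpec rather than a dict, and its bucket-name validation is stricter than the single "no directories" rule checked here. The $num rule comes from the NAMING_FORMAT_PARAM description.

```python
class BadWriterParamsError(Exception):
    """Hypothetical stand-in for the SDK's BadWriterParamsError."""


BUCKET_NAME_PARAM = "bucket_name"
NAMING_FORMAT_PARAM = "naming_format"
DEFAULT_NAMING_FORMAT = "$name/$id/output-$num"


def validate_writer_params(params, shard_count):
    """Hypothetical checks modeled on the documented configuration rules."""
    bucket = params.get(BUCKET_NAME_PARAM)
    if not bucket:
        raise BadWriterParamsError("%s is required" % BUCKET_NAME_PARAM)
    # Directories belong in the naming format, not in the bucket name.
    if "/" in bucket:
        raise BadWriterParamsError("bucket name must not contain '/'")
    naming_format = params.get(NAMING_FORMAT_PARAM, DEFAULT_NAMING_FORMAT)
    # With several shards, $num keeps each shard's file name distinct.
    # (This only looks for the literal "$num", not the "${num}" spelling.)
    if shard_count > 1 and "$num" not in naming_format:
        raise BadWriterParamsError("$num is required with multiple shards")


validate_writer_params({"bucket_name": "my-bucket"}, shard_count=8)  # passes
try:
    validate_writer_params({"bucket_name": "my-bucket/out"}, shard_count=1)
except BadWriterParamsError as e:
    print(e)  # bucket name must not contain '/'
```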
def google.appengine.ext.mapreduce.output_writers._GoogleCloudStorageOutputWriter.write(self, data)
Write data to the Google Cloud Storage file.

Args:
  data: string containing the data to be written.

The documentation for this class was generated from the following file: