App Engine Python SDK  v1.6.9 rev.445
google.appengine.ext.mapreduce.lib.input_reader._gcs.GCSInputReader Class Reference
Inherited by google.appengine.ext.mapreduce.lib.input_reader._gcs.GCSRecordInputReader.

Public Member Functions

def __init__
def validate
def split_input
def from_json
def to_json
def next
def __str__
def params_to_json
def params_from_json

Static Public Attributes

string COUNTER_FILE_READ = "file-read"
string COUNTER_FILE_MISSING = "file-missing"
string BUCKET_NAME_PARAM = "bucket_name"
string OBJECT_NAMES_PARAM = "objects"
string BUFFER_SIZE_PARAM = "buffer_size"
string DELIMITER_PARAM = "delimiter"
string PATH_FILTER_PARAM = "path_filter"

Detailed Description

Input reader from Google Cloud Storage using the cloudstorage library.

Required configuration in the mapper_spec.input_reader dictionary.
  BUCKET_NAME_PARAM: name of the bucket to use. No "/" prefix or suffix.
  OBJECT_NAMES_PARAM: a list of object names or prefixes. All objects must be
    in the BUCKET_NAME_PARAM bucket. If the name ends with a * it will be
    treated as prefix and all objects with matching names will be read.
    Entries should not start with a slash unless that is part of the object's
    name. An example list could be:
    ["my-1st-input-file", "directory/my-2nd-file", "some/other/dir/input-*"]
    To retrieve all files, use "*", which matches every object in the bucket.
    If a file is listed twice or is covered by multiple prefixes, it will be
    read twice; there is no de-duplication.

Optional configuration in the mapper_spec.input_reader dictionary.
  BUFFER_SIZE_PARAM: the size of the read buffer for each file handle.
  PATH_FILTER_PARAM: an instance of PathFilter. PathFilter is a predicate
    on which filenames to read.
  DELIMITER_PARAM: str. The delimiter that marks a directory boundary.
    If you have too many files to shard on the granularity of individual
    files, you can specify this to enable shallow splitting. In this mode,
    the reader only goes one level deep during "*" expansion and stops when
    the delimiter is encountered.
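
The configuration above can be sketched as a plain dictionary together with a toy expansion of the trailing-"*" rule. This is only an illustration: the bucket name, the object names, the byte unit for the buffer size, and the helper expand_object_names are assumptions made for the example, and the bucket listing is an in-memory list rather than a real Cloud Storage listing.

```python
# Hypothetical reader parameters. The keys are the string values of the
# class constants listed above (BUCKET_NAME_PARAM, OBJECT_NAMES_PARAM, ...).
reader_params = {
    "bucket_name": "my-bucket",   # no leading or trailing "/"
    "objects": ["my-1st-input-file", "some/other/dir/input-*", "*"],
    "buffer_size": 1024 * 1024,   # optional; unit assumed to be bytes
    "delimiter": "/",             # optional; enables shallow splitting
}


def expand_object_names(patterns, bucket_listing):
    """Expand exact names and trailing-* prefixes without de-duplication.

    bucket_listing is a plain list standing in for a real bucket listing.
    """
    selected = []
    for pattern in patterns:
        if pattern.endswith("*"):
            prefix = pattern[:-1]
            selected.extend(
                name for name in bucket_listing if name.startswith(prefix))
        else:
            selected.append(pattern)
    return selected


listing = [
    "my-1st-input-file",
    "some/other/dir/input-0",
    "some/other/dir/input-1",
]
expanded = expand_object_names(reader_params["objects"], listing)
```

Because "*" covers every object, each file in this listing is selected twice, which mirrors the no-de-duplication note above.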

Constructor & Destructor Documentation

def google.appengine.ext.mapreduce.lib.input_reader._gcs.GCSInputReader.__init__(
  self,
  filenames,
  index = 0,
  buffer_size = None,
  _account_id = None,
  delimiter = None,
  path_filter = None
)
Initialize a GCSInputReader instance.

Args:
  filenames: A list of Google Cloud Storage filenames of the form
    '/bucket/objectname'.
  index: Index of the next filename to read.
  buffer_size: The size of the read buffer, None to use default.
  _account_id: Internal use only. See cloudstorage documentation.
  delimiter: Delimiter used as path separator. See class doc.
  path_filter: An instance of PathFilter.
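
As a small illustration of the '/bucket/objectname' form expected by filenames, the sketch below joins a bucket name and object names. The helper to_gcs_filename and the sample names are hypothetical and are not part of the SDK.

```python
def to_gcs_filename(bucket_name, object_name):
    """Join a bucket and an object name into the '/bucket/objectname' form."""
    return "/%s/%s" % (bucket_name, object_name)


object_names = ["my-1st-input-file", "directory/my-2nd-file"]
filenames = [to_gcs_filename("my-bucket", name) for name in object_names]

# A reader would then be constructed along the lines of
# GCSInputReader(filenames, index=0); in a real job this is normally done
# by split_input rather than by hand.
```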

Member Function Documentation

def google.appengine.ext.mapreduce.lib.input_reader._gcs.GCSInputReader.next(self)
Returns a handle to the next file.

Nonexistent files are logged and skipped. A file might have been removed
after input splitting.

Returns:
  The next input from this input reader in the form of a cloudstorage
  ReadBuffer that supports a File-like interface (read, readline, seek,
  tell, and close). An error may be raised if the file cannot be opened.

Raises:
  StopIteration: The list of files has been exhausted.
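
The skip-and-continue behaviour described for next can be sketched with a toy reader over an in-memory dictionary. Everything here is an assumption made for illustration: SketchReader, FileMissing, and the way the "file-read" and "file-missing" counters are incremented are stand-ins, not the SDK's implementation or the cloudstorage library's API.

```python
import io


class FileMissing(Exception):
    """Hypothetical stand-in for the storage library's not-found error."""


class SketchReader(object):
    """Toy reader over an in-memory dict instead of Cloud Storage."""

    def __init__(self, filenames, storage, index=0):
        self._filenames = filenames
        self._storage = storage   # maps filename -> bytes
        self._index = index       # index of the next filename to read
        self.counters = {"file-read": 0, "file-missing": 0}

    def __iter__(self):
        return self

    def __next__(self):
        while self._index < len(self._filenames):
            filename = self._filenames[self._index]
            self._index += 1
            try:
                handle = self._open(filename)
            except FileMissing:
                # The file vanished after splitting: count it and move on.
                self.counters["file-missing"] += 1
                continue
            self.counters["file-read"] += 1
            return handle
        raise StopIteration()

    next = __next__  # Python 2 spelling, matching the documented method name

    def _open(self, filename):
        if filename not in self._storage:
            raise FileMissing(filename)
        return io.BytesIO(self._storage[filename])


storage = {"/b/one": b"first\n", "/b/three": b"third\n"}
reader = SketchReader(["/b/one", "/b/two", "/b/three"], storage)
contents = [handle.read() for handle in reader]
```

The missing "/b/two" is skipped without interrupting iteration, and the loop ends once the filename list is exhausted.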

def google.appengine.ext.mapreduce.lib.input_reader._gcs.GCSInputReader.params_to_json(cls, params)
Inherit docs.

def google.appengine.ext.mapreduce.lib.input_reader._gcs.GCSInputReader.split_input(cls, job_config)
Returns a list of input readers.

Each shard is assigned the same number of input files, give or take one. If
there are fewer files than shards, fewer than the requested number of shards
will be used. Input files are currently never split (although some formats
could be split, and may be in a future implementation).

Args:
  job_config: map_job.JobConfig

Returns:
  A list of InputReaders. None when no input data can be found.
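
One way to obtain the even distribution described above is a strided slice over the file list. This is a sketch under that assumption, not necessarily how the SDK does it; split_filenames is a hypothetical helper that returns lists of filenames rather than reader instances.

```python
def split_filenames(filenames, shard_count):
    """Distribute filenames over at most shard_count shards, sizes +/- 1.

    Returns None when there is nothing to read.
    """
    shards = []
    for shard in range(shard_count):
        # Every shard_count-th file, starting at this shard's offset.
        shard_filenames = filenames[shard::shard_count]
        if shard_filenames:
            shards.append(shard_filenames)
    return shards or None


seven_files = ["/b/f%d" % i for i in range(7)]
three_shards = split_filenames(seven_files, 3)
few_files = split_filenames(["/b/a", "/b/b"], 5)
no_files = split_filenames([], 4)
```

With seven files and three shards the shard sizes are 3, 2, and 2; with two files and five requested shards only two shards are produced.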

def google.appengine.ext.mapreduce.lib.input_reader._gcs.GCSInputReader.validate(cls, job_config)
Validate mapper specification.

Args:
  job_config: map_job.JobConfig.

Raises:
  BadReaderParamsError: if the specification is invalid for any reason such
    as missing the bucket name or providing an invalid bucket name.
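
The kind of check validate performs might look like the sketch below. The exact rules are not listed in this reference, so the checks here are assumptions derived from the configuration description in the class documentation. BadReaderParamsError is redefined locally as a stand-in, and validate_params is a hypothetical helper that takes a plain dictionary instead of a map_job.JobConfig.

```python
class BadReaderParamsError(Exception):
    """Local stand-in for the library's error of the same name."""


def validate_params(params):
    """Raise BadReaderParamsError if the reader parameters look invalid."""
    bucket = params.get("bucket_name")
    if not bucket or not isinstance(bucket, str):
        raise BadReaderParamsError("bucket_name is required")
    if bucket.startswith("/") or bucket.endswith("/"):
        raise BadReaderParamsError(
            "bucket_name must not start or end with '/'")
    objects = params.get("objects")
    if not isinstance(objects, list):
        raise BadReaderParamsError("objects must be a list of names")
    for name in objects:
        if not isinstance(name, str):
            raise BadReaderParamsError("object names must be strings")


def is_valid(params):
    """Convenience wrapper that reports validity as a boolean."""
    try:
        validate_params(params)
    except BadReaderParamsError:
        return False
    return True
```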
