App Engine Python SDK
v1.6.9 rev.445
Public Member Functions

def __init__
def validate
def split_input
def from_json
def to_json
def next
def __str__

Public Member Functions inherited from InputReader

def __iter__
def next
def from_json
def to_json
def split_input
def validate

Public Member Functions inherited from JsonMixin

def to_json_str
def from_json_str
Static Public Attributes

string BUCKET_NAME_PARAM = "bucket_name"
string OBJECT_NAMES_PARAM = "objects"
string BUFFER_SIZE_PARAM = "buffer_size"
string DELIMITER_PARAM = "delimiter"

Static Public Attributes inherited from InputReader

expand_parameters = False
string NAMESPACE_PARAM = "namespace"
string NAMESPACES_PARAM = "namespaces"
Input reader from Google Cloud Storage using the cloudstorage library.

This class is expected to be subclassed with a reader that understands user-level records.

Required configuration in the mapper_spec.input_reader dictionary:

BUCKET_NAME_PARAM: name of the bucket to use, with no extra delimiters or suffixes such as directories.

OBJECT_NAMES_PARAM: a list of object names or prefixes. All objects must be in the BUCKET_NAME_PARAM bucket. If a name ends with a *, it is treated as a prefix and all objects with matching names are read. Entries should not start with a slash unless that is part of the object's name. An example list could be: ["my-1st-input-file", "directory/my-2nd-file", "some/other/dir/input-*"]. To retrieve all files, "*" matches every object in the bucket. If a file is listed twice or is covered by multiple prefixes, it is read twice; there is no deduplication.

Optional configuration in the mapper_spec.input_reader dictionary:

BUFFER_SIZE_PARAM: the size of the read buffer for each file handle.

DELIMITER_PARAM: if specified, turns on shallow splitting mode. The delimiter is used as a path separator to designate directory hierarchy. Matching of prefixes from OBJECT_NAMES_PARAM stops at the first directory instead of matching all files under the directory. This allows MapReduce to process buckets with hundreds of thousands of files.
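The configuration sketch below shows how these parameters might fit together; the bucket and object names are hypothetical, and only the dictionary keys ("bucket_name", "objects", "buffer_size", "delimiter") are taken from the parameter table above.

```python
# A minimal mapper_spec.input_reader sketch; all names are made up.
input_reader_params = {
    "bucket_name": "my-bucket",       # BUCKET_NAME_PARAM (required)
    "objects": [
        "my-1st-input-file",          # exact object name
        "directory/my-2nd-file",      # exact name within a "directory"
        "some/other/dir/input-*",     # trailing * reads every matching object
    ],                                # OBJECT_NAMES_PARAM (required)
    "buffer_size": 1024 * 1024,       # BUFFER_SIZE_PARAM (optional), in bytes
    "delimiter": "/",                 # DELIMITER_PARAM (optional): shallow splitting
}
```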
def google.appengine.ext.mapreduce.input_readers._GoogleCloudStorageInputReader.__init__(self, filenames, index=0, buffer_size=None, _account_id=None, delimiter=None)
Initialize a GoogleCloudStorageInputReader instance.

Args:
  filenames: A list of Google Cloud Storage filenames of the form '/bucket/objectname'.
  index: Index of the next filename to read.
  buffer_size: The size of the read buffer, or None to use the default.
  _account_id: Internal use only. See the cloudstorage documentation.
  delimiter: Delimiter used as a path separator. See the class documentation for details.
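Readers are normally produced by split_input, but a direct construction sketch (with hypothetical bucket and object names) looks like this:

```python
from google.appengine.ext.mapreduce import input_readers

# Filenames use the '/bucket/objectname' form described above.
reader = input_readers._GoogleCloudStorageInputReader(
    filenames=["/my-bucket/my-1st-input-file",
               "/my-bucket/directory/my-2nd-file"],
    index=0,           # start reading from the first filename
    buffer_size=None,  # None falls back to the cloudstorage default
    delimiter=None,    # no shallow-splitting path separator
)
```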
def google.appengine.ext.mapreduce.input_readers._GoogleCloudStorageInputReader.next(self)
Returns the next input from this input reader, a block of bytes.

Nonexistent files are logged and skipped; a file might have been removed after input splitting.

Returns:
  The next input from this input reader, in the form of a cloudstorage ReadBuffer that supports a file-like interface (read, readline, seek, tell, and close). An error may be raised if the file cannot be opened.

Raises:
  StopIteration: The list of files has been exhausted.
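A sketch of consuming the reader, using only the file-like methods named above (readline and close); process() is a hypothetical per-record handler:

```python
# Iterate over the input files; each item is a cloudstorage ReadBuffer.
for file_buffer in reader:
    try:
        line = file_buffer.readline()
        while line:
            process(line)  # hypothetical user function
            line = file_buffer.readline()
    finally:
        file_buffer.close()
```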
def google.appengine.ext.mapreduce.input_readers._GoogleCloudStorageInputReader.split_input(cls, mapper_spec)
Returns a list of input readers.

An equal number of input files is assigned to each shard (+/- 1). If there are fewer files than shards, fewer than the requested number of shards are used. Input files are currently never split (although some formats could be, and may be split in a future implementation).

Args:
  mapper_spec: an instance of model.MapperSpec.

Returns:
  A list of InputReaders. None when no input data can be found.
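The "(+/- 1)" balance can be pictured with a small round-robin sketch; this is illustrative only, not the library's actual implementation:

```python
def assign_files_to_shards(filenames, shard_count):
    """Round-robin assignment: shard sizes differ by at most one file."""
    shard_count = min(shard_count, len(filenames))  # never more shards than files
    shards = [[] for _ in xrange(shard_count)]
    for i, name in enumerate(filenames):
        shards[i % shard_count].append(name)
    return shards

# Five files over three shards -> shard sizes 2, 2, 1.
print assign_files_to_shards(["a", "b", "c", "d", "e"], 3)
```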
def google.appengine.ext.mapreduce.input_readers._GoogleCloudStorageInputReader.validate(cls, mapper_spec)
Validate the mapper specification.

Args:
  mapper_spec: an instance of model.MapperSpec.

Raises:
  BadReaderParamsError: if the specification is invalid for any reason, such as a missing or invalid bucket name.
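A usage sketch, assuming BadReaderParamsError lives in google.appengine.ext.mapreduce.errors (as in the bundled mapreduce library) and that mapper_spec is an already-built model.MapperSpec:

```python
import logging

from google.appengine.ext.mapreduce import errors
from google.appengine.ext.mapreduce import input_readers

try:
    input_readers._GoogleCloudStorageInputReader.validate(mapper_spec)
except errors.BadReaderParamsError as e:  # e.g. missing or invalid bucket name
    logging.error("Bad input reader configuration: %s", e)
```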