App Engine Python SDK
v1.6.9 rev.445
The Python runtime is available as an experimental Preview feature.
Public Member Functions
def __init__
def validate
def split_input
def from_json
def to_json
def next
def __str__
def params_to_json
def params_from_json
Input reader from Google Cloud Storage using the cloudstorage library.

Required configuration in the mapper_spec.input_reader dictionary.
  BUCKET_NAME_PARAM: name of the bucket to use. No "/" prefix or suffix.
  OBJECT_NAMES_PARAM: a list of object names or prefixes. All objects must be in the BUCKET_NAME_PARAM bucket. If a name ends with a * it is treated as a prefix, and all objects with matching names are read. Entries should not start with a slash unless that is part of the object's name. An example list could be:
    ["my-1st-input-file", "directory/my-2nd-file", "some/other/dir/input-*"]
  To retrieve all files, "*" will match every object in the bucket. If a file is listed twice or is covered by multiple prefixes, it will be read twice; there is no de-duplication.

Optional configuration in the mapper_spec.input_reader dictionary.
  BUFFER_SIZE_PARAM: the size of the read buffer for each file handle.
  PATH_FILTER_PARAM: an instance of PathFilter. PathFilter is a predicate on which filenames to read.
  DELIMITER_PARAM: str. The delimiter that signifies a directory. If you have too many files to shard at the granularity of individual files, you can specify this to enable shallow splitting. In this mode, the reader only goes one level deep during "*" expansion and stops when the delimiter is encountered.
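The name and prefix rules for OBJECT_NAMES_PARAM can be illustrated with a small, self-contained sketch. It does not call the cloudstorage library: the function expand_object_names and the in-memory list of object names are hypothetical stand-ins, and the sorted listing order is an assumption made only to keep the output deterministic.

```python
def expand_object_names(bucket_objects, object_names):
    """Expand exact names and trailing-* prefixes against a bucket listing.

    bucket_objects is a plain list standing in for the objects in a bucket
    (an assumption; the real reader lists the bucket through cloudstorage).
    """
    selected = []
    for name in object_names:
        if name.endswith("*"):
            # A trailing * marks a prefix: take every object that starts with it.
            prefix = name[:-1]
            selected.extend(
                obj for obj in sorted(bucket_objects) if obj.startswith(prefix)
            )
        else:
            # Exact names are passed through unchanged.
            selected.append(name)
    # No de-duplication: an object matched twice appears twice.
    return selected


bucket = [
    "my-1st-input-file",
    "directory/my-2nd-file",
    "some/other/dir/input-1",
    "some/other/dir/input-2",
]
names = expand_object_names(bucket, ["my-1st-input-file", "some/other/dir/input-*"])
```

Passing ["my-1st-input-file", "my-*"] to this sketch yields the same object twice, which mirrors the documented lack of de-duplication.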
def google.appengine.ext.mapreduce.lib.input_reader._gcs.GCSInputReader.__init__(self, filenames, index=0, buffer_size=None, _account_id=None, delimiter=None, path_filter=None)
Initialize a GoogleCloudStorageInputReader instance.

Args:
  filenames: A list of Google Cloud Storage filenames of the form '/bucket/objectname'.
  index: Index of the next filename to read.
  buffer_size: The size of the read buffer, None to use default.
  _account_id: Internal use only. See cloudstorage documentation.
  delimiter: Delimiter used as path separator. See class doc.
  path_filter: An instance of PathFilter.
def google.appengine.ext.mapreduce.lib.input_reader._gcs.GCSInputReader.next(self)
Returns a handle to the next file.

Nonexistent files will be logged and skipped. The file might have been removed after input splitting.

Returns:
  The next input from this input reader in the form of a cloudstorage ReadBuffer that supports a file-like interface (read, readline, seek, tell, and close). An error may be raised if the file cannot be opened.

Raises:
  StopIteration: The list of files has been exhausted.
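The contract described above (one file-like handle per call, missing files skipped, StopIteration at the end) can be sketched with a stand-in reader. FakeReader and read_all are hypothetical names, and io.BytesIO stands in for the cloudstorage ReadBuffer; nothing here touches Google Cloud Storage.

```python
import io


class FakeReader(object):
    """Hypothetical stand-in that mimics the documented next() contract."""

    def __init__(self, files, filenames):
        self._files = files          # dict: filename -> bytes (in-memory "bucket")
        self._filenames = filenames  # filenames assigned to this reader
        self._index = 0              # index of the next filename to read

    def next(self):
        while self._index < len(self._filenames):
            name = self._filenames[self._index]
            self._index += 1
            if name not in self._files:
                # Missing files are skipped; the documented reader also logs them.
                continue
            return io.BytesIO(self._files[name])
        raise StopIteration()


def read_all(reader):
    """Drain a reader, closing each handle after it has been read."""
    contents = []
    while True:
        try:
            handle = reader.next()
        except StopIteration:
            break
        try:
            contents.append(handle.read())
        finally:
            handle.close()
    return contents


files = {"/bucket/a": b"one", "/bucket/c": b"three"}
reader = FakeReader(files, ["/bucket/a", "/bucket/missing", "/bucket/c"])
contents = read_all(reader)
```

Closing each handle in a finally clause keeps the loop safe even if reading one file raises.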
def google.appengine.ext.mapreduce.lib.input_reader._gcs.GCSInputReader.params_to_json(cls, params)
Inherit docs.
def google.appengine.ext.mapreduce.lib.input_reader._gcs.GCSInputReader.split_input(cls, job_config)
Returns a list of input readers.

An equal number of input files are assigned to each shard (+/- 1). If there are fewer files than shards, fewer than the requested number of shards will be used. Input files are currently never split (although some formats could be split, and may be in a future implementation).

Args:
  job_config: map_job.JobConfig

Returns:
  A list of InputReaders. None when no input data can be found.
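One way to obtain the "equal number of files per shard (+/- 1)" property is round-robin assignment. The sketch below is an illustration under that assumption, not a statement of how split_input is implemented; assign_files_to_shards is a hypothetical helper that works on plain lists of filenames instead of InputReader objects.

```python
def assign_files_to_shards(filenames, shard_count):
    """Distribute whole files across shards so sizes differ by at most one.

    Returns None when there is no input, and drops empty shards when there
    are fewer files than requested shards.
    """
    if not filenames:
        return None
    shards = []
    for shard in range(shard_count):
        # Every shard_count-th file, starting at this shard's offset.
        shard_files = filenames[shard::shard_count]
        if shard_files:
            shards.append(shard_files)
    return shards


five_files = ["/b/f1", "/b/f2", "/b/f3", "/b/f4", "/b/f5"]
two_shards = assign_files_to_shards(five_files, 2)
```

Because files are never split in this scheme, a single very large file still lands in one shard, which matches the note above about input files not being split.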
def google.appengine.ext.mapreduce.lib.input_reader._gcs.GCSInputReader.validate(cls, job_config)
Validate mapper specification.

Args:
  job_config: map_job.JobConfig.

Raises:
  BadReaderParamsError: if the specification is invalid for any reason, such as missing the bucket name or providing an invalid bucket name.
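The kinds of checks described above can be sketched without the SDK. Everything named here is an assumption made for illustration: the dictionary keys "bucket_name" and "objects", the local BadParamsError class (a stand-in for BadReaderParamsError), and the specific rules, which cover only the constraints stated in the class documentation (bucket name present, no "/" prefix or suffix, object names given as a list).

```python
class BadParamsError(Exception):
    """Local stand-in for the library's BadReaderParamsError (assumption)."""


def validate_params(params):
    """Check reader parameters against the rules stated in the class doc."""
    # "bucket_name" and "objects" are assumed key names for this sketch.
    bucket = params.get("bucket_name")
    if not bucket:
        raise BadParamsError("bucket name is required")
    if bucket.startswith("/") or bucket.endswith("/"):
        raise BadParamsError('bucket name must not have a "/" prefix or suffix')
    objects = params.get("objects")
    if not isinstance(objects, list):
        raise BadParamsError("object names must be given as a list")


def is_valid(params):
    """Convenience wrapper that reports validity as a boolean."""
    try:
        validate_params(params)
    except BadParamsError:
        return False
    return True


ok = is_valid({"bucket_name": "my-bucket", "objects": ["input-*"]})
```

Validating before splitting means a misconfigured job fails early, before any shard starts reading.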