Output writer to Google Cloud Storage using the cloudstorage library.
This class is expected to be subclassed with a writer that applies formatting
to user-level records.
Required configuration in the mapper_spec.output_writer dictionary:
BUCKET_NAME_PARAM: name of the bucket to use, with no extra delimiters or
suffixes such as directories. Directories/prefixes can be specified as
part of the NAMING_FORMAT_PARAM.
Optional configuration in the mapper_spec.output_writer dictionary:
ACL_PARAM: ACL to apply to new files. If not provided, the bucket default
is used.
NAMING_FORMAT_PARAM: prefix format string for the new files. No starting
slash is required: expected formats look like "directory/basename...",
and any starting slash is treated as part of the file name. The format
should use the following substitutions:
$name - the name of the job
$id - the id assigned to the job
$num - the shard number
If there is more than one shard, $num must be used. An arbitrary suffix may
be applied by the writer.
CONTENT_TYPE_PARAM: MIME type to apply to the files. If not provided, Google
Cloud Storage will apply its default.
_NO_DUPLICATE: if True, slice recovery logic is used to ensure that
output files have no duplicates. Every shard should have only one final
output in the user-specified location, but slice recovery may cause it
to produce many smaller files (named "seg"). These segs live in a
tmp directory and should be combined and renamed to the final location.
In the current implementation, they are not combined.
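
Example (a minimal sketch, not the writer's actual code): the snippet below
illustrates how a naming format with the $name, $id and $num substitutions
could expand into a file name, using the standard library's string.Template.
The dictionary keys ("bucket_name", "naming_format", "content_type") are
assumed stand-ins for the values of the *_PARAM constants, and
expand_naming_format is a hypothetical helper introduced only for
illustration.

```python
import string

# Hypothetical output_writer configuration. The literal key strings are
# assumptions standing in for BUCKET_NAME_PARAM, NAMING_FORMAT_PARAM and
# CONTENT_TYPE_PARAM; check the class constants for the real values.
writer_config = {
    "bucket_name": "my-bucket",
    "naming_format": "output/$name/$id/part-$num",
    "content_type": "text/plain",
}


def expand_naming_format(naming_format, name, job_id, num, shard_count):
    """Illustrative helper (not part of the library): expand a naming format."""
    # With more than one shard, $num must appear so that shards do not
    # write to the same file name. This is a simple textual check.
    uses_num = "$num" in naming_format or "${num}" in naming_format
    if shard_count > 1 and not uses_num:
        raise ValueError("naming format must use $num when there are "
                         "multiple shards")
    # substitute() raises KeyError or ValueError on unknown or malformed
    # placeholders.
    return string.Template(naming_format).substitute(
        name=name, id=job_id, num=num)


filename = expand_naming_format(
    writer_config["naming_format"], "wordcount", "1234", 0, 4)
print(filename)  # output/wordcount/1234/part-0
```

Note that a leading slash in the format is not stripped: "/dir/$num" expands
to "/dir/0", with the slash kept as part of the file name.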