Reader for BigQuery Storage API
-
class
google.cloud.bigquery_storage_v1beta1.reader.
ReadRowsIterable
(reader, read_session)
An iterable of rows from a read session.
- Parameters
reader (google.cloud.bigquery_storage_v1beta1.reader.ReadRowsStream) – A read rows stream.
read_session (google.cloud.bigquery_storage_v1beta1.types.ReadSession) – A read session. This is required because it contains the schema used in the stream messages.
-
property
pages
A generator of all pages in the stream.
- Returns
A generator of pages.
- Return type
types.GeneratorType[google.cloud.bigquery_storage_v1beta1.ReadRowsPage]
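As a rough illustration of page-wise consumption, the sketch below walks a `pages` generator and counts the rows in each page. The helper name `rows_per_page` and the stand-in object are hypothetical; with the real library, the argument would presumably be a `ReadRowsIterable` and each page a `ReadRowsPage`.

```python
from types import SimpleNamespace


def rows_per_page(rows_iterable):
    """Return the number of rows found in each page, in stream order.

    `rows_iterable` is assumed to expose a `pages` attribute that yields
    iterables of rows, as ReadRowsIterable.pages is documented to do.
    """
    counts = []
    for page in rows_iterable.pages:
        # Each page is itself an iterator of rows, so it can be consumed
        # one page at a time instead of materializing the whole stream.
        counts.append(sum(1 for _ in page))
    return counts


# Hypothetical stand-in for a ReadRowsIterable, used only for demonstration.
fake_rows = SimpleNamespace(
    pages=iter([iter([{"id": 1}, {"id": 2}]), iter([{"id": 3}])])
)
page_counts = rows_per_page(fake_rows)
print(page_counts)
```

Working page by page like this can keep memory use bounded when a stream is large, since only one page of rows needs to be held at a time.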
-
to_arrow
()
Create a pyarrow.Table of all rows in the stream.
This method requires the pyarrow library and a stream using the Arrow format.
- Returns
A table of all rows in the stream.
- Return type
pyarrow.Table
-
to_dataframe
(dtypes=None)
Create a pandas.DataFrame of all rows in the stream.
This method requires the pandas library to create a data frame and the fastavro library to parse row messages.
Warning
DATETIME columns are not supported. They are currently parsed as strings in the fastavro library.
- Parameters
dtypes (Map[str, Union[str, pandas.Series.dtype]]) – Optional. A dictionary mapping column names to pandas dtypes. The provided dtype is used when constructing the series for the column specified. Otherwise, the default pandas behavior is used.
- Returns
A data frame of all rows in the stream.
- Return type
pandas.DataFrame
-
class
google.cloud.bigquery_storage_v1beta1.reader.
ReadRowsPage
(stream_parser, message)
An iterator of rows from a read session message.
- Parameters
stream_parser (google.cloud.bigquery_storage_v1beta1.reader._StreamParser) – A helper for parsing messages into rows.
message (google.cloud.bigquery_storage_v1beta1.types.ReadRowsResponse) – A message of data from a read rows stream.
-
to_arrow
()
Create a pyarrow.RecordBatch of rows in the page.
- Returns
Rows from the message, as an Arrow record batch.
- Return type
pyarrow.RecordBatch
-
to_dataframe
(dtypes=None)
Create a pandas.DataFrame of rows in the page.
This method requires the pandas library to create a data frame and the fastavro library to parse row messages.
Warning
DATETIME columns are not supported. They are currently parsed as strings in the fastavro library.
- Parameters
dtypes (Map[str, Union[str, pandas.Series.dtype]]) – Optional. A dictionary mapping column names to pandas dtypes. The provided dtype is used when constructing the series for the column specified. Otherwise, the default pandas behavior is used.
- Returns
A data frame of all rows in the page.
- Return type
pandas.DataFrame
-
class
google.cloud.bigquery_storage_v1beta1.reader.
ReadRowsStream
(wrapped, client, read_position, read_rows_kwargs)
A stream of results from a read rows request.
This stream is an iterable of ReadRowsResponse. Iterate over it to fetch all row messages.
If the fastavro library is installed, use the rows() method to parse all messages into a stream of row dictionaries.
If the pandas and fastavro libraries are installed, use the to_dataframe() method to parse all messages into a pandas.DataFrame.
Construct a ReadRowsStream.
- Parameters
wrapped (Iterable[ ReadRowsResponse ]) – The ReadRows stream to read.
client (BigQueryStorageClient) – A GAPIC client used to reconnect to a ReadRows stream. This must be the GAPIC client to avoid a circular dependency on this class.
read_position (Union[ dict, StreamPosition ]) – Required. Identifier of the position in the stream to start reading from. The offset requested must be less than the last row read from ReadRows. Requesting a larger offset is undefined. If a dict is provided, it must be of the same form as the protobuf message StreamPosition.
read_rows_kwargs (dict) – Keyword arguments to use when reconnecting to a ReadRows stream.
- Returns
A sequence of row messages.
- Return type
Iterable[ ReadRowsResponse ]
-
rows
(read_session)
Iterate over all rows in the stream.
This method requires the fastavro library in order to parse row messages.
Warning
DATETIME columns are not supported. They are currently parsed as strings in the fastavro library.
- Parameters
read_session (ReadSession) – The read session associated with this read rows stream. This contains the schema, which is required to parse the data messages.
- Returns
A sequence of rows, represented as dictionaries.
- Return type
Iterable[Mapping]
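Since DATETIME values arrive as strings, one possible workaround is to convert them after the rows have been parsed. The sketch below does this for row dictionaries using only the standard library. It assumes the strings are ISO 8601 without a time zone (for example `"2019-07-04T12:30:00"`), which should be checked against real data; the helper `parse_datetime_columns`, the column name `created`, and the sample rows are hypothetical.

```python
from datetime import datetime


def parse_datetime_columns(rows, datetime_columns):
    """Yield copies of row dictionaries with DATETIME strings parsed.

    Assumes the strings are ISO 8601 without a time zone, for example
    "2019-07-04T12:30:00"; adjust the parsing if your data differs.
    """
    for row in rows:
        parsed = dict(row)  # Copy so the original row is left untouched.
        for name in datetime_columns:
            value = parsed.get(name)
            if isinstance(value, str):
                parsed[name] = datetime.fromisoformat(value)
        yield parsed


# Hypothetical rows shaped like the dictionaries that rows() yields.
sample_rows = [
    {"id": 1, "created": "2019-07-04T12:30:00"},
    {"id": 2, "created": None},
]
converted = list(parse_datetime_columns(sample_rows, ["created"]))
print(converted[0]["created"])
```

The helper is a generator, so it can wrap the iterable returned by `rows()` without loading the whole stream into memory; NULL values (here `None`) pass through unchanged.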
-
to_arrow
(read_session)
Create a pyarrow.Table of all rows in the stream.
This method requires the pyarrow library and a stream using the Arrow format.
- Parameters
read_session (ReadSession) – The read session associated with this read rows stream. This contains the schema, which is required to parse the data messages.
- Returns
A table of all rows in the stream.
- Return type
pyarrow.Table
-
to_dataframe
(read_session, dtypes=None)
Create a pandas.DataFrame of all rows in the stream.
This method requires the pandas library to create a data frame and the fastavro library to parse row messages.
Warning
DATETIME columns are not supported. They are currently parsed as strings.
- Parameters
read_session (ReadSession) – The read session associated with this read rows stream. This contains the schema, which is required to parse the data messages.
dtypes (Map[str, Union[str, pandas.Series.dtype]]) – Optional. A dictionary mapping column names to pandas dtypes. The provided dtype is used when constructing the series for the column specified. Otherwise, the default pandas behavior is used.
- Returns
A data frame of all rows in the stream.
- Return type
pandas.DataFrame