Reader for BigQuery Storage API

class google.cloud.bigquery_storage_v1beta1.reader.ReadRowsIterable(reader, read_session)

An iterable of rows from a read session.

Parameters
property pages

A generator of all pages in the stream.

Returns

A generator of pages.

Return type

types.GeneratorType[google.cloud.bigquery_storage_v1beta1.ReadRowsPage]
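As a rough illustration of page-wise consumption, the sketch below walks the pages generator and records how many rows each page holds. The helper name page_sizes is hypothetical; it assumes only that each page exposes num_items, as documented for ReadRowsPage.

```python
def page_sizes(rows_iterable):
    """Return the number of rows in each page of a ReadRowsIterable.

    `rows_iterable` is assumed to expose a `pages` generator whose items
    have a `num_items` attribute (see ReadRowsPage). `page_sizes` is a
    hypothetical helper, not part of the library.
    """
    sizes = []
    for page in rows_iterable.pages:
        # Each page wraps one message from the read rows stream.
        sizes.append(page.num_items)
    return sizes
```

Working page by page can keep memory use bounded, since only one message's worth of rows needs to be held at a time.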

to_arrow()

Create a pyarrow.Table of all rows in the stream.

This method requires the pyarrow library and a stream using the Arrow format.

Returns

A table of all rows in the stream.

Return type

pyarrow.Table

to_dataframe(dtypes=None)

Create a pandas.DataFrame of all rows in the stream.

This method requires the pandas library to create a data frame and the fastavro library to parse row messages.

Warning

DATETIME columns are not supported. They are currently parsed as strings by the fastavro library.

Parameters

dtypes (Map[str, Union[str, pandas.Series.dtype]]) – Optional. A dictionary mapping column names to pandas dtypes. The provided dtype is used when constructing the series for the column specified. Otherwise, the default pandas behavior is used.

Returns

A data frame of all rows in the stream.

Return type

pandas.DataFrame
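To make the shape of the dtypes argument concrete, here is a minimal sketch that builds the dictionary and passes it to to_dataframe(). The helper name, the float_columns parameter, and the default "float32" are illustrative assumptions rather than anything the library defines.

```python
def to_dataframe_with_floats(rows_iterable, float_columns, dtype="float32"):
    """Build a data frame, forcing the named columns to one float dtype.

    `float_columns` and the default "float32" are illustrative choices;
    columns that are not listed keep the default pandas behavior.
    """
    # Map each column name to the dtype used when its series is constructed.
    dtypes = {name: dtype for name in float_columns}
    return rows_iterable.to_dataframe(dtypes=dtypes)
```

A call such as to_dataframe_with_floats(rows, ["price", "tax"]) would pass {"price": "float32", "tax": "float32"} as dtypes, where rows, price, and tax are placeholder names.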

property total_rows

Estimated number of rows in the current stream.

May change over time.

Type

int
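Because total_rows is only an estimate, code that reports progress probably should not treat it as exact. The sketch below is one hedged way to turn it into a fraction; progress_fraction and rows_read are hypothetical names, and the guard for a missing or zero estimate is an assumption about what a caller might see before data has arrived.

```python
def progress_fraction(rows_iterable, rows_read):
    """Estimate the fraction of the stream consumed so far, or None.

    Returns None when no usable estimate is available (assumed to be
    possible early in the stream). Hypothetical helper for illustration.
    """
    total = rows_iterable.total_rows
    if not total:
        return None
    # The estimate may change over time, so clamp the result to [0.0, 1.0].
    return min(max(rows_read / total, 0.0), 1.0)
```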

class google.cloud.bigquery_storage_v1beta1.reader.ReadRowsPage(stream_parser, message)

An iterator of rows from a read session message.

Parameters
next()

Get the next row in the page.

property num_items

Total items in the page.

Type

int

property remaining

Remaining items in the page.

Type

int
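The next() method and the remaining property can be combined to read part of a page. The following is a minimal sketch under the assumption that remaining decreases by one after each next() call; take_from_page and limit are hypothetical names.

```python
def take_from_page(page, limit):
    """Read at most `limit` rows from a page, leaving the rest unread.

    Assumes `page.remaining` counts the rows not yet consumed and that
    `page.next()` returns the next row. Hypothetical helper.
    """
    rows = []
    while page.remaining > 0 and len(rows) < limit:
        rows.append(page.next())
    return rows
```

Checking remaining before each call avoids relying on an exception to detect the end of the page.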

to_arrow()

Create a pyarrow.RecordBatch of rows in the page.

Returns

Rows from the message, as an Arrow record batch.

Return type

pyarrow.RecordBatch

to_dataframe(dtypes=None)

Create a pandas.DataFrame of rows in the page.

This method requires the pandas library to create a data frame and the fastavro library to parse row messages.

Warning

DATETIME columns are not supported. They are currently parsed as strings by the fastavro library.

Parameters

dtypes (Map[str, Union[str, pandas.Series.dtype]]) – Optional. A dictionary mapping column names to pandas dtypes. The provided dtype is used when constructing the series for the column specified. Otherwise, the default pandas behavior is used.

Returns

A data frame of all rows in the page.

Return type

pandas.DataFrame

class google.cloud.bigquery_storage_v1beta1.reader.ReadRowsStream(wrapped, client, read_position, read_rows_kwargs)

A stream of results from a read rows request.

This stream is an iterable of ReadRowsResponse. Iterate over it to fetch all row messages.

If the fastavro library is installed, use the rows() method to parse all messages into a stream of row dictionaries.

If the pandas and fastavro libraries are installed, use the to_dataframe() method to parse all messages into a pandas.DataFrame.

Construct a ReadRowsStream.

Parameters
  • wrapped (Iterable[ ReadRowsResponse ]) – The ReadRows stream to read.

  • client (BigQueryStorageClient) – A GAPIC client used to reconnect to a ReadRows stream. This must be the GAPIC client to avoid a circular dependency on this class.

  • read_position (Union[ dict, StreamPosition ]) – Required. Identifier of the position in the stream to start reading from. The offset requested must be less than the last row read from ReadRows. Requesting a larger offset is undefined. If a dict is provided, it must be of the same form as the protobuf message StreamPosition.

  • read_rows_kwargs (dict) – Keyword arguments to use when reconnecting to a ReadRows stream.

Returns

A sequence of row messages.

Return type

Iterable[ ReadRowsResponse ]
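In practice a ReadRowsStream is usually obtained from a client rather than constructed directly. The sketch below shows one plausible way to do that with the dict form of read_position. It rests on several assumptions not stated in this section: that the client exposes a read_rows method returning a ReadRowsStream, that the read session lists its streams in a streams attribute, and that StreamPosition has stream and offset fields. The helper name read_stream_rows is hypothetical.

```python
def read_stream_rows(client, read_session, stream_index=0, offset=0):
    """Start reading one stream of a session and return its rows iterable.

    Assumptions: `client.read_rows` returns a ReadRowsStream,
    `read_session.streams` lists the session's streams, and the dict
    below mirrors the StreamPosition message. Hypothetical helper.
    """
    # Dict form of StreamPosition: which stream to read and where to start.
    position = {"stream": read_session.streams[stream_index], "offset": offset}
    reader = client.read_rows(position)
    # The session carries the schema needed to parse the row messages.
    return reader.rows(read_session)
```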

rows(read_session)

Iterate over all rows in the stream.

This method requires the fastavro library in order to parse row messages.

Warning

DATETIME columns are not supported. They are currently parsed as strings by the fastavro library.

Parameters

read_session (ReadSession) – The read session associated with this read rows stream. This contains the schema, which is required to parse the data messages.

Returns

A sequence of rows, represented as dictionaries.

Return type

Iterable[Mapping]
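Since rows() returns an iterable of mappings, standard iterator tools apply to it. As a small sketch, the hypothetical helper peek_rows below uses itertools.islice to look at the first few rows; the name and the default limit are assumptions made for illustration.

```python
from itertools import islice


def peek_rows(stream, read_session, limit=5):
    """Return up to `limit` parsed rows from the start of a stream.

    `stream` is assumed to be a ReadRowsStream and `read_session` the
    session it belongs to. Hypothetical helper.
    """
    # rows() yields one mapping per row; islice stops after `limit` rows
    # instead of iterating over the whole stream.
    return [dict(row) for row in islice(stream.rows(read_session), limit)]
```

Copying each row with dict() gives the caller plain dictionaries regardless of the concrete mapping type the parser returns.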

to_arrow(read_session)

Create a pyarrow.Table of all rows in the stream.

This method requires the pyarrow library and a stream using the Arrow format.

Parameters

read_session (ReadSession) – The read session associated with this read rows stream. This contains the schema, which is required to parse the data messages.

Returns

A table of all rows in the stream.

Return type

pyarrow.Table

to_dataframe(read_session, dtypes=None)

Create a pandas.DataFrame of all rows in the stream.

This method requires the pandas library to create a data frame and the fastavro library to parse row messages.

Warning

DATETIME columns are not supported. They are currently parsed as strings.

Parameters
  • read_session (ReadSession) – The read session associated with this read rows stream. This contains the schema, which is required to parse the data messages.

  • dtypes (Map[str, Union[str, pandas.Series.dtype]]) – Optional. A dictionary mapping column names to pandas dtypes. The provided dtype is used when constructing the series for the column specified. Otherwise, the default pandas behavior is used.

Returns

A data frame of all rows in the stream.

Return type

pandas.DataFrame