The ML.PROCESS_DOCUMENT function

This document describes the ML.PROCESS_DOCUMENT function, which lets you process unstructured documents from an object table by using the Document AI API.

Syntax

ML.PROCESS_DOCUMENT(
  MODEL `project_id.dataset.model_name`,
  TABLE `project_id.dataset.object_table`
)

Arguments

ML.PROCESS_DOCUMENT takes the following arguments:

Output

ML.PROCESS_DOCUMENT returns the following columns:

Quotas

See Cloud AI service functions quotas and limits.

Known issues

Sometimes after a query job that uses this function finishes successfully, some returned rows contain the following error message:

A retryable error occurred: RESOURCE EXHAUSTED error from <remote endpoint>

This issue occurs because BigQuery query jobs finish successfully even if the function fails for some of the rows. The function fails when the volume of API calls to the remote endpoint exceeds the quota limits for that service. This issue occurs most often when you are running multiple parallel batch queries. BigQuery retries these calls, but if the retries fail, the resource exhausted error message is returned.

Locations

ML.PROCESS_DOCUMENT must run in the same region as the remote model that the function references. You can only create models based on Document AI in the US and EU multi-regions.

Limitations

The function can't process documents with more than 15 pages. Any row that contains such a file returns an error.

Example

The following example uses the invoice parser to process the documents represented by the documents table.

Create the model:

# Create model
CREATE OR REPLACE MODEL
`myproject.mydataset.invoice_parser`
REMOTE WITH CONNECTION `myproject.myregion.myconnection`
OPTIONS (remote_service_type = 'cloud_ai_document_v1',
document_processor='projects/project_number/locations/processor_location/processors/processor_id/processorVersions/version_id');

Process the documents:

SELECT *
FROM ML.PROCESS_DOCUMENT(
  MODEL `myproject.mydataset.invoice_parser`,
  TABLE `myproject.mydataset.documents`
);

The result is similar to the following:

ml_process_document_result ml_process_document_status invoice_type currency ...
{"entities":[{"confidence":1,"id":"0","mentionText":"10 105,93 10,59","pageAnchor":{"pageRefs":[{"boundingPoly":{"normalizedVertices":[{"x":0.40452111,"y":0.67199326},{"x":0.74776918,"y":0.67199326},{"x":0.74776918,"y":0.68208581},{"x":0.40452111,"y":0.68208581}]}}]},"properties":[{"confidence":0.66... USD

What's next