The ML.UNDERSTAND_TEXT function

This document describes the ML.UNDERSTAND_TEXT function, which lets you analyze text that's stored in BigQuery tables by using the Cloud Natural Language API.

Syntax

ML.UNDERSTAND_TEXT(
  MODEL `project_id.dataset.model_name`,
  { TABLE `project_id.dataset.bq_table` | (query_statement) },
  STRUCT('option_name' AS nlu_option)
)
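
As a minimal sketch of the query_statement form of the syntax, the following call passes a subquery instead of a table. The table mydataset.reviews and its review column are hypothetical. The alias text_content matches the input column name used in the examples in this document, and mydataset.mynlpmodel is assumed to be an existing remote model.

```sql
# Hypothetical input: mydataset.reviews with a STRING column named review.
SELECT *
FROM ML.UNDERSTAND_TEXT(
  MODEL `mydataset.mynlpmodel`,
  (
    # Rename the source column to text_content and skip rows without text.
    SELECT review AS text_content
    FROM `mydataset.reviews`
    WHERE review IS NOT NULL
  ),
  STRUCT('classify_text' AS nlu_option)
);
```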

Arguments

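ML.UNDERSTAND_TEXT takes the following arguments:

project_id: the ID of the project that contains the model or table.
dataset: the dataset that contains the model or table.
model_name: the name of a remote model that was created with a remote_service_type of cloud_ai_natural_language_v1.
bq_table: the name of the BigQuery table that contains the text to analyze, in a column named text_content.
query_statement: a query whose result contains the text to analyze, in a column named text_content.
nlu_option: a STRING value that names the Natural Language API feature to apply, for example classify_text.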

Output

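ML.UNDERSTAND_TEXT returns the input table plus the following columns:

ml_understand_text_result: a JSON value that contains the response from the Natural Language API.
ml_understand_text_status: a STRING value that contains the status of the API call for the row. The value is empty if the call succeeded.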

Quotas

See Cloud AI service functions quotas and limits.

Known issues

Sometimes after a query job that uses this function finishes successfully, some returned rows contain the following error message:

A retryable error occurred: RESOURCE EXHAUSTED error from <remote endpoint>

This issue occurs because BigQuery query jobs finish successfully even if the function fails for some of the rows. The function fails when the volume of API calls to the remote endpoint exceeds the quota limits for that service. This issue occurs most often when you are running multiple parallel batch queries. BigQuery retries these calls, but if the retries fail, the resource exhausted error message is returned.
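
One way to see how many rows were affected is to count the rows whose status contains this message. The following sketch assumes that the function output was first saved to a table. The table name mydataset.understand_text_output is hypothetical, and ml_understand_text_status is the status column shown in the example output in this document.

```sql
# Hypothetical table that holds the saved output of ML.UNDERSTAND_TEXT.
SELECT
  COUNT(*) AS total_rows,
  # Rows whose status column carries the retryable error message.
  COUNTIF(ml_understand_text_status LIKE '%A retryable error occurred%')
    AS retryable_error_rows
FROM `mydataset.understand_text_output`;
```

Querying saved output avoids calling the function again, and using more quota, only to inspect the failures.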

Locations

ML.UNDERSTAND_TEXT must run in the same region as the remote model that the function references. For more information about supported locations for models based on the Natural Language API, see Locations for remote models.

Examples

Example 1

The following example creates a remote model and then applies the classify_text option to the BigQuery table mybqtable in the dataset mydataset.

# Create the remote model
CREATE OR REPLACE MODEL `myproject.mydataset.mynlpmodel`
  REMOTE WITH CONNECTION `myproject.myregion.myconnection`
  OPTIONS (remote_service_type = 'cloud_ai_natural_language_v1');

# Classify the text in the table
SELECT *
FROM ML.UNDERSTAND_TEXT(
  MODEL `mydataset.mynlpmodel`,
  TABLE `mydataset.mybqtable`,
  STRUCT('classify_text' AS nlu_option)
);

The output is similar to the following:

| ml_understand_text_result | ml_understand_text_status | text_content |
| --- | --- | --- |
| {"categories":[{"confidence":0.51999998,"name":"/Arts & Entertainment/TV & Video/TV Shows & Programs"}]} | | That actor on TV makes movies in Hollywood and also stars in a variety of popular new TV shows. |
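
Because categories is an array, a response can hold more than one category. As a sketch, assuming that ml_understand_text_result is a JSON value with the categories array shown above, the following query returns one row per category instead of one row per input text.

```sql
SELECT
  t.text_content,
  STRING(category.name) AS category_name,
  FLOAT64(category.confidence) AS confidence
FROM
  ML.UNDERSTAND_TEXT(
    MODEL `mydataset.mynlpmodel`,
    TABLE `mydataset.mybqtable`,
    STRUCT('classify_text' AS nlu_option)) AS t,
  # Expand the JSON array into one JSON element per category.
  UNNEST(JSON_QUERY_ARRAY(t.ml_understand_text_result, '$.categories')) AS category;
```

The comma join drops input rows that have no categories, for example rows where the call failed, so check ml_understand_text_status separately if you need those rows.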

Example 2

The following example classifies the text in the text_content column of the table mybqtable and saves the results in separate columns of a new table. It then selects the rows where the confidence is higher than 0.5.

CREATE TABLE
  `mydataset.classfied_result` AS (
  SELECT
    text_content AS `Original Input`,
    STRING(ml_understand_text_result.categories[0].name) AS `Classified Name`,
    FLOAT64(ml_understand_text_result.categories[0].confidence) AS `Confidence`,
    ml_understand_text_status AS `Status`
  FROM
    ML.UNDERSTAND_TEXT( MODEL `mydataset.mynlpmodel`,
      TABLE `mydataset.mybqtable`,
      STRUCT('classify_text' AS nlu_option))
  );

SELECT
  *
FROM
  `mydataset.classfied_result`
WHERE
  Confidence > 0.5;

The output is similar to the following:

| Original Input | Classified Name | Confidence | Status |
| --- | --- | --- | --- |
| That actor on TV makes movies in Hollywood and also stars in a variety of popular new TV shows. | /Arts & Entertainment/TV & Video/TV Shows & Programs | 0.51999998 | |

If you get an error such as query limit exceeded, you might have exceeded the quota for this function, which can leave some rows unprocessed. Unprocessed rows have a non-empty Status value. Use the following query to finish processing them:

CREATE TABLE
  `mydataset.classfied_result_next` AS (
  SELECT
    text_content AS `Original Input`,
    STRING(ml_understand_text_result.categories[0].name) AS `Classified Name`,
    FLOAT64(ml_understand_text_result.categories[0].confidence) AS `Confidence`,
    ml_understand_text_status AS `Status`
  FROM
    ML.UNDERSTAND_TEXT( MODEL `mydataset.mynlpmodel`,
      (SELECT `Original Input` AS text_content FROM `mydataset.classfied_result`
       WHERE Status != ''),
      STRUCT('classify_text' AS nlu_option))
  );

SELECT * FROM `mydataset.classfied_result_next`;
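
After the retry, the complete result set is spread over two tables. As a sketch, assuming that an empty Status marks a successfully processed row (the inverse of the Status != '' filter above), the following query combines the rows that succeeded in the first run with the rows from the retry.

```sql
# Rows that succeeded in the first run.
SELECT * FROM `mydataset.classfied_result` WHERE Status = ''
UNION ALL
# Rows that were reprocessed in the retry.
SELECT * FROM `mydataset.classfied_result_next`;
```

Rows from the retry table can still have a non-empty Status if the quota was exceeded again, so you might need to repeat the retry step before combining the tables.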

What's next