Search functions

GoogleSQL for BigQuery supports the following search functions.

Function list

Name Summary
SEARCH Checks to see whether a table or other search data contains a set of search terms.
VECTOR_SEARCH Performs a vector search on embeddings to find semantically similar entities.
SEARCH(
  data_to_search, search_query
  [, json_scope=>{ 'JSON_VALUES' | 'JSON_KEYS' | 'JSON_KEYS_AND_VALUES' }]
  [, analyzer=>{ 'LOG_ANALYZER' | 'NO_OP_ANALYZER' | 'PATTERN_ANALYZER'}]
  [, analyzer_options=>analyzer_options_values]
)

Description

The SEARCH function checks to see whether a BigQuery table or other search data contains a set of search terms (tokens). It returns TRUE if all search terms appear in the data, based on the text analysis described in the text analyzer, and FALSE otherwise.

Definitions

Details

The SEARCH function is designed to work with search indexes to optimize point lookups. Although the SEARCH function works for tables that aren't indexed, its performance will be greatly improved with a search index. If both the analyzer and analyzer options match the one used to create the index, the search index will be used.

Rules for search_query

Backtick rules for search_query:

Reserved character rules for search_query:

How search_query is broken into searchable terms

The following table shows how search_query is broken into searchable terms by the LOG_ANALYZER text analyzer. All entries are strings.

search_query searchable terms
127.0.0.1 127
0
1
127.0.0.1
. 127.0.0
127.0
0.0
0.0.1
0.1
foobar@example.com foobar
example
com
foobar@example
example.com
foobar@example.com
The fox. the
fox
The
The fox
The fox.
fox
fox.

The following table shows how search_query is broken into query terms by the LOG_ANALYZER text analyzer. All entries are strings.

search_query query terms
127.0.0.1 127
0
1
`127.0.0.1` 127.0.0.1
foobar@example.com foobar
example
com
`foobar@example.com` foobar@example.com

Rules for data_to_search

General rules for data_to_search:

Return type

BOOL

Examples

The following queries show how tokens in search_query are analyzed by a SEARCH function call using the default analyzer, LOG_ANALYZER:

SELECT
  -- ERROR: `search_query` is NULL.
  SEARCH('foobarexample', NULL) AS a,

  -- ERROR: `search_query` contains no tokens.
  SEARCH('foobarexample', '') AS b,
SELECT
  -- TRUE: '-' and ' ' are delimiters.
  SEARCH('foobar-example', 'foobar example') AS a,

  -- TRUE: The search query is a constant expression evaluated to 'foobar'.
  SEARCH('foobar-example', CONCAT('foo', 'bar')) AS b,

  -- FALSE: The search_query is not split.
  SEARCH('foobar-example', 'foobarexample') AS c,

  -- TRUE: The double backslash escapes the ampersand which is a delimiter.
  SEARCH('foobar-example', 'foobar\\&example') AS d,

  -- TRUE: The single backslash escapes the ampersand in a raw string.
  SEARCH('foobar-example', R'foobar\&example')AS e,

  -- FALSE: The backticks indicate that there must be an exact match for
  -- foobar&example.
  SEARCH('foobar-example', '`foobar&example`') AS f,

  -- TRUE: An exact match is found.
  SEARCH('foobar&example', '`foobar&example`') AS g

/*-------+-------+-------+-------+-------+-------+-------*
 | a     | b     | c     | d     | e     | f     | g     |
 +-------+-------+-------+-------+-------+-------+-------+
 | true  | true  | false | true  | true  | false | true  |
 *-------+-------+-------+-------+-------+-------+-------*/
SELECT
  -- TRUE: The order of terms doesn't matter.
  SEARCH('foobar-example', 'example foobar') AS a,

  -- TRUE: Tokens are made lower-case.
  SEARCH('foobar-example', 'Foobar Example') AS b,

  -- TRUE: An exact match is found.
  SEARCH('foobar-example', '`foobar-example`') AS c,

  -- FALSE: Backticks preserve capitalization.
  SEARCH('foobar-example', '`Foobar`') AS d,

  -- FALSE: Backticks don't have special meaning for search_data and are
  -- not delimiters in the default LOG_ANALYZER.
  SEARCH('`foobar-example`', '`foobar-example`') AS e,

  -- TRUE: An exact match is found after the delimiter in search_data.
  SEARCH('foobar@example.com', '`example.com`') AS f,

  -- TRUE: An exact match is found between the space delimiters.
  SEARCH('a foobar-example b', '`foobar-example`') AS g;

/*-------+-------+-------+-------+-------+-------+-------*
 | a     | b     | c     | d     | e     | f     | g     |
 +-------+-------+-------+-------+-------+-------+-------+
 | true  | true  | true  | false | false | true  | true  |
 *-------+-------+-------+-------+-------+-------+-------*/
SELECT
  -- FALSE: No single array entry matches all search terms.
  SEARCH(['foobar', 'example'], 'foobar example') AS a,

  -- FALSE: The search_query is equivalent to foobar\\=.
  SEARCH('foobar=', '`foobar\\=`') AS b,

  -- FALSE: This is equivalent to the previous example.
  SEARCH('foobar=', R'`\foobar=`') AS c,

  -- TRUE: The equals sign is a delimiter in the data and query.
  SEARCH('foobar=', 'foobar\\=') AS d,

  -- TRUE: This is equivalent to the previous example.
  SEARCH('foobar=', R'foobar\=') AS e,

  -- TRUE: An exact match is found.
  SEARCH('foobar.example', '`foobar`') AS f,

  -- FALSE: `foobar.\` is not analyzed because of backticks; it is not
  -- followed by a delimiter in search_data 'foobar.example'.
  SEARCH('foobar.example', '`foobar.\`') AS g,

  -- TRUE: `foobar.` is not analyzed because of backticks; it is
  -- followed by the delimiter '.' in search_data 'foobar..example'.
  SEARCH('foobar..example', '`foobar.`') AS h;

/*-------+-------+-------+-------+-------+-------+-------+-------*
 | a     | b     | c     | d     | e     | f     | g     | h     |
 +-------+-------+-------+-------+-------+-------+-------+-------+
 | false | false | false | true  | true  | true  | false | true  |
 *-------+-------+-------+-------+-------+-------+-------+-------*/

The following query shows examples of calls to the SEARCH function using the NO_OP_ANALYZER text analyzer and reasons for various return values:

SELECT
  -- TRUE: exact match
  SEARCH('foobar', 'foobar', analyzer=>'NO_OP_ANALYZER') AS a,

  -- FALSE: Backticks are not special characters for `NO_OP_ANALYZER`.
  SEARCH('foobar', '\`foobar\`', analyzer=>'NO_OP_ANALYZER') AS b,

  -- FALSE: The capitalization does not match.
  SEARCH('foobar', 'Foobar', analyzer=>'NO_OP_ANALYZER') AS c,

  -- FALSE: There are no delimiters for `NO_OP_ANALYZER`.
  SEARCH('foobar example', 'foobar', analyzer=>'NO_OP_ANALYZER') AS d,

  -- TRUE: An exact match is found.
  SEARCH('', '', analyzer=>'NO_OP_ANALYZER') AS e;

/*-------+-------+-------+-------+-------*
 | a     | b     | c     | d     | e     |
 +-------+-------+-------+-------+-------+
 | true  | false | false | false | true  |
 *-------+-------+-------+-------+-------*/

Consider the following table called meals with columns breakfast, lunch, and dinner:

/*-------------------+-------------------------+------------------*
 | breakfast         | lunch                   | dinner           |
 +-------------------+-------------------------+------------------+
 | Potato pancakes   | Toasted cheese sandwich | Beef soup        |
 | Avocado toast     | Tomato soup             | Chicken soup     |
 *-------------------+-------------------------+------------------*/

The following query shows how to search single columns, multiple columns, and whole tables, using the default LOG_ANALYZER text analyzer with the default analyzer options:

WITH
  meals AS (
    SELECT
      'Potato pancakes' AS breakfast,
      'Toasted cheese sandwich' AS lunch,
      'Beef soup' AS dinner
    UNION ALL
    SELECT
      'Avocado toast' AS breakfast,
      'Tomato soup' AS lunch,
      'Chicken soup' AS dinner
  )
SELECT
  SEARCH(lunch, 'soup') AS lunch_soup,
  SEARCH((breakfast, dinner), 'soup') AS breakfast_or_dinner_soup,
  SEARCH(meals, 'soup') AS anytime_soup
FROM meals;

/*------------+--------------------------+--------------*
 | lunch_soup | breakfast_or_dinner_soup | anytime_soup |
 +------------+--------------------------+--------------+
 | false      | true                     | true         |
 | true       | true                     | true         |
 *------------+--------------------------+--------------*/

The following query shows additional ways to search, using the default LOG_ANALYZER text analyzer with default analyzer options:

WITH data AS ( SELECT 'Please use foobar@example.com as your email.' AS email )
SELECT
  SEARCH(email, 'exam') AS a,
  SEARCH(email, 'foobar') AS b,
  SEARCH(email, 'example.com') AS c
FROM data;

/*-------+-------+-------*
 | a     | b     | c     |
 +-------+-------+-------+
 | false | true  | true  |
 *-------+-------+-------*/

The following query shows additional ways to search, using the default LOG_ANALYZER text analyzer with custom analyzer options. Terms are only split when a space or @ symbol is encountered.

WITH data AS ( SELECT 'Please use foobar@example.com as your email.' AS email )
SELECT
  SEARCH(email, 'foobar', analyzer_options=>'{"delimiters": [" ", "@"]}') AS a,
  SEARCH(email, 'example', analyzer_options=>'{"delimiters": [" ", "@"]}') AS b,
  SEARCH(email, 'example.com', analyzer_options=>'{"delimiters": [" ", "@"]}') AS c,
  SEARCH(email, 'foobar@example.com', analyzer_options=>'{"delimiters": [" ", "@"]}') AS d
FROM data;

/*-------+-------+-------+-------*
 | a     | b     | c     | d     |
 +-------+-------+-------+-------+
 | true  | false | true  | true  |
 *-------+-------+-------+-------*/

The following query shows how to search, using the NO_OP_ANALYZER text analyzer:

WITH meals AS ( SELECT 'Tomato soup' AS lunch )
SELECT
  SEARCH(lunch, 'Tomato soup', analyzer=>'NO_OP_ANALYZER') AS a,
  SEARCH(lunch, 'soup', analyzer=>'NO_OP_ANALYZER') AS b,
  SEARCH(lunch, 'tomato soup', analyzer=>'NO_OP_ANALYZER') AS c
FROM meals;

/*-------+-------+-------*
 | a     | b     | c     |
 +-------+-------+-------+
 | true  | false | false |
 *-------+-------+-------*/

The following query shows how to use the PATTERN_ANALYZER text analyzer with default analyzer options:

WITH data AS ( SELECT 'Please use foobar@example.com as your email.' AS email )
SELECT
  SEARCH(email, 'exam', analyzer=>'PATTERN_ANALYZER') AS a,
  SEARCH(email, 'foobar', analyzer=>'PATTERN_ANALYZER') AS b,
  SEARCH(email, 'example.com', analyzer=>'PATTERN_ANALYZER') AS c
FROM data;

/*-------+-------+-------*
 | a     | b     | c     |
 +-------+-------+-------+
 | false | true  | true  |
 *-------+-------+-------*/

The following query shows additional ways to search, using the PATTERN_ANALYZER text analyzer with custom analyzer options:

WITH data AS ( SELECT 'Please use foobar@EXAMPLE.com as your email.' AS email )
SELECT
  SEARCH(email, 'EXAMPLE', analyzer=>'PATTERN_ANALYZER', analyzer_options=>'{"patterns": ["[A-Z]*"]}') AS a,
  SEARCH(email, 'example', analyzer=>'PATTERN_ANALYZER', analyzer_options=>'{"patterns": ["[a-z]*"]}') AS b,
  SEARCH(email, 'example.com', analyzer=>'PATTERN_ANALYZER', analyzer_options=>'{"patterns": ["[a-z]*"]}') AS c,
  SEARCH(email, 'example.com', analyzer=>'PATTERN_ANALYZER', analyzer_options=>'{"patterns": ["[a-zA-Z.]*"]}') AS d
FROM data;

/*-------+-------+-------+-------*
 | a     | b     | c     | d     |
 +-------+-------+-------+-------+
 | true  | false | false | true  |
 *-------+-------+-------+-------*/

For additional examples that include analyzer options, see the Text analysis reference guide.

For helpful analyzer recipes that you can use to enhance analyzer-supported queries, see the Search with text analyzers user guide.

To provide feedback or request support for this feature, send email to bq-vector-search@google.com.

VECTOR_SEARCH(
  {TABLE base_table | base_table_query_statement},
  column_to_search,
  TABLE query_table
  [, query_column_to_search => query_column_to_search_value]
  [, top_k => top_k_value ]
  [, distance_type => distance_type_value ]
  [, options => options_value ]
)
VECTOR_SEARCH(
  {TABLE base_table | base_table_query_statement},
  column_to_search,
  (query_statement)
  [, query_column_to_search => query_column_to_search_value]
  [, top_k => top_k_value ]
  [, distance_type => distance_type_value ]
  [, options => options_value ]
)

Description

The VECTOR_SEARCH function lets you search embeddings to find semantically similar entities.

Embeddings are high-dimensional numerical vectors that represent a given entity, like a piece of text or an audio file. Machine learning (ML) models use embeddings to encode semantics about such entities to make it easier to reason about and compare them. For example, a common operation in clustering, classification, and recommendation models is to measure the distance between vectors in an embedding space to find items that are most semantically similar.

Definitions

Details

You can optionally use VECTOR_SEARCH with a vector index. When a vector index is used, VECTOR_SEARCH uses the Approximate Nearest Neighbor search technique to help improve vector search performance, with the trade-off of reducing recall and so returning more approximate results. Brute force is used to return exact results when a vector index isn't available, and you can choose to use brute force to get exact results even when a vector index is available.

Output

For each row in the query data, the output contains multiple rows from the base table that satisfy the search criteria. The number of results rows per query table row is either 10 or the top_k value if it is specified. The order of the output isn't guaranteed.

The output includes the following columns:

Limitations

BigQuery data security and governance rules apply to the use of VECTOR_SEARCH, which results in the following behavior:

Examples

The following queries create test tables table1 and table2 to use in subsequent query examples :

CREATE OR REPLACE TABLE mydataset.table1
(
  id INT64,
  my_embedding ARRAY<FLOAT64>
);

INSERT mydataset.table1 (id, my_embedding)
VALUES(1, [1.0, 2.0]),
(2, [2.0, 4.0]),
(3, [1.5, 7.0]),
(4, [1.0, 3.2]),
(5, [5.0, 5.4]),
(6, [3.7, 1.8]),
(7, [4.4, 2.9]);
CREATE OR REPLACE TABLE mydataset.table2
(
  query_id STRING,
  embedding ARRAY<FLOAT64>
);

INSERT mydataset.table2 (query_id, embedding)
VALUES('dog', [1.0, 2.0]),
('cat', [3.0, 5.2]);

The following example searches the my_embedding column of table1 for the top two embeddings that match each row of data in the embedding column of table2:

SELECT *
FROM
  VECTOR_SEARCH(
    TABLE mydataset.table1,
    'my_embedding',
    (SELECT query_id, embedding FROM mydataset.table2),
    'embedding',
    top_k => 2);

/*----------------+-----------------+---------+----------------------------------------*
 | query.query_id | query.embedding | base.id | base.my_embedding | distance           |
 +----------------+-----------------+---------+-------------------+--------------------+
 | dog            | 1.0             | 1       | 1.0               | 0                  |
 |                | 2.0             |         | 2.0               |                    |
 +----------------+-----------------+---------+-------------------+--------------------+
 | dog            | 1.0             | 4       | 1.0               | 1.2000000000000002 |
 |                | 2.0             |         | 3.2               |                    |
 +----------------+-----------------+---------+-------------------+--------------------+
 | cat            | 3.0             | 2       | 2.0               | 1.5620499351813311 |
 |                | 5.2             |         | 4.0               |                    |
 +----------------+-----------------+---------+-------------------+--------------------+
 | cat            | 3.0             | 5       | 5.0               | 2.0099751242241779 |
 |                | 5.2             |         | 5.4               |                    |
 *----------------+-----------------+---------+-------------------+--------------------*/

The following example pre-filters table1 to rows where id is not equal to 4 and then searches the my_embedding column of table1 for the top two embeddings that match each row of data in the embedding column of table2:

SELECT *
FROM
  VECTOR_SEARCH(
    (SELECT * FROM mydataset.table1 WHERE id != 4),
    'my_embedding',
    (SELECT query_id, embedding FROM mydataset.table2),
    'embedding',
    top_k => 2,
    options => '{"use_brute_force":true}');

/*----------------+-----------------+---------+----------------------------------------*
 | query.query_id | query.embedding | base.id | base.my_embedding | distance           |
 +----------------+-----------------+---------+-------------------+--------------------+
 | dog            | 1.0             | 1       | 1.0               | 0                  |
 |                | 2.0             |         | 2.0               |                    |
 +----------------+-----------------+---------+-------------------+--------------------+
 | dog            | 1.0             | 2       | 2.0               | 2.23606797749979   |
 |                | 2.0             |         | 4.0               |                    |
 +----------------+-----------------+---------+-------------------+--------------------+
 | cat            | 3.0             | 2       | 2.0               | 1.5620499351813311 |
 |                | 5.2             |         | 4.0               |                    |
 +----------------+-----------------+---------+-------------------+--------------------+
 | cat            | 3.0             | 5       | 5.0               | 2.0099751242241779 |
 |                | 5.2             |         | 5.4               |                    |
 *----------------+-----------------+---------+-------------------+--------------------*/

The following example searches the my_embedding column of table1 for the top two embeddings that match each row of data in the embedding column of table2, and uses the COSINE distance type to measure the distance between the embeddings:

SELECT *
FROM
  VECTOR_SEARCH(
    TABLE mydataset.table1,
    'my_embedding',
    TABLE mydataset.table2,
    'embedding',
    top_k => 2,
    distance_type => 'COSINE');

/*----------------+-----------------+---------+-------------------------------------------+
 | query.query_id | query.embedding | base.id | base.my_embedding | distance              |
 +----------------+-----------------+---------+-------------------+-----------------------+
 | dog            | 1.0             | 2       | 2.0               | 0                     |
 |                | 2.0             |         | 4.0               |                       |
 +----------------+-----------------+---------+-------------------+-----------------------+
 | dog            | 1.0             | 1       | 1.0               | 0                     |
 |                | 2.0             |         | 2.0               |                       |
 +----------------+-----------------+---------+-------------------+-----------------------+
 | cat            | 3.0             | 2       | 2.0               | 0.0017773842088002478 |
 |                | 5.2             |         | 4.0               |                       |
 +----------------+-----------------+---------+-------------------+-----------------------+
 | cat            | 3.0             | 1       | 1.0               | 0.0017773842088002478 |
 |                | 5.2             |         | 2.0               |                       |
 *----------------+-----------------+---------+-------------------+-----------------------*/