The ML.ENTITY_FEATURES_AT_TIME function

This document describes the ML.ENTITY_FEATURES_AT_TIME function, which lets you use multiple point-in-time cutoffs for multiple entities when retrieving features, because features can have time dependencies if they include time-sensitive data. To avoid data leakage, use point-in-time features when training models and running inference.

Use this function to retrieve features from multiple entities for multiple points in time. For example, you could retrieve features created at or before three different points in time for entity 1, and features created at or before yet another different point in time for entity 2. Use the ML.FEATURES_AT_TIME function to use the same point-in-time cutoff for all entities when retrieving features.

Syntax

ML.ENTITY_FEATURES_AT_TIME(
  { TABLE feature_table | (feature_query_statement) },
  { TABLE entity_time_table | (entity_time_query_statement) }
  [, num_rows => INT64][, ignore_feature_nulls => BOOL])

Arguments

ML.ENTITY_FEATURES_AT_TIME takes the following arguments:

Output

ML.ENTITY_FEATURES_AT_TIME returns the input table rows that meet the point-in-time cutoff criteria, with the feature_timestamp column showing the timestamp from the time column of the entity time table.

Because you can specify multiple points in time from which to retrieve features for the same entity, it is possible to return duplicate rows, depending on the timestamps in the feature and entity time tables, and the num_rows value you specify. For example, if the only row in the feature table for entity ID 1 is timestamped 2022-06-11 10:00:00+00, and you have two rows for entity ID 1 in the entity time table that both have later timestamps, the function output has 2 rows with the same feature data for entity ID 1.

If either of the following conditions are true:

Then the function doesn't return any output for that entity time table row.

Examples

Example 1

This example shows a how to retrain a model using only features that were created or updated before the timestamps identified in mydataset.entity_time_table:

CREATE OR REPLACE
  `mydataset.mymodel` OPTIONS (WARM_START = TRUE)
AS
SELECT * EXCEPT (feature_timestamp, entity_id)
FROM
  ML.ENTITY_FEATURES_AT_TIME(
    TABLE `mydataset.feature_table`,
    TABLE `mydataset.entity_time_table`,
    num_rows => 1,
    ignore_feature_nulls => TRUE);

Example 2

This example shows a how to get predictions from a model based on features that were created or updated before the timestamps identified in mydataset.entity_time_table:

SELECT
  *
FROM
  ML.PREDICT(
    MODEL `mydataset.mymodel`,
    (
      SELECT * EXCEPT (feature_timestamp, entity_id)
      FROM
        ML.ENTITY_FEATURES_AT_TIME(
          TABLE `mydataset.feature_table`,
          TABLE `mydataset.entity_time_table`,
          num_rows => 1,
          ignore_feature_nulls => TRUE)
    )
  );

Example 3

This is a contrived example that you can use to see the output of the function:

WITH
  feature_table AS (
    SELECT * FROM UNNEST(
      ARRAY<STRUCT<entity_id STRING, f_1 FLOAT64, f_2 FLOAT64, feature_timestamp TIMESTAMP>>[
        ('id1', 1.0, 1.0, TIMESTAMP '2022-06-10 12:00:00+00'),
        ('id2', 12.0, 24.0, TIMESTAMP '2022-06-11 12:00:00+00'),
        ('id1', 11.0, NULL, TIMESTAMP '2022-06-11 12:00:00+00'),
        ('id1', 6.0, 12.0, TIMESTAMP '2022-06-11 10:00:00+00'),
        ('id2', 2.0, 4.0, TIMESTAMP '2022-06-10 12:00:00+00'),
        ('id2', 7.0, NULL, TIMESTAMP '2022-06-11 10:00:00+00')])
  ),
  entity_time_table AS (
    SELECT * FROM UNNEST(
      ARRAY<STRUCT<entity_id STRING, time TIMESTAMP>>[
        ('id1', TIMESTAMP '2022-06-12 12:00:00+00'),
        ('id2', TIMESTAMP '2022-06-11 10:00:00+00'),
        ('id1', TIMESTAMP '2022-06-10 13:00:00+00')])
  )
SELECT *
FROM
  ML.ENTITY_FEATURES_AT_TIME(
    TABLE feature_table, TABLE entity_time_table, num_rows => 1, ignore_feature_nulls => TRUE);