Shredding into SUPER columns with materialized views - Amazon Redshift

Shredding into SUPER columns with materialized views

The following example shows a materialized view that shreds the nested data with the resulting columns still being the SUPER data type.

SELECT c.c_name, o.o_orderstatus FROM customer_orders_lineitem c, c.c_orders o;

The following example shows a materialized view that creates conventional Amazon Redshift scalar columns from the shredded data.

SELECT c.c_name, c.c_orders[0].o_totalprice FROM customer_orders_lineitem c;

You can create a single materialized view super_mv to accelerate both queries.

To answer the first query, you must materialize the attribute o_orderstatus. You can omit the attribute c_name because it doesn't involve nested navigation nor unnesting. You must also include in the materialized view the attribute c_custkey of customer_orders_lineitem to be able to join the base table with the materialized view.

To answer the second query, you must also materialize the attribute o_totalprice and the array index o_idx of c_orders. Hence, you can access the index 0 of c_orders.

CREATE MATERIALIZED VIEW super_mv distkey(c_custkey) sortkey(c_custkey) AS ( SELECT c_custkey, o.o_orderstatus, o.o_totalprice, o_idx FROM customer_orders_lineitem c, c.c_orders o AT o_idx );

The attributes o_orderstatus and o_totalprice of the materialized view super_mv are SUPER.

The materialized view super_mv will be refreshed incrementally upon changes to the base table customer_orders_lineitem.

REFRESH MATERIALIZED VIEW super_mv; INFO: Materialized view super_mv was incrementally updated successfully.

To rewrite the first PartiQL query as a regular SQL query, join customer_orders_lineitem with super_mv as follows.

SELECT c.c_name, v.o_orderstatus FROM customer_orders_lineitem c JOIN super_mv v ON c.c_custkey = v.c_custkey;

Similarly, you can rewrite the second PartiQL query. The following example uses a filter on o_idx = 0.

SELECT c.c_name, v.o_totalprice FROM customer_orders_lineitem c JOIN super_mv v ON c.c_custkey = v.c_custkey WHERE v.o_idx = 0;

In the CREATE MATERIALIZED VIEW command, specify c_custkey as distribution key and sort key for super_mv. Amazon Redshift performs an efficient merge join, assuming that c_custkey is also the distribution key and sort key of customer_orders_lineitem. If that isn’t the case, you can specify c_custkey as the sort key and distribution key of customer_orders_lineitem as follows.

ALTER TABLE customer_orders_lineitem ALTER DISTKEY c_custkey, ALTER SORTKEY (c_custkey);

Use the EXPLAIN statement to verify that Amazon Redshift performs a merge join on the rewritten queries.

EXPLAIN SELECT c.c_name, v.o_orderstatus FROM customer_orders_lineitem c JOIN super_mv v ON c.c_custkey = v.c_custkey; QUERY PLAN ------------------------------------------------------------------------------------------------------ XN Merge Join DS_DIST_NONE (cost=0.00..34701.82 rows=1470776 width=27) Merge Cond: ("outer".c_custkey = "inner".c_custkey) -> XN Seq Scan on mv_tbl__super_mv__0 derived_table2 (cost=0.00..14999.86 rows=1499986 width=13) -> XN Seq Scan on customer_orders_lineitem c (cost=0.00..999.96 rows=99996 width=30) (4 rows)