What is a datashare?
A datashare is the unit of sharing data in Amazon Redshift. Use datashares to share data for read purposes across different Amazon Redshift clusters, either in the same AWS account or in different AWS accounts.
Each datashare is associated with a specific database in your Amazon Redshift cluster.
A producer cluster administrator can create datashares and add datashare objects to share data with other clusters, referred to as outbound shares. A consumer cluster administrator can receive datashares from other clusters, referred to as inbound shares. For details on producers and consumers, see Datashare producers and consumers.
Datashare objects are objects from specific databases on a cluster that producer cluster administrators can add to datashares to be shared with data consumers. Datashare objects are read-only for data consumers. Examples of datashare objects are tables, views, and user-defined functions. You can add datashare objects to datashares while creating datashares or editing a datashare at any time.
Data sharing continues to work when clusters are resized or when the producer cluster is paused.
There are different types of datashares.
Standard datashares
With standard datashares, you can share data across provisioned clusters, serverless workgroups, Availability Zones, AWS accounts, and AWS Regions. You can share between cluster types as well as between provisioned clusters and Amazon Redshift Serverless.
To share data, note the following provisioned cluster, serverless namespace, and AWS account identifiers:
Provisioned cluster namespaces are identifiers that identify Amazon Redshift provisioned clusters. A namespace globally unique identifier (GUID) is automatically created during provisioned cluster creation and attached to the cluster. A namespace Amazon Resource Name (ARN) is in the arn:{partition}:redshift:{region}:{account-id}:namespace:{namespace-guid} format. You can see the namespace of a provisioned cluster on the cluster details page on the Amazon Redshift console.
In the data sharing workflow, the namespace GUID value and the cluster namespace ARN are used to share data with clusters in the AWS account. You can also find the namespace for the current cluster by using the current_namespace function.
Serverless namespaces are identifiers that identify Amazon Redshift Serverless. A namespace globally unique identifier (GUID) is automatically created during Amazon Redshift Serverless creation and attached to the instance. A serverless namespace ARN is in the arn:{partition}:redshift-serverless:{region}:{account-id}:namespace/{namespace-guid} format.
AWS accounts can be consumers for datashares and are each represented by a 12-digit AWS account ID.
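The two ARN formats above differ in the service name and in the separator before the GUID. The following is a minimal sketch of how the identifiers fit together. The helper name namespace_arn, the default partition "aws", and the sample values are assumptions for illustration only and are not part of any AWS SDK.

```python
import re

# Format strings taken from the text above; the helper itself is hypothetical.
PROVISIONED_ARN = (
    "arn:{partition}:redshift:{region}:{account_id}:namespace:{namespace_guid}"
)
SERVERLESS_ARN = (
    "arn:{partition}:redshift-serverless:{region}:{account_id}:namespace/{namespace_guid}"
)

# An AWS account ID is a 12-digit number.
ACCOUNT_ID_RE = re.compile(r"[0-9]{12}")


def namespace_arn(region, account_id, namespace_guid, serverless=False, partition="aws"):
    """Build a namespace ARN for a provisioned cluster or a serverless namespace."""
    if not ACCOUNT_ID_RE.fullmatch(account_id):
        raise ValueError("AWS account ID must be exactly 12 digits")
    template = SERVERLESS_ARN if serverless else PROVISIONED_ARN
    return template.format(
        partition=partition,
        region=region,
        account_id=account_id,
        namespace_guid=namespace_guid,
    )


# Sample values only; replace them with your own Region, account ID, and GUID.
print(namespace_arn("us-east-1", "123456789012", "example-guid"))
print(namespace_arn("us-east-1", "123456789012", "example-guid", serverless=True))
```

Note that the provisioned format separates the GUID with a colon (namespace:{namespace-guid}), while the serverless format uses a slash (namespace/{namespace-guid}).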
For standard datashares, consider the following:
When a producer cluster is deleted, Amazon Redshift deletes the datashares created by the producer cluster. When a producer cluster is backed up and restored, the created datashares still persist on the restored cluster. However, datashare permissions granted to other clusters are no longer valid on the restored cluster. Re-grant usage permissions of datashares to desired consumer clusters. The consumer database on the consumer cluster points to the datashare from the original cluster where the snapshot is taken. To query the shared data from the restored cluster, the consumer cluster administrator creates a different database. Or the administrator can drop and recreate an existing consumer database to use the datashare from the newly restored cluster.
When a consumer cluster is deleted and restored from a snapshot, the access previously shared to this cluster is no longer valid or visible. If access to datashares is still required on the restored consumer cluster, the producer cluster administrator must grant usage of datashares to the restored consumer cluster again. The consumer cluster administrator must drop any stale consumer databases created from the inactive datashares. Then, after the producer re-grants the permissions, the administrator must recreate the consumer database from the datashare. Because the cluster namespace GUID on a restored cluster is different from that of the original cluster, re-grant datashare permissions whenever the consumer or producer cluster is restored from backup.
AWS Data Exchange datashares
An AWS Data Exchange datashare is a unit of licensing for sharing your data through AWS Data Exchange. AWS manages all billing and payments associated with subscriptions to AWS Data Exchange and use of Amazon Redshift data sharing. Approved data providers can add AWS Data Exchange datashares to AWS Data Exchange products. When customers subscribe to a product with AWS Data Exchange datashares, they get access to the datashares in the product.
AWS Data Exchange for Amazon Redshift makes it convenient to license access to your Amazon Redshift data through AWS Data Exchange. When a customer subscribes to a product with AWS Data Exchange datashares, AWS Data Exchange automatically adds the customer as a data consumer on all AWS Data Exchange datashares included with the product. Invoices are automatically generated, and payments are centrally collected and automatically disbursed through AWS Marketplace Entitlement Service.
Providers can license data in Amazon Redshift at a granular level, such as schemas, tables, views, and user-defined functions. You can use the same AWS Data Exchange datashare across multiple AWS Data Exchange products. Any objects added to the AWS Data Exchange datashare are available to consumers. Producers can view all AWS Data Exchange datashares managed by AWS Data Exchange on their behalf using Amazon Redshift API operations, SQL commands, and the Amazon Redshift console. Customers who subscribe to a product with AWS Data Exchange datashares have read-only access to the objects in the datashares.
Customers who want to consume third-party producer data can browse the AWS Data Exchange catalog to discover and subscribe to datasets in Amazon Redshift. After their AWS Data Exchange subscription is active, they can create a database from the datashare in their cluster and query the data in Amazon Redshift.
How AWS Data Exchange datashares work
Managing AWS Data Exchange datashares as a producer administrator
If you are a data producer (also known as a provider on AWS Data Exchange), you can create AWS Data Exchange datashares that connect to your Amazon Redshift databases. To add AWS Data Exchange datashares to products on AWS Data Exchange, you must be a registered AWS Data Exchange provider.
For more information on how to get started with AWS Data Exchange datashares, see Sharing licensed Amazon Redshift data on AWS Data Exchange.
Using AWS Data Exchange datashares as a consumer with an active AWS Data Exchange subscription
If you are a consumer with an active AWS Data Exchange subscription (also known as a subscriber on AWS Data Exchange), you can browse the AWS Data Exchange catalog on the AWS Data Exchange console to discover products containing AWS Data Exchange datashares.
After you subscribe to a product that contains AWS Data Exchange datashares, create a database from the datashare within your cluster. You can then query the data in Amazon Redshift directly without extracting, transforming, and loading the data.
For more information on how to get started with AWS Data Exchange datashares, see Sharing licensed Amazon Redshift data on AWS Data Exchange.
For AWS Data Exchange datashares, consider the following:
When a producer cluster is deleted, Amazon Redshift deletes the datashares created by the producer cluster. When a producer cluster is backed up and restored, the created datashares still persist on the restored cluster. For data subscribers to be able to continue accessing the data, create the AWS Data Exchange datashares again and publish them to the product's data sets. The consumer database on the consumer cluster points to the datashare from the original cluster where the snapshot is taken. To query the shared data from the restored cluster, the consumer cluster administrator creates a different database, or drops and recreates an existing consumer database to use the newly created AWS Data Exchange datashare from the newly restored cluster.
When a consumer cluster is deleted and restored from a snapshot, the previous access shared to this cluster remains valid and visible. The consumer cluster administrator must drop any stale consumer databases created from the inactive datashares and recreate the consumer database from the datashare after the producer re-grants the permissions. Because the cluster namespace GUID on a restored cluster is different from that of the original cluster, re-grant datashare permissions when the producer cluster is restored from backup.
We recommend that you don't delete your cluster if you have any AWS Data Exchange datashares. Performing this type of alteration can breach data product terms in AWS Data Exchange.
Considerations when using AWS Data Exchange for Amazon Redshift
When using AWS Data Exchange for Amazon Redshift, consider the following:
Both producers and consumers must use the RA3 instance types to use Amazon Redshift datashares. Producers must use the RA3 instance types with the latest Amazon Redshift cluster version.
Both the producer and consumer clusters must be encrypted.
You must be registered as an AWS Data Exchange provider to list products on AWS Data Exchange, including products that contain AWS Data Exchange datashares. For more information, see Getting started as a provider.
You don't need to be a registered AWS Data Exchange provider to find, subscribe to, and query Amazon Redshift data through AWS Data Exchange.
To control access to your data, create AWS Data Exchange datashares with the publicly accessible setting turned on. To alter an AWS Data Exchange datashare to turn off the publicly accessible setting, set the session variable to allow ALTER DATASHARE SET PUBLICACCESSIBLE FALSE. For more information, see ALTER DATASHARE usage notes.
Producers can't manually add or remove consumers from AWS Data Exchange datashares because access to the datashares is granted based on having an active subscription to an AWS Data Exchange product that contains the AWS Data Exchange datashare.
Producers can't view the SQL queries that consumers run. They can only view metadata, such as the number of queries or the objects consumers query, through Amazon Redshift tables that only the producer can access. For more information, see Monitoring and auditing data sharing in Amazon Redshift.
We recommend that you make your datashares publicly accessible. If you don't, subscribers on AWS Data Exchange with publicly accessible consumer clusters won't be able to use your datashare.
We recommend that you don't delete an AWS Data Exchange datashare shared to other AWS accounts using the DROP DATASHARE statement. If you do, the AWS accounts that have access to the datashare will lose access. This action is irreversible. Performing this type of alteration can breach data product terms in AWS Data Exchange. If you want to delete an AWS Data Exchange datashare, see DROP DATASHARE usage notes.
For cross-Region data sharing, you can create AWS Data Exchange datashares to share licensed data.
When consuming data from a different Region, the consumer pays the cross-Region data transfer fee from the producer Region to the consumer Region.
AWS Lake Formation-managed datashares
Using AWS Lake Formation, you can centrally define and enforce database, table, column, and row-level access permissions of Amazon Redshift datashares and restrict user access to objects within a datashare. By sharing data through Lake Formation, you can define permissions in Lake Formation and apply those permissions to any datashare and its objects. For example, if you have a table containing employee information, you can use Lake Formation's column-level filters to prevent employees who don't work in the HR department from seeing personally identifiable information (PII), such as a social security number. For more information about data filters, see Data filtering and cell-level security in Lake Formation in the AWS Lake Formation Developer Guide.
You can also use tags in Lake Formation to configure permissions on Lake Formation resources. For more information, see Lake Formation Tag-based access control.
Amazon Redshift currently supports data sharing via Lake Formation when sharing within the same account or across accounts. Cross-Region sharing is currently not supported.
The following is a high-level overview of how to use Lake Formation to control datashare permissions:
In Amazon Redshift, the producer cluster or workgroup administrator creates a datashare on the producer cluster or workgroup and grants usage to a Lake Formation account.
The producer cluster or workgroup administrator authorizes the Lake Formation account to access the datashare.
The Lake Formation administrator discovers and registers the datashares. They must also discover the AWS Glue ARNs they have access to and associate the datashares with an AWS Glue Data Catalog ARN. If you're using the AWS CLI, you can discover and accept datashares with the Redshift CLI operations describe-data-shares and associate-data-share-consumer. To register a datashare, use the Lake Formation CLI operation register-resource.
The Lake Formation administrator creates a federated database in the AWS Glue Data Catalog and configures Lake Formation permissions to control user access to objects within the datashare. For more information about federated databases in AWS Glue, see Managing permissions for data in an Amazon Redshift datashare.
The Lake Formation administrator discovers the AWS Glue databases they have access to and associates the datashare with an AWS Glue Data Catalog ARN.
The Redshift administrator discovers the AWS Glue database ARNs they have access to, creates an external database in the Amazon Redshift consumer cluster using an AWS Glue database ARN, and grants usage to database users authenticated with IAM credentials so that they can start querying the Amazon Redshift database.
Database users can use the views SVV_EXTERNAL_TABLES and SVV_EXTERNAL_COLUMNS to find all of the tables or columns within the AWS Glue database that they have access to, and then they can query the AWS Glue database’s tables.
When the producer cluster or workgroup administrator decides to no longer share the data with the consumer cluster, the producer cluster administrator can revoke usage, deauthorize, or delete the datashare from Redshift. The associated permissions and objects in Lake Formation are not automatically deleted.
For more information about sharing a datashare with AWS Lake Formation as a producer cluster or workgroup administrator, see Working with Lake Formation-managed datashares as a producer. To consume the shared data from the producer cluster or workgroup, see Working with Lake Formation-managed datashares as a consumer.
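The Redshift side of the overview above comes down to a few SQL statements: the producer grants the datashare to the Lake Formation account, the consumer creates a database from an AWS Glue database ARN, and database users look up what they can see. The following sketch only assembles those statements as strings. The function name and all sample identifiers are hypothetical, and the exact clause spelling (VIA DATA CATALOG, FROM ARN ... WITH NO DATA CATALOG SCHEMA) is an assumption that should be checked against the GRANT and CREATE DATABASE references before use.

```python
def lake_formation_statements(datashare, lf_account_id, glue_database_arn, local_db):
    """Return the Redshift SQL for each role in a Lake Formation-managed share.

    Clause spelling is an assumption; verify it against the SQL reference.
    """
    return {
        # Producer administrator: grant the datashare to the Lake Formation account.
        "producer": (
            f"GRANT USAGE ON DATASHARE {datashare} "
            f"TO ACCOUNT '{lf_account_id}' VIA DATA CATALOG;"
        ),
        # Consumer administrator: create a local database from the AWS Glue database ARN.
        "consumer": (
            f"CREATE DATABASE {local_db} FROM ARN '{glue_database_arn}' "
            "WITH NO DATA CATALOG SCHEMA;"
        ),
        # Database user: list the shared tables that are visible to them.
        "user": "SELECT schemaname, tablename FROM svv_external_tables;",
    }


# Sample values only; every identifier here is made up for illustration.
statements = lake_formation_statements(
    datashare="salesshare",
    lf_account_id="123456789012",
    glue_database_arn="arn:aws:glue:us-east-1:123456789012:database/example_db",
    local_db="lf_sales_db",
)
for role, sql in statements.items():
    print(f"{role}: {sql}")
```

The Lake Formation steps in between (registering the datashare, creating the federated database, and configuring permissions) happen outside Redshift and are not covered by this sketch.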
Considerations and limitations when using AWS Lake Formation with Amazon Redshift
The following are considerations and limitations for sharing Amazon Redshift data via Lake Formation. For information on data sharing considerations and limitations, see Considerations when using data sharing in Amazon Redshift. For information about Lake Formation limitations, see Notes on working with Amazon Redshift datashares in Lake Formation.
- Sharing a datashare to Lake Formation across Regions is currently unsupported.
- If column-level filters are defined for a user on a shared relation, performing a SELECT * operation returns only the columns the user has access to.
- Cell-level filters from Lake Formation are unsupported.
- If you created and shared a view and its tables to Lake Formation, you can configure filters to manage access to the tables. Amazon Redshift enforces Lake Formation-defined policies when consumer cluster users access shared objects. When a user accesses a view shared with Lake Formation, Redshift enforces only the Lake Formation policies defined on the view and not those on the tables contained within the view. However, when users directly access the table, Redshift enforces the defined Lake Formation policies on the table.
- You can't create materialized views on the consumer based on a shared table if the table has Lake Formation filters configured.
- The Lake Formation administrator must have data lake administrator permissions and the required permissions to accept a datashare.
- The producer and the consumer must each be an RA3 cluster with the latest Amazon Redshift cluster version or a serverless workgroup to share datashares via Lake Formation.
- Both the producer and consumer clusters must be encrypted.
- Redshift row-level and column-level access control policies implemented in the producer cluster or workgroup are ignored when the datashare is shared to Lake Formation. The Lake Formation administrator must configure these policies in Lake Formation. The producer cluster or workgroup administrator can turn off row-level security (RLS) for a table by using the ALTER TABLE command.
- Sharing datashares via Lake Formation is only available to users who have access to both Redshift and Lake Formation.
Datashare producers and consumers
Data producers (also known as data sharing producers or datashare producers) are clusters that you want to share data from. Producer cluster administrators and database owners can create datashares using the CREATE DATASHARE command. You can add objects such as schemas, tables, views, and SQL user-defined functions (UDFs) from a database that you want the producer cluster to share with consumer clusters for read purposes.
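As a rough illustration of the producer workflow described above, the following sketch assembles the SQL statements a producer administrator might run: create the datashare, add a schema and its tables, and grant usage to a consumer namespace. The helper name producer_statements and all sample identifiers are hypothetical, and the namespace GUID is a placeholder.

```python
def producer_statements(datashare, schema, tables, consumer_namespace):
    """Return producer-side SQL, in run order, for a standard datashare."""
    statements = [
        f"CREATE DATASHARE {datashare};",
        # The schema is added before the tables that belong to it.
        f"ALTER DATASHARE {datashare} ADD SCHEMA {schema};",
    ]
    for table in tables:
        statements.append(f"ALTER DATASHARE {datashare} ADD TABLE {schema}.{table};")
    # Grant usage to the consumer, identified by its namespace GUID.
    statements.append(
        f"GRANT USAGE ON DATASHARE {datashare} TO NAMESPACE '{consumer_namespace}';"
    )
    return statements


# Sample values only; replace them with your own object names and GUID.
for sql in producer_statements(
    datashare="salesshare",
    schema="public",
    tables=["orders", "customers"],
    consumer_namespace="consumer-namespace-guid",
):
    print(sql)
```

To share with a different AWS account instead of a namespace in the same account, the grant targets an account ID (GRANT USAGE ON DATASHARE ... TO ACCOUNT '...') rather than a namespace GUID.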
Data producers (also known as providers on AWS Data Exchange) for AWS Data Exchange datashares can license data through AWS Data Exchange. Approved providers can add AWS Data Exchange datashares to AWS Data Exchange products.
When a customer subscribes to a product with AWS Data Exchange datashares, AWS Data Exchange automatically adds the customer as a data consumer on all AWS Data Exchange datashares included with the product. AWS Data Exchange also removes all customers from AWS Data Exchange datashares when their subscription ends. AWS Data Exchange also automatically manages billing, invoicing, payment collection, and payment distribution for paid products with AWS Data Exchange datashares. For more information, see AWS Data Exchange datashares. To register as an AWS Data Exchange data provider, see Getting started as a provider.
Data consumers (also known as data sharing consumers or datashare consumers) are clusters that receive datashares from producer clusters.
Amazon Redshift clusters that share data can be in the same or different AWS accounts or different AWS Regions, so you can share data across organizations and collaborate with other parties. Consumer cluster administrators receive the datashares that they are granted usage for and review the contents of each datashare. To consume shared data, the consumer cluster administrator creates an Amazon Redshift database from the datashare. The administrator then assigns permissions for the database to users and roles in the consumer cluster. After permissions are granted, users and roles can list the shared objects as part of the standard metadata queries, along with the local data on the consumer cluster. They can start querying immediately.
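The consumer steps above (create a database from the datashare, then assign permissions) can be sketched in the same way. This is a minimal sketch under stated assumptions: the helper name consumer_statements, the user name, and all identifiers are hypothetical. The optional producer_account argument models the cross-account case, where the producer's account ID is named alongside its namespace GUID.

```python
def consumer_statements(local_db, datashare, producer_namespace, user, producer_account=None):
    """Return consumer-side SQL, in run order, for consuming a datashare."""
    # For a cross-account share, name the producer account before the namespace.
    account_clause = f"ACCOUNT '{producer_account}' " if producer_account else ""
    return [
        f"CREATE DATABASE {local_db} FROM DATASHARE {datashare} "
        f"OF {account_clause}NAMESPACE '{producer_namespace}';",
        # Let a database user on the consumer cluster query the shared data.
        f"GRANT USAGE ON DATABASE {local_db} TO {user};",
    ]


# Sample values only; replace them with your own names and GUID.
for sql in consumer_statements(
    local_db="sales_db",
    datashare="salesshare",
    producer_namespace="producer-namespace-guid",
    user="analyst",
):
    print(sql)
```

After the grant, a user would typically reference shared objects with three-part notation, for example sales_db.public.orders in this hypothetical setup.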
If you are a consumer with an active AWS Data Exchange subscription (also known as a subscriber on AWS Data Exchange), you can find, subscribe to, and query granular, up-to-date data in Amazon Redshift without the need to extract, transform, and load the data. For more information, see AWS Data Exchange datashares.