create_training_dataset
- CleanRoomsML.Client.create_training_dataset(**kwargs)
Defines the information necessary to create a training dataset. In Clean Rooms ML, the TrainingDataset is metadata that points to a Glue table, which is read only during AudienceModel creation.
See also: AWS API Documentation
Request Syntax
response = client.create_training_dataset(
    name='string',
    roleArn='string',
    trainingData=[
        {
            'type': 'INTERACTIONS',
            'inputConfig': {
                'schema': [
                    {
                        'columnName': 'string',
                        'columnTypes': [
                            'USER_ID'|'ITEM_ID'|'TIMESTAMP'|'CATEGORICAL_FEATURE'|'NUMERICAL_FEATURE',
                        ]
                    },
                ],
                'dataSource': {
                    'glueDataSource': {
                        'tableName': 'string',
                        'databaseName': 'string',
                        'catalogId': 'string'
                    }
                }
            }
        },
    ],
    tags={
        'string': 'string'
    },
    description='string'
)
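As a concrete illustration of the syntax above, the following sketch assembles the keyword arguments for a single INTERACTIONS dataset. The helper name build_request and every value passed to it (dataset name, role ARN, database, table, and column names) are hypothetical placeholders, not values taken from the API documentation.

```python
def build_request(name, role_arn, database, table, columns, catalog_id=None):
    """Assemble kwargs for create_training_dataset (hypothetical helper).

    `columns` maps each column name to a list of column types.
    """
    glue = {"tableName": table, "databaseName": database}
    if catalog_id is not None:  # catalogId is the only optional Glue field
        glue["catalogId"] = catalog_id
    return {
        "name": name,
        "roleArn": role_arn,
        "trainingData": [
            {
                "type": "INTERACTIONS",
                "inputConfig": {
                    "schema": [
                        {"columnName": col, "columnTypes": list(types)}
                        for col, types in columns.items()
                    ],
                    "dataSource": {"glueDataSource": glue},
                },
            }
        ],
    }


# Placeholder values for illustration only.
request = build_request(
    name="example-training-dataset",
    role_arn="arn:aws:iam::111122223333:role/ExampleCleanRoomsMLRole",
    database="example_db",
    table="example_interactions",
    columns={
        "user_id": ["USER_ID"],
        "item_id": ["ITEM_ID"],
        "event_time": ["TIMESTAMP"],
    },
)

# With AWS credentials configured, the request could then be sent with:
#   import boto3
#   client = boto3.client("cleanroomsml")
#   response = client.create_training_dataset(**request)
```

Building the arguments as a plain dict keeps the nesting visible and lets the optional fields (catalogId, tags, description) be added only when they are needed.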
- Parameters:
name (string) –
[REQUIRED]
The name of the training dataset. This name must be unique in your account and region.
roleArn (string) –
[REQUIRED]
The ARN of the IAM role that Clean Rooms ML can assume to read the data referred to in the dataSource field of each dataset.
Passing a role across AWS accounts is not allowed. If you pass a role that isn’t in your account, you get an AccessDeniedException error.
trainingData (list) –
[REQUIRED]
An array of Dataset objects, each of which specifies the dataset type and details on its location and schema. You must provide a role that has read access to these tables.
(dict) –
Defines where the training dataset is located, what type of data it contains, and how to access the data.
type (string) – [REQUIRED]
What type of information is found in the dataset.
inputConfig (dict) – [REQUIRED]
A DatasetInputConfig object that defines the data source and schema mapping.
schema (list) – [REQUIRED]
The schema information for the training data.
(dict) –
Metadata for a column.
columnName (string) – [REQUIRED]
The name of a column.
columnTypes (list) – [REQUIRED]
The data type of the column.
(string) –
dataSource (dict) – [REQUIRED]
A DataSource object that specifies the Glue data source for the training data.
glueDataSource (dict) – [REQUIRED]
A GlueDataSource object that defines the catalog ID, database name, and table name for the training data.
tableName (string) – [REQUIRED]
The Glue table that contains the training data.
databaseName (string) – [REQUIRED]
The Glue database that contains the training data.
catalogId (string) –
The Glue catalog that contains the training data.
tags (dict) –
The optional metadata that you apply to the resource to help you categorize and organize it. Each tag consists of a key and an optional value, both of which you define.
The following basic restrictions apply to tags:
Maximum number of tags per resource - 50.
For each resource, each tag key must be unique, and each tag key can have only one value.
Maximum key length - 128 Unicode characters in UTF-8.
Maximum value length - 256 Unicode characters in UTF-8.
If your tagging schema is used across multiple services and resources, remember that other services may have restrictions on allowed characters. Generally allowed characters are: letters, numbers, and spaces representable in UTF-8, and the following characters: + - = . _ : / @.
Tag keys and values are case sensitive.
Do not use aws:, AWS:, or any uppercase or lowercase combination of these as a prefix for keys, because this prefix is reserved for AWS use. You cannot edit or delete tag keys with this prefix. Values can have this prefix. If a tag value has aws as its prefix but the key does not, then Clean Rooms ML considers it to be a user tag and counts it against the limit of 50 tags. Tags with only the key prefix of aws do not count against your tags per resource limit.
(string) –
(string) –
description (string) – The description of the training dataset.
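The tag restrictions listed for the tags parameter can be checked on the client side before a request is sent. The sketch below is a hypothetical helper (validate_tags is not part of boto3) that covers only the tag count, key and value length, and reserved-prefix rules. It assumes that length is measured in Unicode characters, as Python's len does, and it does not check the allowed character set. The service remains the authority on what is accepted.

```python
MAX_TAGS = 50         # maximum number of tags per resource
MAX_KEY_LEN = 128     # maximum key length in Unicode characters
MAX_VALUE_LEN = 256   # maximum value length in Unicode characters


def validate_tags(tags):
    """Return a list of problems found in a tags dict (empty if none)."""
    problems = []
    if len(tags) > MAX_TAGS:
        problems.append(f"too many tags: {len(tags)} > {MAX_TAGS}")
    for key, value in tags.items():
        if len(key) > MAX_KEY_LEN:
            problems.append(f"key too long ({len(key)} characters)")
        if len(value) > MAX_VALUE_LEN:
            problems.append(f"value too long for key {key[:20]!r}")
        # The aws: prefix is reserved in any upper/lowercase combination.
        if key.lower().startswith("aws:"):
            problems.append(f"reserved prefix in key {key!r}")
    return problems


# Example with placeholder tags: the second key uses the reserved prefix.
for problem in validate_tags({"project": "demo", "AWS:owner": "me"}):
    print(problem)
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a single pass.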
- Return type:
dict
- Returns:
Response Syntax
{ 'trainingDatasetArn': 'string' }
Response Structure
(dict) –
trainingDatasetArn (string) –
The Amazon Resource Name (ARN) of the training dataset resource.
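The response is a plain dict, so the ARN can be read by key. The sketch below wraps the call in a hypothetical helper, create_and_get_arn, and exercises it with a stand-in client so that it runs without AWS credentials. The FakeClient class and the ARN string it returns are made up for illustration; a real ARN is assigned by the service.

```python
def create_and_get_arn(client, **request):
    """Call create_training_dataset and return the new dataset's ARN."""
    response = client.create_training_dataset(**request)
    return response["trainingDatasetArn"]


class FakeClient:
    """Stand-in for boto3.client("cleanroomsml"); makes no network call."""

    def create_training_dataset(self, **kwargs):
        # Placeholder ARN, not a real resource or a guaranteed ARN format.
        return {"trainingDatasetArn": "arn:placeholder:training-dataset/example"}


arn = create_and_get_arn(FakeClient(), name="example-training-dataset")
print(arn)
```

In real use, the first argument would be a client created with boto3.client("cleanroomsml"), and the keyword arguments would include the required name, roleArn, and trainingData parameters.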
Exceptions