create_matching_workflow
- EntityResolution.Client.create_matching_workflow(**kwargs)
Creates a matching workflow that defines the configuration for a data processing job. The workflow name must be unique. To modify an existing workflow, use UpdateMatchingWorkflow.
Warning
For workflows where resolutionType is ML_MATCHING or PROVIDER, incremental processing is not supported.
See also: AWS API Documentation
Request Syntax
response = client.create_matching_workflow(
    workflowName='string',
    description='string',
    inputSourceConfig=[
        {
            'inputSourceARN': 'string',
            'schemaName': 'string',
            'applyNormalization': True|False
        },
    ],
    outputSourceConfig=[
        {
            'KMSArn': 'string',
            'outputS3Path': 'string',
            'output': [
                {
                    'name': 'string',
                    'hashed': True|False
                },
            ],
            'applyNormalization': True|False,
            'customerProfilesIntegrationConfig': {
                'domainArn': 'string',
                'objectTypeArn': 'string'
            }
        },
    ],
    resolutionTechniques={
        'resolutionType': 'RULE_MATCHING'|'ML_MATCHING'|'PROVIDER',
        'ruleBasedProperties': {
            'rules': [
                {
                    'ruleName': 'string',
                    'matchingKeys': [
                        'string',
                    ]
                },
            ],
            'attributeMatchingModel': 'ONE_TO_ONE'|'MANY_TO_MANY',
            'matchPurpose': 'IDENTIFIER_GENERATION'|'INDEXING'
        },
        'ruleConditionProperties': {
            'rules': [
                {
                    'ruleName': 'string',
                    'condition': 'string'
                },
            ]
        },
        'providerProperties': {
            'providerServiceArn': 'string',
            'providerConfiguration': {...}|[...]|123|123.4|'string'|True|None,
            'intermediateSourceConfiguration': {
                'intermediateS3Path': 'string'
            }
        }
    },
    incrementalRunConfig={
        'incrementalRunType': 'IMMEDIATE'
    },
    roleArn='string',
    tags={
        'string': 'string'
    }
)
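As a rough illustration of the request syntax above, the following sketch assembles the required arguments for a rule-based workflow. The helper names, column names (email, full_name), and rule name are hypothetical placeholders, not part of the API; the client is assumed to be created elsewhere with boto3.client('entityresolution').

```python
def build_rule_matching_request(workflow_name, role_arn, table_arn,
                                schema_name, output_s3_path):
    """Assemble keyword arguments for a rule-based matching workflow.

    Column names and the rule name below are illustrative assumptions.
    """
    return {
        "workflowName": workflow_name,
        "roleArn": role_arn,
        "inputSourceConfig": [
            {
                "inputSourceARN": table_arn,
                "schemaName": schema_name,
                "applyNormalization": True,
            }
        ],
        "outputSourceConfig": [
            {
                "outputS3Path": output_s3_path,
                "output": [
                    {"name": "email", "hashed": True},
                    {"name": "full_name", "hashed": False},
                ],
            }
        ],
        "resolutionTechniques": {
            "resolutionType": "RULE_MATCHING",
            "ruleBasedProperties": {
                "rules": [{"ruleName": "Rule1", "matchingKeys": ["email"]}],
                "attributeMatchingModel": "ONE_TO_ONE",
            },
        },
    }


def create_workflow(client, request):
    """Send the request with an EntityResolution client; return the workflow ARN."""
    response = client.create_matching_workflow(**request)
    return response["workflowArn"]
```

Keeping the request assembly separate from the call makes it possible to inspect or log the arguments before anything is sent to the service.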
- Parameters:
workflowName (string) –
[REQUIRED]
The name of the workflow. There can’t be multiple MatchingWorkflows with the same name.
description (string) – A description of the workflow.
inputSourceConfig (list) –
[REQUIRED]
A list of InputSource objects, which have the fields InputSourceARN and SchemaName.
(dict) –
An object containing inputSourceARN, schemaName, and applyNormalization.
inputSourceARN (string) – [REQUIRED]
A Glue table Amazon Resource Name (ARN) for the input source table.
schemaName (string) – [REQUIRED]
The name of the schema to be retrieved.
applyNormalization (boolean) –
Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.
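To make the phone number example concrete, here is a minimal local sketch of that one transformation. It is only an illustration of the documented example; the function name is hypothetical, and the service's actual normalization rules for other inputs are not described here.

```python
import re


def normalize_phone(raw):
    """Format a 10-digit phone number as (123)-456-7890.

    Illustrative only: inputs that do not contain exactly 10 digits
    are returned unchanged, which is an assumption of this sketch.
    """
    digits = re.sub(r"\D", "", raw)  # drop everything except digits
    if len(digits) != 10:
        return raw
    return f"({digits[:3]})-{digits[3:6]}-{digits[6:]}"
```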
outputSourceConfig (list) –
[REQUIRED]
A list of OutputSource objects, each of which contains the fields outputS3Path, applyNormalization, KMSArn, and output.
(dict) –
An OutputSource object. Its output field is a list of OutputAttribute objects, each of which has the fields Name and Hashed. Each of these objects selects a column to be included in the output table, and whether the values of the column should be hashed.
KMSArn (string) –
Customer KMS ARN for encryption at rest. If not provided, the system uses an Entity Resolution managed KMS key.
outputS3Path (string) –
The S3 path to which Entity Resolution will write the output table.
output (list) – [REQUIRED]
A list of OutputAttribute objects, each of which has the fields Name and Hashed. Each of these objects selects a column to be included in the output table, and whether the values of the column should be hashed.
(dict) –
An OutputAttribute object, which selects a column to be included in the output table and specifies whether the values of the column should be hashed.
name (string) – [REQUIRED]
A name of a column to be written to the output. This must be an InputField name in the schema mapping.
hashed (boolean) –
Enables the ability to hash the column values in the output.
applyNormalization (boolean) –
Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.
customerProfilesIntegrationConfig (dict) –
Specifies the Customer Profiles integration configuration for sending matched output directly to Customer Profiles. When configured, Entity Resolution automatically creates and updates customer profiles based on match clusters, eliminating the need for manual Amazon S3 integration setup.
domainArn (string) – [REQUIRED]
The Amazon Resource Name (ARN) of the Customer Profiles domain where the matched output will be sent.
objectTypeArn (string) – [REQUIRED]
The Amazon Resource Name (ARN) of the Customer Profiles object type that defines the structure for the matched customer data.
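The outputSourceConfig fields described above could be assembled with a small helper like the one below. This is a sketch under stated assumptions: the helper name and its arguments are hypothetical, and only the dictionary keys come from the request syntax.

```python
def build_output_source(columns, hashed_columns=(), s3_path=None,
                        kms_arn=None, customer_profiles=None):
    """Build one OutputSource dict for outputSourceConfig.

    columns: column names to write (each must be an InputField name).
    hashed_columns: subset of columns whose values should be hashed.
    customer_profiles: optional (domain_arn, object_type_arn) tuple.
    """
    hashed = set(hashed_columns)
    unknown = hashed - set(columns)
    if unknown:
        raise ValueError(f"hashed columns not in output: {sorted(unknown)}")

    # 'output' is the only required field of an OutputSource.
    source = {"output": [{"name": c, "hashed": c in hashed} for c in columns]}
    if s3_path is not None:
        source["outputS3Path"] = s3_path
    if kms_arn is not None:
        source["KMSArn"] = kms_arn
    if customer_profiles is not None:
        domain_arn, object_type_arn = customer_profiles
        source["customerProfilesIntegrationConfig"] = {
            "domainArn": domain_arn,
            "objectTypeArn": object_type_arn,
        }
    return source
```

Optional keys are omitted rather than set to None, so the resulting dict contains only the fields that were actually supplied.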
resolutionTechniques (dict) –
[REQUIRED]
An object which defines the resolutionType and the ruleBasedProperties.
resolutionType (string) – [REQUIRED]
The type of matching workflow to create. Specify one of the following types:
RULE_MATCHING: Match records using configurable rule-based criteria
ML_MATCHING: Match records using machine learning models
PROVIDER: Match records using a third-party matching provider
ruleBasedProperties (dict) –
An object which defines the list of matching rules to run and has a field rules, which is a list of rule objects.
rules (list) – [REQUIRED]
A list of Rule objects, each of which has the fields RuleName and MatchingKeys.
(dict) –
An object containing the ruleName and matchingKeys.
ruleName (string) – [REQUIRED]
A name for the matching rule.
matchingKeys (list) – [REQUIRED]
A list of MatchingKeys. The MatchingKeys must have been defined in the SchemaMapping. Two records are considered to match according to this rule if all of the MatchingKeys match.
(string) –
attributeMatchingModel (string) – [REQUIRED]
The comparison type. You can choose ONE_TO_ONE or MANY_TO_MANY as the attributeMatchingModel.
If you choose ONE_TO_ONE, the system can only match attributes if the sub-types are an exact match. For example, for the Email attribute type, the system will only consider it a match if the value of the Email field of Profile A matches the value of the Email field of Profile B.
If you choose MANY_TO_MANY, the system can match attributes across the sub-types of an attribute type. For example, if the value of the Email field of Profile A and the value of the BusinessEmail field of Profile B match, the two profiles are matched on the Email attribute type.
matchPurpose (string) –
An indicator of whether to generate IDs and index the data or not.
If you choose IDENTIFIER_GENERATION, the process generates IDs and indexes the data.
If you choose INDEXING, the process indexes the data without generating IDs.
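The difference between the two attributeMatchingModel values can be sketched locally as follows. This is a simplified model of the behavior described above, not the service's implementation; the function name and the representation of profiles as plain dicts are assumptions.

```python
def attribute_matches(profile_a, profile_b, sub_types, model):
    """Compare two profiles on one attribute type.

    profile_a, profile_b: dicts mapping field name -> value.
    sub_types: field names that belong to the same attribute type,
               for example ["Email", "BusinessEmail"].
    """
    if model == "ONE_TO_ONE":
        # Only the same sub-type field is compared across profiles.
        return any(
            field in profile_a
            and field in profile_b
            and profile_a[field] == profile_b[field]
            for field in sub_types
        )
    if model == "MANY_TO_MANY":
        # Any sub-type value in A may match any sub-type value in B.
        values_a = {profile_a[f] for f in sub_types if f in profile_a}
        values_b = {profile_b[f] for f in sub_types if f in profile_b}
        return bool(values_a & values_b)
    raise ValueError(f"unknown attributeMatchingModel: {model}")
```

With Profile A holding Email and Profile B holding the same address under BusinessEmail, this sketch reports a match only for MANY_TO_MANY.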
ruleConditionProperties (dict) –
An object containing the rules for a matching workflow.
rules (list) – [REQUIRED]
A list of rule objects, each of which has the fields ruleName and condition.
(dict) –
An object that defines the ruleCondition and the ruleName to use in a matching workflow.
ruleName (string) – [REQUIRED]
A name for the matching rule.
For example: Rule1
condition (string) – [REQUIRED]
A statement that specifies the conditions for a matching rule.
If your data is accurate, use an Exact matching function: Exact or ExactManyToMany.
If your data has variations in spelling or pronunciation, use a Fuzzy matching function: Cosine, Levenshtein, or Soundex.
Use operators if you want to combine (AND), separate (OR), or group matching functions (...).
For example: (Cosine(a, 10) AND Exact(b, true)) OR ExactManyToMany(c, d)
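Condition strings like the example above can be composed programmatically. The helpers below are hypothetical conveniences for string building only; they do not validate function names or arguments against what the service accepts.

```python
def fn(name, *args):
    """Render a matching function call, e.g. fn('Cosine', 'a', 10)."""
    return f"{name}({', '.join(str(a) for a in args)})"


def all_of(*parts):
    """Combine conditions with AND and group them in parentheses."""
    return "(" + " AND ".join(parts) + ")"


def any_of(*parts):
    """Separate alternative conditions with OR."""
    return " OR ".join(parts)


# Rebuild the example condition from the parameter description.
condition = any_of(
    all_of(fn("Cosine", "a", 10), fn("Exact", "b", "true")),
    fn("ExactManyToMany", "c", "d"),
)

rule_condition_properties = {
    "rules": [{"ruleName": "Rule1", "condition": condition}]
}
```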
providerProperties (dict) –
The properties of the provider service.
providerServiceArn (string) – [REQUIRED]
The ARN of the provider service.
providerConfiguration (document) –
The required configuration fields to use with the provider service.
intermediateSourceConfiguration (dict) –
The Amazon S3 location that temporarily stores your data while it processes. Your information won’t be saved permanently.
intermediateS3Path (string) – [REQUIRED]
The Amazon S3 location (bucket and prefix). For example:
s3://provider_bucket/DOC-EXAMPLE-BUCKET
incrementalRunConfig (dict) –
Optional. An object that defines the incremental run type. This object contains only the incrementalRunType field, which appears as “Automatic” in the console.
Warning
For workflows where resolutionType is ML_MATCHING or PROVIDER, incremental processing is not supported.
incrementalRunType (string) –
The type of incremental run. The only valid value is IMMEDIATE. This appears as “Automatic” in the console.
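Because incremental processing is limited to rule-based workflows, it can be useful to check the combination before sending a request. The following is a client-side sketch; the function name is hypothetical and the error the service itself returns for an unsupported combination is not shown here.

```python
def check_incremental_run(resolution_techniques, incremental_run_config=None):
    """Raise ValueError for incremental settings the documentation rules out."""
    if incremental_run_config is None:
        return  # incrementalRunConfig is optional

    run_type = incremental_run_config.get("incrementalRunType")
    if run_type != "IMMEDIATE":
        raise ValueError("the only valid incrementalRunType is IMMEDIATE")

    resolution_type = resolution_techniques.get("resolutionType")
    if resolution_type in ("ML_MATCHING", "PROVIDER"):
        raise ValueError(
            f"incremental processing is not supported for {resolution_type}"
        )
```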
roleArn (string) –
[REQUIRED]
The Amazon Resource Name (ARN) of the IAM role. Entity Resolution assumes this role to create resources on your behalf as part of workflow execution.
tags (dict) –
The tags used to organize, track, or control access for this resource.
(string) –
(string) –
- Return type:
dict
- Returns:
Response Syntax
{
    'workflowName': 'string',
    'workflowArn': 'string',
    'description': 'string',
    'inputSourceConfig': [
        {
            'inputSourceARN': 'string',
            'schemaName': 'string',
            'applyNormalization': True|False
        },
    ],
    'outputSourceConfig': [
        {
            'KMSArn': 'string',
            'outputS3Path': 'string',
            'output': [
                {
                    'name': 'string',
                    'hashed': True|False
                },
            ],
            'applyNormalization': True|False,
            'customerProfilesIntegrationConfig': {
                'domainArn': 'string',
                'objectTypeArn': 'string'
            }
        },
    ],
    'resolutionTechniques': {
        'resolutionType': 'RULE_MATCHING'|'ML_MATCHING'|'PROVIDER',
        'ruleBasedProperties': {
            'rules': [
                {
                    'ruleName': 'string',
                    'matchingKeys': [
                        'string',
                    ]
                },
            ],
            'attributeMatchingModel': 'ONE_TO_ONE'|'MANY_TO_MANY',
            'matchPurpose': 'IDENTIFIER_GENERATION'|'INDEXING'
        },
        'ruleConditionProperties': {
            'rules': [
                {
                    'ruleName': 'string',
                    'condition': 'string'
                },
            ]
        },
        'providerProperties': {
            'providerServiceArn': 'string',
            'providerConfiguration': {...}|[...]|123|123.4|'string'|True|None,
            'intermediateSourceConfiguration': {
                'intermediateS3Path': 'string'
            }
        }
    },
    'incrementalRunConfig': {
        'incrementalRunType': 'IMMEDIATE'
    },
    'roleArn': 'string'
}
Response Structure
(dict) –
workflowName (string) –
The name of the workflow.
workflowArn (string) –
The ARN (Amazon Resource Name) that Entity Resolution generated for the MatchingWorkflow.
description (string) –
A description of the workflow.
inputSourceConfig (list) –
A list of InputSource objects, which have the fields InputSourceARN and SchemaName.
(dict) –
An object containing inputSourceARN, schemaName, and applyNormalization.
inputSourceARN (string) –
A Glue table Amazon Resource Name (ARN) for the input source table.
schemaName (string) –
The name of the schema to be retrieved.
applyNormalization (boolean) –
Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.
outputSourceConfig (list) –
A list of OutputSource objects, each of which contains the fields outputS3Path, applyNormalization, KMSArn, and output.
(dict) –
An OutputSource object. Its output field is a list of OutputAttribute objects, each of which has the fields Name and Hashed. Each of these objects selects a column to be included in the output table, and whether the values of the column should be hashed.
KMSArn (string) –
Customer KMS ARN for encryption at rest. If not provided, the system uses an Entity Resolution managed KMS key.
outputS3Path (string) –
The S3 path to which Entity Resolution will write the output table.
output (list) –
A list of OutputAttribute objects, each of which has the fields Name and Hashed. Each of these objects selects a column to be included in the output table, and whether the values of the column should be hashed.
(dict) –
An OutputAttribute object, which selects a column to be included in the output table and specifies whether the values of the column should be hashed.
name (string) –
A name of a column to be written to the output. This must be an InputField name in the schema mapping.
hashed (boolean) –
Enables the ability to hash the column values in the output.
applyNormalization (boolean) –
Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.
customerProfilesIntegrationConfig (dict) –
Specifies the Customer Profiles integration configuration for sending matched output directly to Customer Profiles. When configured, Entity Resolution automatically creates and updates customer profiles based on match clusters, eliminating the need for manual Amazon S3 integration setup.
domainArn (string) –
The Amazon Resource Name (ARN) of the Customer Profiles domain where the matched output will be sent.
objectTypeArn (string) –
The Amazon Resource Name (ARN) of the Customer Profiles object type that defines the structure for the matched customer data.
resolutionTechniques (dict) –
An object which defines the resolutionType and the ruleBasedProperties.
resolutionType (string) –
The type of matching workflow to create. Specify one of the following types:
RULE_MATCHING: Match records using configurable rule-based criteria
ML_MATCHING: Match records using machine learning models
PROVIDER: Match records using a third-party matching provider
ruleBasedProperties (dict) –
An object which defines the list of matching rules to run and has a field rules, which is a list of rule objects.
rules (list) –
A list of Rule objects, each of which has the fields RuleName and MatchingKeys.
(dict) –
An object containing the ruleName and matchingKeys.
ruleName (string) –
A name for the matching rule.
matchingKeys (list) –
A list of MatchingKeys. The MatchingKeys must have been defined in the SchemaMapping. Two records are considered to match according to this rule if all of the MatchingKeys match.
(string) –
attributeMatchingModel (string) –
The comparison type. You can choose ONE_TO_ONE or MANY_TO_MANY as the attributeMatchingModel.
If you choose ONE_TO_ONE, the system can only match attributes if the sub-types are an exact match. For example, for the Email attribute type, the system will only consider it a match if the value of the Email field of Profile A matches the value of the Email field of Profile B.
If you choose MANY_TO_MANY, the system can match attributes across the sub-types of an attribute type. For example, if the value of the Email field of Profile A and the value of the BusinessEmail field of Profile B match, the two profiles are matched on the Email attribute type.
matchPurpose (string) –
An indicator of whether to generate IDs and index the data or not.
If you choose IDENTIFIER_GENERATION, the process generates IDs and indexes the data.
If you choose INDEXING, the process indexes the data without generating IDs.
ruleConditionProperties (dict) –
An object containing the rules for a matching workflow.
rules (list) –
A list of rule objects, each of which has the fields ruleName and condition.
(dict) –
An object that defines the ruleCondition and the ruleName to use in a matching workflow.
ruleName (string) –
A name for the matching rule.
For example: Rule1
condition (string) –
A statement that specifies the conditions for a matching rule.
If your data is accurate, use an Exact matching function: Exact or ExactManyToMany.
If your data has variations in spelling or pronunciation, use a Fuzzy matching function: Cosine, Levenshtein, or Soundex.
Use operators if you want to combine (AND), separate (OR), or group matching functions (...).
For example: (Cosine(a, 10) AND Exact(b, true)) OR ExactManyToMany(c, d)
providerProperties (dict) –
The properties of the provider service.
providerServiceArn (string) –
The ARN of the provider service.
providerConfiguration (document) –
The required configuration fields to use with the provider service.
intermediateSourceConfiguration (dict) –
The Amazon S3 location that temporarily stores your data while it processes. Your information won’t be saved permanently.
intermediateS3Path (string) –
The Amazon S3 location (bucket and prefix). For example:
s3://provider_bucket/DOC-EXAMPLE-BUCKET
incrementalRunConfig (dict) –
An object which defines an incremental run type and has only incrementalRunType as a field.
incrementalRunType (string) –
The type of incremental run. The only valid value is IMMEDIATE. This appears as “Automatic” in the console.
Warning
For workflows where resolutionType is ML_MATCHING or PROVIDER, incremental processing is not supported.
roleArn (string) –
The Amazon Resource Name (ARN) of the IAM role. Entity Resolution assumes this role to create resources on your behalf as part of workflow execution.
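As an illustration of reading the response structure above, the sketch below pulls a few fields into a flat summary. The function name and the summary keys are hypothetical; it assumes workflowName and workflowArn are present and treats every other field as optional.

```python
def summarize_workflow(response):
    """Extract a compact summary from a create_matching_workflow response."""
    techniques = response.get("resolutionTechniques", {})

    # Rules may appear under either rule-based or rule-condition properties.
    rule_names = [
        rule["ruleName"]
        for key in ("ruleBasedProperties", "ruleConditionProperties")
        for rule in techniques.get(key, {}).get("rules", [])
    ]

    incremental = response.get("incrementalRunConfig", {})
    return {
        "name": response["workflowName"],
        "arn": response["workflowArn"],
        "resolutionType": techniques.get("resolutionType"),
        "ruleNames": rule_names,
        "incremental": incremental.get("incrementalRunType") == "IMMEDIATE",
        "outputPaths": [
            source.get("outputS3Path")
            for source in response.get("outputSourceConfig", [])
        ],
    }
```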
Exceptions