update_column_statistics_for_table

Glue.Client.update_column_statistics_for_table(**kwargs)

Creates or updates column statistics for a table.

The Identity and Access Management (IAM) permission required for this operation is UpdateTable.

See also: AWS API Documentation

Request Syntax

response = client.update_column_statistics_for_table(
    CatalogId='string',
    DatabaseName='string',
    TableName='string',
    ColumnStatisticsList=[
        {
            'ColumnName': 'string',
            'ColumnType': 'string',
            'AnalyzedTime': datetime(2015, 1, 1),
            'StatisticsData': {
                'Type': 'BOOLEAN'|'DATE'|'DECIMAL'|'DOUBLE'|'LONG'|'STRING'|'BINARY',
                'BooleanColumnStatisticsData': {
                    'NumberOfTrues': 123,
                    'NumberOfFalses': 123,
                    'NumberOfNulls': 123
                },
                'DateColumnStatisticsData': {
                    'MinimumValue': datetime(2015, 1, 1),
                    'MaximumValue': datetime(2015, 1, 1),
                    'NumberOfNulls': 123,
                    'NumberOfDistinctValues': 123
                },
                'DecimalColumnStatisticsData': {
                    'MinimumValue': {
                        'UnscaledValue': b'bytes',
                        'Scale': 123
                    },
                    'MaximumValue': {
                        'UnscaledValue': b'bytes',
                        'Scale': 123
                    },
                    'NumberOfNulls': 123,
                    'NumberOfDistinctValues': 123
                },
                'DoubleColumnStatisticsData': {
                    'MinimumValue': 123.0,
                    'MaximumValue': 123.0,
                    'NumberOfNulls': 123,
                    'NumberOfDistinctValues': 123
                },
                'LongColumnStatisticsData': {
                    'MinimumValue': 123,
                    'MaximumValue': 123,
                    'NumberOfNulls': 123,
                    'NumberOfDistinctValues': 123
                },
                'StringColumnStatisticsData': {
                    'MaximumLength': 123,
                    'AverageLength': 123.0,
                    'NumberOfNulls': 123,
                    'NumberOfDistinctValues': 123
                },
                'BinaryColumnStatisticsData': {
                    'MaximumLength': 123,
                    'AverageLength': 123.0,
                    'NumberOfNulls': 123
                }
            }
        },
    ]
)
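
As a minimal sketch of a concrete request, the helper below builds the keyword arguments for a single LONG column. The helper name `build_long_stats_request`, the database, table, and column names, the `bigint` column type, and all numeric values are hypothetical; only the dictionary keys come from the request syntax above. It assumes that `Type` should name the one `...ColumnStatisticsData` member that is populated.

```python
from datetime import datetime, timezone


def build_long_stats_request(database, table, column, minimum, maximum, nulls, distinct):
    """Build keyword arguments for update_column_statistics_for_table (one LONG column)."""
    return {
        # CatalogId is omitted, so the caller's account ID is used by default.
        "DatabaseName": database,
        "TableName": table,
        "ColumnStatisticsList": [
            {
                "ColumnName": column,
                "ColumnType": "bigint",  # assumed column type for this illustration
                "AnalyzedTime": datetime.now(timezone.utc),
                "StatisticsData": {
                    "Type": "LONG",
                    "LongColumnStatisticsData": {
                        "MinimumValue": minimum,
                        "MaximumValue": maximum,
                        "NumberOfNulls": nulls,
                        "NumberOfDistinctValues": distinct,
                    },
                },
            }
        ],
    }


# Hypothetical names and values, for illustration only.
request = build_long_stats_request("sales_db", "orders", "order_id", 1, 50000, 0, 50000)

# With a real client (requires boto3 and AWS credentials):
#   import boto3
#   client = boto3.client("glue")
#   response = client.update_column_statistics_for_table(**request)
```

Building the arguments separately from the call keeps the request easy to inspect or log before it is sent.
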
Parameters:
  • CatalogId (string) – The ID of the Data Catalog where the table resides. If none is supplied, the Amazon Web Services account ID is used by default.

  • DatabaseName (string) –

    [REQUIRED]

    The name of the catalog database where the table resides.

  • TableName (string) –

    [REQUIRED]

    The name of the table.

  • ColumnStatisticsList (list) –

    [REQUIRED]

    A list of the column statistics.

    • (dict) –

      Represents the generated column-level statistics for a table or partition.

      • ColumnName (string) – [REQUIRED]

        The name of the column that the statistics belong to.

      • ColumnType (string) – [REQUIRED]

        The data type of the column.

      • AnalyzedTime (datetime) – [REQUIRED]

        The timestamp of when column statistics were generated.

      • StatisticsData (dict) – [REQUIRED]

        A ColumnStatisticData object that contains the statistics data values.

        • Type (string) – [REQUIRED]

          The type of column statistics data.

        • BooleanColumnStatisticsData (dict) –

          Boolean column statistics data.

          • NumberOfTrues (integer) – [REQUIRED]

            The number of true values in the column.

          • NumberOfFalses (integer) – [REQUIRED]

            The number of false values in the column.

          • NumberOfNulls (integer) – [REQUIRED]

            The number of null values in the column.

        • DateColumnStatisticsData (dict) –

          Date column statistics data.

          • MinimumValue (datetime) –

            The lowest value in the column.

          • MaximumValue (datetime) –

            The highest value in the column.

          • NumberOfNulls (integer) – [REQUIRED]

            The number of null values in the column.

          • NumberOfDistinctValues (integer) – [REQUIRED]

            The number of distinct values in a column.

        • DecimalColumnStatisticsData (dict) –

          Decimal column statistics data. UnscaledValues within are Base64-encoded binary objects storing big-endian, two’s complement representations of the decimal’s unscaled value.

          • MinimumValue (dict) –

            The lowest value in the column.

            • UnscaledValue (bytes) – [REQUIRED]

              The unscaled numeric value.

            • Scale (integer) – [REQUIRED]

              The scale that determines where the decimal point falls in the unscaled value.

          • MaximumValue (dict) –

            The highest value in the column.

            • UnscaledValue (bytes) – [REQUIRED]

              The unscaled numeric value.

            • Scale (integer) – [REQUIRED]

              The scale that determines where the decimal point falls in the unscaled value.

          • NumberOfNulls (integer) – [REQUIRED]

            The number of null values in the column.

          • NumberOfDistinctValues (integer) – [REQUIRED]

            The number of distinct values in a column.

        • DoubleColumnStatisticsData (dict) –

          Double column statistics data.

          • MinimumValue (float) –

            The lowest value in the column.

          • MaximumValue (float) –

            The highest value in the column.

          • NumberOfNulls (integer) – [REQUIRED]

            The number of null values in the column.

          • NumberOfDistinctValues (integer) – [REQUIRED]

            The number of distinct values in a column.

        • LongColumnStatisticsData (dict) –

          Long column statistics data.

          • MinimumValue (integer) –

            The lowest value in the column.

          • MaximumValue (integer) –

            The highest value in the column.

          • NumberOfNulls (integer) – [REQUIRED]

            The number of null values in the column.

          • NumberOfDistinctValues (integer) – [REQUIRED]

            The number of distinct values in a column.

        • StringColumnStatisticsData (dict) –

          String column statistics data.

          • MaximumLength (integer) – [REQUIRED]

            The size of the longest string in the column.

          • AverageLength (float) – [REQUIRED]

            The average string length in the column.

          • NumberOfNulls (integer) – [REQUIRED]

            The number of null values in the column.

          • NumberOfDistinctValues (integer) – [REQUIRED]

            The number of distinct values in a column.

        • BinaryColumnStatisticsData (dict) –

          Binary column statistics data.

          • MaximumLength (integer) – [REQUIRED]

            The size of the longest bit sequence in the column.

          • AverageLength (float) – [REQUIRED]

            The average bit sequence length in the column.

          • NumberOfNulls (integer) – [REQUIRED]

            The number of null values in the column.
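
The `UnscaledValue` and `Scale` members of `DecimalColumnStatisticsData` can be derived from a Python `Decimal`. The sketch below is one possible way to do it, not part of the Glue client: the helper names `decimal_to_glue` and `glue_to_decimal` are hypothetical. It produces the big-endian, two's complement bytes described above and leaves them unencoded, on the understanding that boto3 applies the Base64 encoding for bytes parameters when it serializes the request.

```python
from decimal import Decimal


def decimal_to_glue(value):
    """Convert a Decimal into a dict with UnscaledValue (bytes) and Scale (int)."""
    sign, digits, exponent = value.as_tuple()
    if not isinstance(exponent, int):
        raise ValueError("NaN and Infinity cannot be represented")
    unscaled = int("".join(str(d) for d in digits))
    if sign:
        unscaled = -unscaled
    if exponent > 0:
        # Fold a positive exponent into the integer so the scale is never negative.
        unscaled *= 10 ** exponent
        scale = 0
    else:
        scale = -exponent
    # Enough bytes to hold the value plus a sign bit.
    length = (unscaled.bit_length() + 8) // 8
    return {
        "UnscaledValue": unscaled.to_bytes(length, "big", signed=True),
        "Scale": scale,
    }


def glue_to_decimal(number):
    """Inverse of decimal_to_glue, for values read back from a response."""
    unscaled = int.from_bytes(number["UnscaledValue"], "big", signed=True)
    return Decimal(unscaled).scaleb(-number["Scale"])


encoded = decimal_to_glue(Decimal("123.45"))
# encoded["UnscaledValue"] holds the integer 12345; encoded["Scale"] is 2.
```

The resulting dict can be used directly as the `MinimumValue` or `MaximumValue` of `DecimalColumnStatisticsData`.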

Return type:

dict

Returns:

Response Syntax

{
    'Errors': [
        {
            'ColumnStatistics': {
                'ColumnName': 'string',
                'ColumnType': 'string',
                'AnalyzedTime': datetime(2015, 1, 1),
                'StatisticsData': {
                    'Type': 'BOOLEAN'|'DATE'|'DECIMAL'|'DOUBLE'|'LONG'|'STRING'|'BINARY',
                    'BooleanColumnStatisticsData': {
                        'NumberOfTrues': 123,
                        'NumberOfFalses': 123,
                        'NumberOfNulls': 123
                    },
                    'DateColumnStatisticsData': {
                        'MinimumValue': datetime(2015, 1, 1),
                        'MaximumValue': datetime(2015, 1, 1),
                        'NumberOfNulls': 123,
                        'NumberOfDistinctValues': 123
                    },
                    'DecimalColumnStatisticsData': {
                        'MinimumValue': {
                            'UnscaledValue': b'bytes',
                            'Scale': 123
                        },
                        'MaximumValue': {
                            'UnscaledValue': b'bytes',
                            'Scale': 123
                        },
                        'NumberOfNulls': 123,
                        'NumberOfDistinctValues': 123
                    },
                    'DoubleColumnStatisticsData': {
                        'MinimumValue': 123.0,
                        'MaximumValue': 123.0,
                        'NumberOfNulls': 123,
                        'NumberOfDistinctValues': 123
                    },
                    'LongColumnStatisticsData': {
                        'MinimumValue': 123,
                        'MaximumValue': 123,
                        'NumberOfNulls': 123,
                        'NumberOfDistinctValues': 123
                    },
                    'StringColumnStatisticsData': {
                        'MaximumLength': 123,
                        'AverageLength': 123.0,
                        'NumberOfNulls': 123,
                        'NumberOfDistinctValues': 123
                    },
                    'BinaryColumnStatisticsData': {
                        'MaximumLength': 123,
                        'AverageLength': 123.0,
                        'NumberOfNulls': 123
                    }
                }
            },
            'Error': {
                'ErrorCode': 'string',
                'ErrorMessage': 'string'
            }
        },
    ]
}

Response Structure

  • (dict) –

    • Errors (list) –

      List of ColumnStatisticsErrors.

      • (dict) –

        Encapsulates a ColumnStatistics object that failed and the reason for failure.

        • ColumnStatistics (dict) –

          The ColumnStatistics of the column.

          • ColumnName (string) –

            The name of the column that the statistics belong to.

          • ColumnType (string) –

            The data type of the column.

          • AnalyzedTime (datetime) –

            The timestamp of when column statistics were generated.

          • StatisticsData (dict) –

            A ColumnStatisticData object that contains the statistics data values.

            • Type (string) –

              The type of column statistics data.

            • BooleanColumnStatisticsData (dict) –

              Boolean column statistics data.

              • NumberOfTrues (integer) –

                The number of true values in the column.

              • NumberOfFalses (integer) –

                The number of false values in the column.

              • NumberOfNulls (integer) –

                The number of null values in the column.

            • DateColumnStatisticsData (dict) –

              Date column statistics data.

              • MinimumValue (datetime) –

                The lowest value in the column.

              • MaximumValue (datetime) –

                The highest value in the column.

              • NumberOfNulls (integer) –

                The number of null values in the column.

              • NumberOfDistinctValues (integer) –

                The number of distinct values in a column.

            • DecimalColumnStatisticsData (dict) –

              Decimal column statistics data. UnscaledValues within are Base64-encoded binary objects storing big-endian, two’s complement representations of the decimal’s unscaled value.

              • MinimumValue (dict) –

                The lowest value in the column.

                • UnscaledValue (bytes) –

                  The unscaled numeric value.

                • Scale (integer) –

                  The scale that determines where the decimal point falls in the unscaled value.

              • MaximumValue (dict) –

                The highest value in the column.

                • UnscaledValue (bytes) –

                  The unscaled numeric value.

                • Scale (integer) –

                  The scale that determines where the decimal point falls in the unscaled value.

              • NumberOfNulls (integer) –

                The number of null values in the column.

              • NumberOfDistinctValues (integer) –

                The number of distinct values in a column.

            • DoubleColumnStatisticsData (dict) –

              Double column statistics data.

              • MinimumValue (float) –

                The lowest value in the column.

              • MaximumValue (float) –

                The highest value in the column.

              • NumberOfNulls (integer) –

                The number of null values in the column.

              • NumberOfDistinctValues (integer) –

                The number of distinct values in a column.

            • LongColumnStatisticsData (dict) –

              Long column statistics data.

              • MinimumValue (integer) –

                The lowest value in the column.

              • MaximumValue (integer) –

                The highest value in the column.

              • NumberOfNulls (integer) –

                The number of null values in the column.

              • NumberOfDistinctValues (integer) –

                The number of distinct values in a column.

            • StringColumnStatisticsData (dict) –

              String column statistics data.

              • MaximumLength (integer) –

                The size of the longest string in the column.

              • AverageLength (float) –

                The average string length in the column.

              • NumberOfNulls (integer) –

                The number of null values in the column.

              • NumberOfDistinctValues (integer) –

                The number of distinct values in a column.

            • BinaryColumnStatisticsData (dict) –

              Binary column statistics data.

              • MaximumLength (integer) –

                The size of the longest bit sequence in the column.

              • AverageLength (float) –

                The average bit sequence length in the column.

              • NumberOfNulls (integer) –

                The number of null values in the column.

        • Error (dict) –

          An error message with the reason for the failure of an operation.

          • ErrorCode (string) –

            The code associated with this error.

          • ErrorMessage (string) –

            A message describing the error.
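
Because failures for individual columns are reported in `Errors` as described above, it can be useful to inspect that list after each call. The following is a minimal sketch; the function name `summarize_failures` and the sample response, including its error code and message, are hypothetical and hand-written for illustration.

```python
def summarize_failures(response):
    """Return (column name, error code, error message) for each failed column."""
    failures = []
    for entry in response.get("Errors", []):
        stats = entry.get("ColumnStatistics", {})
        error = entry.get("Error", {})
        failures.append(
            (stats.get("ColumnName"), error.get("ErrorCode"), error.get("ErrorMessage"))
        )
    return failures


# Hand-written sample shaped like the response structure above; not real service output.
sample_response = {
    "Errors": [
        {
            "ColumnStatistics": {"ColumnName": "order_id", "ColumnType": "bigint"},
            "Error": {
                "ErrorCode": "ExampleErrorCode",
                "ErrorMessage": "Example message for illustration.",
            },
        }
    ]
}

failures = summarize_failures(sample_response)
for column, code, message in failures:
    print(f"{column}: {code} ({message})")
```

Using `.get()` with defaults keeps the helper tolerant of a response in which `Errors` is empty or absent.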

Exceptions