spalah.dataset.DeltaTableConfig

spalah.dataset.DeltaTableConfig.DeltaTableConfig(table_path='', table_name='', spark_session=None)

Manages Delta Table properties, constraints, etc.

Attributes:

| Name | Type | Description |
|------|------|-------------|
| keep_existing_properties | bool | Preserves existing table properties if they are not in the input value. Defaults to False. |
| keep_existing_check_constraints | bool | Preserves existing check constraints if they are not in the input value. Defaults to False. |

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| table_path | str | Path to the Delta table. For instance: /mnt/db1/table1 | '' |
| table_name | str | Delta table name. For instance: db1.table1 | '' |
| spark_session | SparkSession | The current Spark session. If not provided, the active session is used. | None |

Raises:

| Type | Description |
|------|-------------|
| ValueError | If values for both 'table_path' and 'table_name' are provided. Provide a value for only one of them. |
| ValueError | If neither 'table_path' nor 'table_name' is provided. Provide a value for one of them. |

Examples:

>>> from spalah.dataset import DeltaTableConfig
>>> dp = DeltaTableConfig(table_path="/path/dataset")
>>> print(dp.properties)
{'delta.deletedFileRetentionDuration': 'interval 15 days'}
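
Note that table_path and table_name are mutually exclusive: exactly one of them must be set, otherwise ValueError is raised. A minimal sketch, assuming a table db1.table1 is registered in the metastore:

from spalah.dataset import DeltaTableConfig

# construct from a metastore table name instead of a path
dtc = DeltaTableConfig(table_name="db1.table1")

# passing both identifiers (or neither) raises ValueError
try:
    DeltaTableConfig(table_path="/path/dataset", table_name="db1.table1")
except ValueError as exc:
    print(exc)
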
Source code in spalah/dataset/DeltaTableConfig.py
def __init__(
    self,
    table_path: str = "",
    table_name: str = "",
    spark_session: Optional[SparkSession] = None,
) -> None:
    """
    Args:
        table_path (str, optional): Path to delta table. For instance: /mnt/db1/table1
        table_name (str, optional): Delta table name. For instance: db1.table1
        spark_session (SparkSession, optional): The current Spark session. Defaults to the active session.
    Raises:
        ValueError: if values for both 'table_path' and 'table_name' are provided;
            provide a value for only one of them
        ValueError: if neither 'table_path' nor 'table_name' is provided;
            provide a value for one of them
    Examples:
        >>> from spalah.dataset import DeltaTableConfig
        >>> dp = DeltaTableConfig(table_path="/path/dataset")
        >>> print(dp.properties)
        {'delta.deletedFileRetentionDuration': 'interval 15 days'}
    """

    self.spark_session = (
        SparkSession.getActiveSession() if not spark_session else spark_session
    )
    self.table_name = self.__get_table_identifier(
        table_path=table_path, table_name=table_name
    )
    self.original_table_name = table_name

check_constraints: Union[dict, None] property writable

Gets/sets dataset's delta table check constraints.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| value | dict | An input dictionary in the format: {"constraint_name": "constraint_expression"} | required |

Examples:

>>> from spalah.dataset import DeltaTableConfig
>>> dp = DeltaTableConfig(table_path="/path/dataset")
>>>
>>> # get existing constraints
>>> print(dp.check_constraints)
{}
>>>
>>> # Add a new check constraint
>>> dp.check_constraints = {'id_is_not_null': 'id is not null'}
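
Because keep_existing_check_constraints defaults to False, assigning a dictionary removes any existing constraint that is not part of the input value. A minimal sketch of preserving existing constraints while adding a new one (the constraint name and expression below are illustrative):

from spalah.dataset import DeltaTableConfig

dtc = DeltaTableConfig(table_path="/path/dataset")

# keep constraints already defined on the table and only add the new one;
# with the default (False), constraints absent from the input dict are dropped
dtc.keep_existing_check_constraints = True
dtc.check_constraints = {"price_is_positive": "price > 0"}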

properties: Union[dict, None] property writable

Gets/sets dataset's delta table properties.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| value | dict | An input dictionary in the format: {"property_name": "value"} | required |

Examples:

>>> from spalah.dataset import DeltaTableConfig
>>> dp = DeltaTableConfig(table_path="/path/dataset")
>>>
>>> # get existing properties
>>> print(dp.properties)
{'delta.deletedFileRetentionDuration': 'interval 15 days'}
>>>
>>> # Adjust the property value from 15 to 30 days
>>> dp.properties = {'delta.deletedFileRetentionDuration': 'interval 30 days'}
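
Likewise, keep_existing_properties controls whether properties that are already set on the table but absent from the input dictionary survive the assignment. A minimal sketch (the property value below is illustrative):

from spalah.dataset import DeltaTableConfig

dtc = DeltaTableConfig(table_path="/path/dataset")

# preserve properties already set on the table but missing from the input dict;
# with the default (False) they would be unset
dtc.keep_existing_properties = True
dtc.properties = {"delta.logRetentionDuration": "interval 60 days"}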