Blogs

Framework to Automate Unit Testing on Dimension Tables

Purpose of the article: This blog explains how to perform basic unit testing on a dimension table by following Slowly Changing Dimension Type 2

Intended Audience: This blog will help you understand the concept of Slowly Changing Dimensions, the difference between OLTP and OLAP, the concept of dimension tables, dimensional attributes, and how to perform unit testing on a dimension table

Tools and Technology: Python and Snowflake

This framework connects to Snowflake and performs unit testing on the attributes of a dimension table.

  1. Setup and Requirements
    1. Tools and Libraries:
      • Python: For scripting and test automation.
      • Snowflake Connector for Python: To connect to Snowflake from Python.
    1. Installation:
      • Install the required libraries using pip:pip install snowflake-connector-python
  1. Configuration
    1. Snowflake Connection Configuration:
      • Create a configuration file to store Snowflake connection details.
    1. Snowflake Connection Script:
      • Create a Python script (snowflake_connection.py) to manage the connection.
  1. Test Case Design
    1. Define Test Cases:
      • Outline the test cases for the dimension table.
    1. Test Case Script:
      • Create a script to run the test cases (dim_common_sqls.sql).
  1. Execution
    1. Run Tests:
      • Execute the test cases.
    1. Output:
      • The process creates an SQL file and an Excel sheet in the specified folder and sends an email notification.

Description:

Before sending any dimension table to QA, it is important to ensure that all dimension tables adhere to a set of general properties and standards. This applies to any dimension table created by a developer when loading data from the source to the target.

To understand these general properties, we first need to grasp how dimension tables are designed. Typically, dimension tables include a Surrogate Key, which serves as the primary key for the them. Additionally, there is a Natural Key that acts as the primary key in the transaction table, but it is not the primary key in the dimension table. For example, in a dimension table such as an Employee (Emp) table, attributes might include Employee Name and Phone Number.

Emp ID name phone surrogate
100 ABCD 1234567890 1000
100 ABCD 1234567890 1000

In this scenario, the employee’s primary key will remain unchanged, but the Surrogate Key, which serves as the primary key for the dimension table, will be updated if there are any changes in the data.

Slowly Changing Dimensions (SCD)

When designing a dimension table, there are several types of Slowly Changing Dimensions (SCD):

Type 1

If a field, such as Product Description, is changed, it is updated directly in the dimension table. Generally, this approach is not recommended, as it does not preserve historical data.

Type 2– This method inserts a new row for each new value while preserving the existing row to maintain historical data and support reporting purposes.

A Type 3 –

This method creates a Current Value column in the existing record while retaining the original column, holding new dimension data from the transaction (OLTP) system. It is used to track changes in dimension values while keeping the old value for reporting.

Typically, SCD Type 2 is preferred, retaining both previous and current records and adding attributes like Dim Active Indicator, Latest Row Indicator, Effective Timestamp, and Expiry Timestamp.

In this context, we are focusing on Slowly Changing Dimension (SCD) Type 2 and have designed our test cases accordingly. All test cases will focus on the Surrogate Key, Primary Key, Dim Active Indicator, Latest Row Indicator, Effective Timestamp, and Expiry Timestamp.

Based on these properties, we have designed a few test cases on dimensional properties. These test cases are focused solely on the dimensional properties of the table and do not address the functional properties.

STEPS:

Dimensional Automation Testing Framework: This directory includes various packages.

  • Config File: Contains configuration details.

    • Connection.ini: This file contains configuration settings for establishing a connection to Snowflake.

				
					user = ********
password = *********
account = *************
				
			
  • Input.json

				
					{
"sf_connection": {
    "env": "DEV"
},
"target": {
    "layer": "EMP",|
    "table_type": "DIM",
    "schema": "PUBLIC",
    "table_name": "EMPLOYEE_DIM",
    "pk_columns": [
    "EMPLOYEE_NUMBER"
    ],
    "as_of_dt": "EFFECTIVE_TS"
},
"op": {
    "send_mail_to": "divyav.in@mouritech.com", 
    "op_folder": "C:\\Users\\divyav\\OneDrive MOURI Tech\\Desktop\\snowflake"
}
}

				
			
  • DB Conn— Establishing a connection to Snowflake using a connection.ini file located in the Config folder.
  • IO— Includes All Initializations, Input Files, and Output Files

    • Init.py: This file contains all initializations, including:

      • Table Name

      • Schema Name

      • Database Name

      • Primary Key Columns

      • As Of Date Columns

      • Recipient Mail

      • Output Folder

  • Op_sheet.py: defining and designing the output sheet

    • Creating a sheet

    • Defining the sheet with title and content

    • Saving the sheet and the SQL file

  • Read_input_data.py: This process involves reading all input data, such as input.Json and dim_common_sqls.sql, to run the test cases.

  • Send_validation_mail: Sends email to the recipient mentioned in the input.Json file

  • SQLs—All SQL queries written in a file

    • Dim_common_sqls.sql

  • Test Cases—test case details

    • dim_test_cases: passing column names to an SQL file

    • test_case_run: run the Test cases and generate the output sheet.

  • Validations— This process performs various validations on the table, such as checking whether it is a dimension table. It acts as a gateway from the main process to the Dim Test Cases.

  • Main— from where the prog starts.

				
					if __name__ == '__main__':
  validate_test_cases.run_test_cases()
				
			

Conclusion:

By following this approach, you will set up an automation testing framework in Python that connects to the Snowflake and performs unit testing on all the attributes of the dimension tables.

Author Bio:

Picture of Divya Vasupalli

Divya Vasupalli

Associate Trainee - Data Engineering - Analytics

I'm Divya Vasupalli, and I've been with MOURI Tech for 1.10 years as an Associate Trainee. I have strong knowledge of SQL, PL/SQL, Snowflake, and Python.

Leave A Comment

Related Post

Micro Frontend in Angular

Purpose of the article: To understand how to break down a large application into smaller, manageable apps, each responsible for its own feature Intended Audience:

Read More »
Purpose to Contact :
Purpose to Contact :
Purpose to Contact :
Purpose to Contact :

Purpose to Contact :
Purpose to Contact :
Purpose to Contact :

Purpose to Contact :