๐ชThe Big Idea: Your Data's Time Travel Adventure!
Imagine if you could travel through time and see how your favorite video game character changed over the years - from level 1 newbie to level 100 hero! That's exactly what Slowly Changing Dimensions (SCD) does for your data in Databricks.
๐ฎ Think of it like this: Your data is like a video game character that gets stronger, changes appearance, or learns new skills over time. SCD helps you keep track of ALL these changes - not just the current state, but the entire journey!
๐Real-World Analogy: The Magic School Report Card System!
Let's say you're at Hogwarts (yes, the wizarding school!), and the school wants to track student information over time:
1
๐ SCD Type 1: The Overwrite Spell
Scenario: Harry Potter moves from Privet Drive to Grimmauld Place. The school just updates his address - no history kept.
Result: Only current address exists. Past address? Gone forever! ๐จ
2
๐ SCD Type 2: The Time-Turner Method
Scenario: Hermione gets promoted from Gryffindor Prefect to Head Girl. School keeps BOTH records with different time periods!
Result: You can see she was Prefect from 2019-2020 AND Head Girl from 2020-2021! ๐
3
๐ SCD Type 3: The Before & After Mirror
Scenario: Ron's house changes from "The Burrow" to "Shell Cottage". School keeps BOTH current and previous addresses in the same record!
Result: One record shows Current_House AND Previous_House! ๐
๐ปCode Examples: Let's Build Some Magic!
๐ฏ SCD Type 1 Example: Update Customer Phone
-- Step 1: Create initial customer table
CREATE TABLE customers_type1 (
customer_id INT,
name STRING,
phone STRING,
email STRING
) USING DELTA;
-- Step 2: Insert initial data
INSERT INTO customers_type1 VALUES
(1, 'Harry Potter', '555-MAGIC', 'harry@hogwarts.edu'),
(2, 'Hermione Granger', '555-BOOKS', 'hermione@hogwarts.edu');
-- Step 3: SCD Type 1 update
-- Replace old phone number with new phone number
MERGE INTO customers_type1 AS target
USING (
SELECT 1 AS customer_id, '555-NEWNUM' AS phone
) AS source
ON target.customer_id = source.customer_id
WHEN MATCHED THEN
UPDATE SET target.phone = source.phone;
๐ฏ SCD Type 2 Example: Track Customer Address History
-- Step 1: Create SCD Type 2 table with versioning
CREATE TABLE customers_type2 (
surrogate_key INT,
customer_id INT,
name STRING,
address STRING,
effective_date DATE,
end_date DATE,
is_current BOOLEAN
) USING DELTA;
-- Step 2: Complex SCD Type 2 MERGE operation
MERGE INTO customers_type2 AS target
USING (
SELECT
1 AS customer_id,
'Harry Potter' AS name,
'12 Grimmauld Place' AS address,
current_date() AS effective_date
) AS source
ON target.customer_id = source.customer_id
AND target.is_current = true
-- Step 2a: Close current record if address has changed
WHEN MATCHED AND target.address != source.address THEN
UPDATE SET
target.end_date = source.effective_date,
target.is_current = false
-- Step 2b: Insert new current record if no match
WHEN NOT MATCHED BY TARGET THEN
INSERT (
surrogate_key,
customer_id,
name,
address,
effective_date,
end_date,
is_current
)
VALUES (
monotonically_increasing_id(),
source.customer_id,
source.name,
source.address,
source.effective_date,
NULL,
true
);
๐ฏ SCD Type 3 Example: Current and Previous Values
-- Step 1: Create SCD Type 3 table
CREATE TABLE customers_type3 (
customer_id INT,
name STRING,
current_address STRING,
previous_address STRING,
address_change_date DATE
) USING DELTA;
-- Step 2: SCD Type 3 MERGE operation
-- Keep both current and previous address
MERGE INTO customers_type3 AS target
USING (
SELECT
1 AS customer_id,
'Diagon Alley 42' AS new_address
) AS source
ON target.customer_id = source.customer_id
-- Update previous and current address on match
WHEN MATCHED THEN
UPDATE SET
target.previous_address = target.current_address,
target.current_address = source.new_address,
target.address_change_date = current_date();
๐Real-World Example: E-Commerce Customer Tracking
Let's imagine you're working for "Wizard Shopping Mall" - an online store that needs to track customer changes over time!
๐ Business Scenario:
The Challenge: Track customer information changes including address updates (for shipping), loyalty tier changes (bronze โ silver โ gold), and contact information updates.
1
๐ช Setting Up the Customer Dimension
-- Step 1: Create comprehensive customer SCD Type 2 table
CREATE TABLE dim_customers (
surrogate_key BIGINT GENERATED ALWAYS AS IDENTITY, -- Unique surrogate key
customer_business_key STRING, -- Natural/business key
customer_name STRING,
email STRING,
phone STRING,
address STRING,
city STRING,
state STRING,
loyalty_tier STRING,
-- SCD Type 2 metadata columns
effective_date DATE, -- When the record became effective
end_date DATE, -- When the record was superseded
is_current BOOLEAN, -- Flag to indicate current record
record_version INT -- Version number for tracking changes
)
USING DELTA
PARTITIONED BY (is_current); -- Partition by current records for performance
๐ Processing Daily Customer Updates
-- Step 1: Daily SCD processing function
CREATE OR REPLACE FUNCTION process_customer_scd(update_date DATE)
RETURNS STRING
LANGUAGE PYTHON
AS
$$
# Step 1: Identify customers with changes
changed_customers = """
SELECT DISTINCT customer_business_key
FROM source_customer_updates
WHERE update_date = '{}'
""".format(update_date)
# Step 2: Close current records for changed customers
close_current = """
UPDATE dim_customers
SET end_date = '{}',
is_current = false
WHERE customer_business_key IN ({})
AND is_current = true
""".format(update_date, changed_customers)
# Step 3: Insert new current records
insert_new = """
INSERT INTO dim_customers
SELECT
customer_business_key,
customer_name,
email,
phone,
address,
city,
state,
loyalty_tier,
'{}' AS effective_date,
NULL AS end_date,
true AS is_current,
COALESCE(max_version + 1, 1) AS record_version
FROM source_customer_updates s
LEFT JOIN (
SELECT customer_business_key, MAX(record_version) AS max_version
FROM dim_customers
GROUP BY customer_business_key
) v
ON s.customer_business_key = v.customer_business_key
WHERE s.update_date = '{}'
""".format(update_date, update_date)
return "SCD processing completed successfully!"
$$;
๐ Result: Now you can track every customer's journey! See when they moved, got promoted to VIP status, or changed their contact info - all while keeping perfect historical records!