We can easily perform incremental data loads in SSIS using Change Data Capture (CDC).
CDC is a SQL Server tracking feature introduced in SQL Server 2008 that provides historical change activity (insert, update and delete) made to user tables.
In the following example, we will see how to develop a SSIS incremental data load package using Change Data Caputure (CDC).
* I would recommend that you have some basic knowledge in SSIS to better understand this article.
Creating OLTP, Staging and OLAP databases and tables
CustomerId int identity(1,1) primary key,
CustomerId int primary key,
Enabling Change Data Capture (CDC) for the OLTP database and Customer table
- First we need to enable CDC for the database and then enable it for each table we want to track.
- SQL Server Agent must be running.
- The user must have sysadmin privileges to enable CDC for databases.
- The user must be db_owner to enable CDC for tables.
@source_schema = N'dbo',
@source_name = N'Customer',
@role_name = NULL,
@supports_net_changes = 1;
SSIS – Initial Data Load package
Considering that there are no rows in the OLAP database, we need to perform a one-time load to populate it.
The initial load package has three tasks:
- A CDC Control Task to mark the initial load start.
- A Data Flow Task to transfer the data from the OLTP database to the OLAP database.
- A CDC Control Task to mark the initial load end.
The following figures show how to configure the Control Flow and each task
1. CDC Control Task (mark initial load start) properties:
2. Data Flow Task (load customer):
The Data Flow tab is composed of two objects: ADO NET Source and ADO NET Destination.
3. CDC Control Task (mark initial load end) properties:
Executing the initial load package
Now the initial load package is ready to be executed.
Let´s check the source and destination tables before executing it:
- Now we need to execute the SSIS initial load package to load data from the Customer table (OLTP) to the Customer_DW (OLAP).
- Press F5 to execute the package.
The initial data load worked fine! The next step is to perform the incremental data load.
SSIS – Incremetal Data Load package
This package will perform incremental loads of the change data to the OLAP database.
The incremental data load package has five tasks:
- An Execute SQL Task to truncate the staging tables.
- A CDC Control Task to get the processing range of LSNs (Log Sequence Numbers) containing the changes to be loaded.
- A Data Flow Task to load the staging tables.
- An Execute SQL Task to load change data from the Staging to the OLAP database.
- A CDC Control Task to mark the processed range and define where the next processing range will begin.
The following figures show how to configure the Control Flow and each task:
1. Execute SQL Task (clean staging tables) properties:
TRUNCATE TABLE Customer_delete;
TRUNCATE TABLE Customer_insert;
TRUNCATE TABLE Customer_update;
2. CDC Control Task (get processing range) properties:
3. Data Flow Task (load staging tables):
4. Execute SQL Task (load customer_dw) properties:
INSERT INTO Customer_DW SELECT * FROM Staging..Customer_insert;
DELETE FROM Customer_DW WHERE CustomerId IN (SELECT CustomerId FROM Staging..Customer_delete);
UPDATE c1 SET c1.CustomerName = c2.CustomerName FROM Customer_DW c1 INNER JOIN Staging..Customer_update c2 ON c1.CustomerId = c2.CustomerId;
5. CDC Control Task (mark processed range) properties:
Executing the incremental data load package
Before executing this package, let´s make some data changes in the OLTP database:
Now we can execute the following scripts to check that the OLTP and OLAP databases are different:
- Now we can execute the SSIS incremental data load package to load data from the Customer table (OLTP) to the Customer_DW (OLAP).
- Right click the package in the solution explorer and then click “Set as StartUp Object”.
- Press F5 to execute the package.
We can see that the tables are synchronized now!
I hope you enjoyed this article.
Please feel free to leave any comments, share and like it!
Thank you very much.