Using Charindex + Cross Apply to split delimiter-separated values (SQL Server)

Let’s suppose that we have some delimiter-separated values, like this:

charindex01

We want to separate the delimited values into five columns, like this:

charindex02

We can solve this using the following code:

DECLARE @States TABLE
(StateList VARCHAR(50));


DECLARE
 @delimiter CHAR(1);

SET @delimiter = ',';

INSERT INTO @States VALUES
('Florida,Indiana,Kansas,Georgia,Alabama'),
('New York,California,Nevada,Texas,Montana');

SELECT * FROM @States;SELECT
SUBSTRING(StateList,1,T1.Position-1) S1,
SUBSTRING(StateList,T1.Position+1,T2.Position-T1.Position-1) S2,
SUBSTRING(StateList,T2.Position+1,T3.Position-T2.Position-1) S3,
SUBSTRING(StateList,T3.Position+1,T4.Position-T3.Position-1) S4,
SUBSTRING(StateList,T4.Position+1,LEN(StateList)) S5
FROM @States
--1st delimiter position 
CROSS APPLY (SELECT (CHARINDEX(@delimiter,StateList))) as T1(Position)
--2nd delimiter position
CROSS APPLY (SELECT (CHARINDEX(@delimiter, StateList, T1.Position+1))) as T2(Position)
--3th delimiter position
CROSS APPLY (SELECT (CHARINDEX(@delimiter, StateList, T2.Position+1))) as T3(Position)
--4th delimiter position
CROSS APPLY (SELECT (CHARINDEX(@delimiter, StateList, T3.Position+1))) as T4(Position);

Incremental Data Load with SSIS and Change Data Capture (CDC)

We can easily perform incremental data loads in SSIS using Change Data Capture (CDC).
CDC is a SQL Server tracking feature introduced in SQL Server 2008 that provides historical change activity (insert, update and delete) made to user tables.

In the following example, we will see how to develop a SSIS incremental data load package using Change Data Caputure (CDC).

* I would recommend that you have some basic knowledge in SSIS to better understand this article.

Creating OLTP, Staging and OLAP databases and tables

USE master;
GO
 
CREATE DATABASE OLTP;
GO
 
USE OLTP;
 
CREATE TABLE dbo.Customer
(
CustomerId int identity(1,1) primary key,
CustomerName varchar(50)
);
 
INSERT INTO dbo.Customer (CustomerName)
VALUES
('Adam'),
('Bob'),
('Joseph'),
('Roger'),
('Samuel');
 
CREATE DATABASE Staging;
GO
 
USE Staging;
 
CREATE TABLE dbo.Customer_insert
(
CustomerId int,
CustomerName varchar(50)
);
CREATE TABLE dbo.Customer_update
(
CustomerId int,
CustomerName varchar(50)
);
CREATE TABLE dbo.Customer_delete
(
CustomerId int,
CustomerName varchar(50)
);
 
CREATE DATABASE OLAP;
GO
 
USE OLAP;
 
CREATE TABLE dbo.Customer_DW
(
CustomerId int primary key,
CustomerName varchar(50)
);

 

Enabling Change Data Capture (CDC) for the OLTP database and Customer table

  • First we need to enable CDC for the database and then enable it for each table we want to track.
  • SQL Server Agent must be running.
  • The user must have sysadmin privileges to enable CDC for databases.
  • The user must be db_owner to enable CDC for tables.
USE OLTP;
GO
 
EXEC sp_cdc_enable_db;
GO
 
EXEC sys.sp_cdc_enable_table
@source_schema = N'dbo',
@source_name = N'Customer',
@role_name = NULL,
@supports_net_changes = 1;
GO



SSIS – Initial Data Load package

Considering that there are no rows in the OLAP database, we need to perform a one-time load to populate it.

The initial load package has three tasks:

  1. A CDC Control Task to mark the initial load start.
  2. A Data Flow Task to transfer the data from the OLTP database to the OLAP database.
  3. A CDC Control Task to mark the initial load end.

The following figures show how to configure the Control Flow and each task

Control Flow:

Control Flow

Control Flow with three tasks

1. CDC Control Task (mark initial load start) properties:

CDC Control Task (mark initial load) properties

CDC Control Task (mark initial load start) properties

2. Data Flow Task (load customer):

Data Flow

The Data Flow tab is composed of two objects: ADO NET Source and ADO NET Destination.

3. CDC Control Task (mark initial load end) properties:

ssis_cdc_control_task_mark_initial_load_end.jpg

CDC Control Task (mark initial load end) properties

Executing the initial load package 

Now the initial load package is ready to be executed.

Let´s check the source and destination tables before executing it:

USE OLTP;
GO
 
SELECT * FROM Customer;
GO
 
USE OLAP;
GO
 
SELECT * FROM Customer_DW;
GO
result_set

Result sets berofe execution

  • Now we need to execute the SSIS initial load package to load data from the Customer table (OLTP) to the Customer_DW (OLAP).
  • Press F5 to execute the package.
Check the results:
result_il_control_flow

Control Flow tab

 Let´s check the tables after the execution:
USE OLTP;
GO
 
SELECT * FROM Customer;
GO
 
USE OLAP;
GO
 
SELECT * FROM Customer_DW;
GO
result_set2

Result sets after execution

The initial data load worked fine! The next step is to perform the incremental data load.

SSIS – Incremetal Data Load package

This package will perform incremental loads of the change data to the OLAP database.

The incremental data load package has five tasks:

  1. An Execute SQL Task to truncate the staging tables.
  2. A CDC Control Task to get the processing range of LSNs (Log Sequence Numbers) containing the changes to be loaded.
  3. A Data Flow Task to load the staging tables.
  4. An Execute SQL Task to load change data from the Staging to the OLAP database.
  5. A CDC Control Task to mark the processed range and define where the next processing range will begin.

The following figures show how to configure the Control Flow and each task:

Control Flow:

incremental_load_control_flow

Control Flow with five tasks

1. Execute SQL Task (clean staging tables) properties:

Task1

Execute SQL Task (clean staging tables) properties

SQL Statement:

TRUNCATE TABLE Customer_delete;
TRUNCATE TABLE Customer_insert;
TRUNCATE TABLE Customer_update;

2. CDC Control Task (get processing range) properties:

Task2.jpg

CDC Control Task (get processing range) properties

3. Data Flow Task (load staging tables):

Task3

Data Flow Task (load staging tables)

4. Execute SQL Task (load customer_dw) properties:

Task4.jpg

Execute SQL Task (load customer_dw) properties

SQL Statement:

INSERT INTO Customer_DW SELECT * FROM Staging..Customer_insert;
DELETE FROM Customer_DW WHERE CustomerId IN (SELECT CustomerId FROM Staging..Customer_delete);
UPDATE c1 SET c1.CustomerName = c2.CustomerName FROM Customer_DW c1 INNER JOIN Staging..Customer_update c2 ON c1.CustomerId = c2.CustomerId;

5. CDC Control Task (mark processed range) properties:

Task5

CDC Control Task (mark processed range) properties

Executing the incremental data load package

Before executing this package, let´s make some data changes in the OLTP database:

USE OLTP;
GO
 
DELETE FROM Customer WHERE CustomerId = 4;
 
INSERT INTO Customer (CustomerName) VALUES ('Martin');
 
UPDATE Customer SET CustomerName = 'Jason' WHERE CustomerId = 1;

 

Now we can execute the following scripts to check that the OLTP and OLAP databases are different:

USE OLTP;
GO
 
SELECT * FROM Customer;
GO
 
USE OLAP;
GO
 
SELECT * FROM Customer_DW;
GO
resultset2.jpg

Result sets before executing the incremental data load. There are some differences between the databases.

  • Now we can execute the SSIS incremental data load package to load data from the Customer table (OLTP) to the Customer_DW (OLAP).
  • Right click the package in the solution explorer and then click “Set as StartUp Object”.
  • Press F5 to execute the package.
Check the results:
result_control_flow

Control Flow tab

result_data_flow

Data Flow tab

Let´s check the tables after the execution:
USE OLTP;
GO
 
SELECT * FROM Customer;
GO
 
USE OLAP;
GO
 
SELECT * FROM Customer_DW;
GO
result_set_after_execution

Result sets after the incremental data load execution

We can see that the tables are synchronized now!

I hope you enjoyed this article.

Please feel free to leave any comments, share and like it!

Thank you very much.

Rafael Juca.

 

SQL Server – Using MERGE to insert, update or delete data

Using the MERGE statement, you can synchronize two tables. In other words, you can insert, update or delete data in a target table based on a source table using a single statement.

EXAMPLE

DECLARE @TennisRackets_Source table(
id int NOT NULL PRIMARY KEY,
name varchar(40) NOT NULL,
price money NOT NULL);
 
DECLARE @TennisRackets_Target table(
id int NOT NULL PRIMARY KEY,
name varchar(40) NOT NULL,
price money NOT NULL);
 
INSERT INTO @TennisRackets_Source VALUES
(1, ‘Head Prestige Pro’, 150),
(2, ‘Head Prestige MP’, 130),
(3, ‘Head Prestige S’, 120),
(4, ‘Head Speed Rev Pro’, 145),
(5, ‘Head Speed MP A’, 135),
(6, ‘Head Speed S’, 125);
 
INSERT INTO @TennisRackets_Target VALUES
(1, ‘Head Prestige Pro’, 150),
(2, ‘Head Prestige MP’, 140),
(3, ‘Head Prestige S’, 120),
(4, ‘Head Speed Rev Pro’, 135),
(7, ‘Head Radical MP’, 110);
COMPARING DATA IN SOURCE AND TARGET TABLES
SELECT * FROM  @TennisRackets_Source;
SELECT * FROM  @TennisRackets_Target;

MERGE_01

  • The rows with ids = 1, 2, 3, 4 are matched (source id = target id). The target rows will be updated with the respective source rows.
  • The rows with ids = 5, 6 are not matched by a target id. These rows will be inserted in the target table.
  • The row with id = 7 is not matched by a source id. This rows will be deleted from the target table.

USING THE MERGE STATEMENT TO SYNCHRONIZE THE TABLES

MERGE INTO @TennisRackets_Target AS TGT
USING @TennisRackets_Source AS SRC
ON SRC.id = TGT.id
WHEN matched THEN UPDATE
   SET TGT.NAME = SRC.NAME,
           TGT.price = SRC.price
WHEN NOT matched THEN INSERT
   VALUES(SRC.id, SRC.NAME, SRC.price)
WHEN NOT matched BY source THEN DELETE;

RESULT SETS AFTER THE EXECUTION

SELECT * FROM  @TennisRackets_Source;
SELECT * FROM  @TennisRackets_Target;
MERGE_02

Observe that the rows are identical after the MERGE statement execution.

Using predicates to avoid unnecessary updates

We can also use predicates to avoid updates when there is no difference between target and source rows.

Look at the following example:

MERGE INTO @TennisRackets_Target AS TGT
USING @TennisRackets_Source AS SRC
ON SRC.id = TGT.id
WHEN matched  AND (SRC.name <> TGT.name OR SRC.price <> TGT.price) THEN UPDATE
   SET TGT.NAME = SRC.NAME,
           TGT.price = SRC.price
WHEN NOT matched THEN INSERT
   VALUES(SRC.id, SRC.NAME, SRC.price)
WHEN NOT matched BY source THEN DELETE;

Now the target table will be updated only when the source id is equal to the target id and one of the nonkey columns (name or price) have different values.

SQL Server – Window Offset Functions (LAG, LEAD, FIRST_VALUE, LAST_VALUE)

SQL Server 2012 has introduced Window offset functions. These functions can be divided into two groups:

  • Functions that allow us to return a value from a row that is in a specific offset from the current row (LAG and LEAD functions). NOTE: The default offset is 1. In other words, the default offset is the previous row when using LAG or the next row when using LEAD.
  • Functions that allow us to return a value from the first or last row in a certain window frame (FIRST_VALUE and LAST_VALUE functions).

EXAMPLES

DECLARE @AnnualSales table (
state CHAR(2),
year varchar(4),
total money);

INSERT INTO @AnnualSales (state,year,total)
VALUES
('CA','2010',20540),
('CA','2011',15700),
('CA','2012',5980),
('CA','2013',12800),
('CA','2014',13600),
('CA','2015',14200),
('NY','2010',7850),
('NY','2011',11830),
('NY','2012',18150),
('NY','2013',21430),
('NY','2014',19250),
('NY','2015',16300);
 
LAG – default arguments
 
SELECT
state,
year,
total,
LAG(total) OVER (PARTITION BY state ORDER BY year) AS 'LAG (year-1)'
FROM @AnnualSales
ORDER BY state, year;
 
window_functions_lag_01

The LAG offset is set to 1 by default. When there is not value for the lag, the default value is NULL.

LAG – defining arguments
 
SELECT
state,
year,
total,
LAG(total,2,0) OVER (PARTITION BY state ORDER BY year) AS 'LAG (year-2)'
FROM @AnnualSales
ORDER BY state, year;
 
 window_functions_lag_02

The LAG offset is set to 2. When there is not value for the lag, NULL is replaced with 0.

LEAD – default arguments
 
SELECT
state,
year,
total,
LEAD(total) OVER (PARTITION BY state ORDER BY year) AS 'LEAD (year+1)'
FROM @AnnualSales
ORDER BY state, year;
 
window_functions_lead_01

The LEAD offset is set to 1 by default. When there is not value for the lead, the default value is NULL.

LEAD – defining arguments
 
SELECT
state,
year,
total,
LEAD(total,2,0) OVER (PARTITION BY state ORDER BY year) AS 'LEAD (year+2)'
FROM @AnnualSales
ORDER BY state, year;
 
window_functions_lead_02

The LEAD offset is set to 2. When there is not value for the lead, NULL is replaced with 0.

FIRST_VALUE and LAST_VALUE

SELECT
state,
year,
total,
FIRST_VALUE(total) OVER (PARTITION BY state ORDER BY year
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS 'FIRST_VALUE',
LAST_VALUE(total) OVER (PARTITION BY state ORDER BY year
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS 'LAST_VALUE'
FROM @AnnualSales
ORDER BY state, year;

window_functions_first_value_last_value

The first value found in CA is 20540,00. The last value found in CA is 14200,00. The first value found in NY is 7850,00. The last value found in NY is 16300,00.

 

SQL Server – Using TRY/CATCH

In the following examples, we will see how to implement error handling in SQL Server using TRY/CATCH

CREATE TABLE dbo.Products(
productid INT CONSTRAINT PK_Products_productid PRIMARY KEY,
productname VARCHAR(30));
 
CREATE TABLE dbo.Orders(
orderid INT CONSTRAINT PK_Orders_orderid PRIMARY KEY,
orderdate DATETIME);
 
CREATE TABLE dbo.OrderDetails(
orderid INT CONSTRAINT FK_OrderDetails_Order FOREIGN KEY REFERENCES dbo.Orders,
productid INT CONSTRAINT FK_OrderDetails_Product FOREIGN KEY REFERENCES dbo.Products);
 
INSERT INTO dbo.Products VALUES
(1, 'Product A'),
(2, 'Product B'),
(3, 'Product C');
 
INSERT INTO dbo.Orders VALUES
(1, CURRENT_TIMESTAMP);
 
INSERT INTO dbo.OrderDetails VALUES
(1, 1),
(1, 2),
(1, 3);
 

Violation of Primary Key

BEGIN TRY
   INSERT INTO dbo.Products VALUES (3, 'Product D');
END TRY
BEGIN CATCH
   SELECT
   'ERROR NUMBER: ' + CAST(ERROR_NUMBER() as varchar(8000)) +
   ' ERROR MESSAGE: ' + ERROR_MESSAGE() AS 'Error description';
END CATCH
ERROR NUMBER: 2627 ERROR MESSAGE: Violation of PRIMARY KEY constraint 'PK_Products_productid'. Cannot insert duplicate key in object 'dbo.Products'. The duplicate key value is (3).

ERROR NUMBER: 2627 ERROR MESSAGE: Violation of PRIMARY KEY constraint 'PK_Products_productid'. Cannot insert duplicate key in object 'dbo.Products'. The duplicate key value is (3).

Violation of Foreign Key

BEGIN TRY
   INSERT INTO dbo.OrderDetails (orderid,productid) VALUES (1,4)
END TRY
BEGIN CATCH
   SELECT
   'ERROR NUMBER: ' + CAST(ERROR_NUMBER() as varchar(8000)) +
   ' ERROR MESSAGE: ' + ERROR_MESSAGE() AS 'Error description';
END CATCH

ERROR NUMBER: 547 ERROR MESSAGE: The INSERT statement conflicted with the FOREIGN KEY constraint “FR_OrderDetails_Product”. The conflict occurred in database “TSQL2012”, table “dbo.Products”, column 'productid'.

Dropping sample tables

DROP TABLE dbo.OrderDetails;
DROP TABLE dbo.Products;
DROP TABLE dbo.Orders;

SQL Server – SEQUENCE objects

SQL Server 2012 has introduced SEQUENCE objects, which are an alternative to the identity column property.

A sequence is an idependent object in SQL Server that is used to generate numeric values in ascending or descending order.

SEQUENCE Vs IDENTITY

Sequence Identity
User-defined database object Column property
Independent object Tied to a specific column in a table
We can generate the value before using it We need to insert the row to get the next identity value
We can cache a sequence to improve performance Doesn´t provide cache option
Provides an option to generate a range of sequence numbers using the sp_sequence_get_range stored procedure Not possible to generate a range of values

SEQUENCE PROPERTIES

  • START WITH: The start value. The default value is MINVALUE for an ascending sequence and MAXVALUE value for a descending sequence.
  • INCREMENT BY: The increment value. The default value is 1. If we use a negative value, the sequence will be descending.
  • MINVALUE: The minimum value of the sequence. The default is the minimum value of the data type of the sequence. For example, it will be 0 for a TINYINT data type and -2147483648 for a INT data type.
  • MAXVALUE: The maximum value of the sequence. The default is the maximum value of the data type of the sequence. For example, it will be 255 for a TINYINT data type and 2147483647 for a INT data type.
  • CYCLE | NO CYCLE: Defines whether the sequence should restart once it reaches to the maximum value (for ascending sequence objects) or minimum value (for descending sequence objects). The default is NO CYCLE.
  • CACHE | NO CACHE: Caches a sequence to improve performance reducing the number of disk IOs. The default is CACHE.

EXAMPLES

Creating a Sequence that starts with 1 and increases by 1

CREATE SEQUENCE dbo.SeqCustomerID AS INT
START WITH 1
INCREMENT BY 1;
GO

Using the dbo.SeqCustomerID sequence in an Insert

CREATE TABLE dbo.Customers
(
customerid INT NOT NULL PRIMARY KEY,
customername VARCHAR(50) NOT NULL
);
 
INSERT INTO dbo.Customers VALUES
(NEXT VALUE FOR dbo.SeqCustomerID,'Roger'),
(NEXT VALUE FOR dbo.SeqCustomerID,'Michael'),
(NEXT VALUE FOR dbo.SeqCustomerID,'Brian');
 
SELECT * FROM dbo.Customers;
SEQUENCE_01

dbo.Customers Result Set

Creating a Sequence that starts with 0 and decreases by 1

CREATE SEQUENCE dbo.SeqCustomerIDDesc AS INT
START WITH 0
INCREMENT BY -1;
GO

Using the dbo.SeqCustomerIDDesc sequence in an Insert

CREATE TABLE dbo.CustomersDesc
(
customerid INT NOT NULL PRIMARY KEY,
customername VARCHAR(50) NOT NULL
);
 
INSERT INTO dbo.CustomersDesc VALUES
(NEXT VALUE FOR dbo.SeqCustomerIDDesc,'Roger'),
(NEXT VALUE FOR dbo.SeqCustomerIDDesc,'Michael'),
(NEXT VALUE FOR dbo.SeqCustomerIDDesc,'Brian');
 
SELECT * FROM dbo.CustomersDesc;
dbo.CustomersDesc Result Set

dbo.CustomersDesc Result Set

Creating a Sequence using all properties

CREATE SEQUENCE dbo.SeqAllProperties
AS INT
START WITH 20
INCREMENT BY 5
MINVALUE 15
MAXVALUE 30
CYCLE
CACHE 10;

Execute the following commands and check the results

SELECT NEXT VALUE FOR dbo.SeqAllProperties AS 'First sequence number';
SELECT NEXT VALUE FOR dbo.SeqAllProperties AS 'Second sequence number';
SELECT NEXT VALUE FOR dbo.SeqAllProperties AS 'Third sequence number';
SELECT NEXT VALUE FOR dbo.SeqAllProperties AS 'Fourth sequence number';

SEQUENCE_03

  • The fisrt value is 20 (START WITH 20).
  • The second value is 25 (INCREMENT BY 5).
  • The third value is 30. The sequence reaches its maximum value (MAXVALUE 30).
  • The fourth value is 15. The sequence restarts with 15 because it is configured to cycle and the sequence minimum value is 15 (MINVALUE 15).

Dropping SEQUENCE objects

DROP SEQUENCE dbo.SeqCustomerID;
DROP SEQUENCE dbo.SeqCustomerIDDesc;
DROP SEQUENCE dbo.SeqAllProperties;

SQL Server – OFFSET FETCH Clause

The OFFSET-FETCH option is extremely useful for paging result sets.

Main features:

  • Introduced in SQL Server 2012.
  • It appears after the ORDER BY clause which is mandatory to use OFFSET/FETCH.
  • The OFFSET clause indicates how many rows to skip and the FETCH clause indicates how many rows to return after skipping.
  • The FETCH clause always requires an OFFSET clause. So if you don´t want to skip any rows, you need to specify: OFFSET 0 ROWS.

EXAMPLE

DECLARE @Customers table (
custid int IDENTITY(1,1),
companyname varchar(30),
contactname varchar(30));

INSERT INTO @Customers (companyname,contactname)
VALUES
('Customer MLTDN','Hassall, Mark'),
('Customer KBUDE','Peoples, John'),
('Customer HFBZG','Arndt, Torsten'),
('Customer HGVLZ','Higginbotham, Tom'),
('Customer XHXJV','Poland, Carole'),
('Customer QXVLA','Bansal, Dushyant'),
('Customer QUHWH','Ilyina, Julia'),
('Customer RTXGC','Raghav, Amritansh'),
('Customer EEALV','Bassols, Pilar Colome'),
('Customer UBHAU','Jaffe, David');

SELECTING ALL ROWS FROM @Customers

SELECT *
FROM   @Customers
ORDER  BY custid;

Selecting all rows from @Customers

USING OFFSET

SELECT *
FROM @Customers
ORDER BY custid
OFFSET 2 ROWS;

OFFSET_FETCH_02

skip first two rows

USING OFFSET WITH FETCH

SELECT *
FROM @Customers
ORDER BY custid
OFFSET 2 ROWS
FETCH NEXT 5 ROWS ONLY;

OFFSET_FETCH_03

Skip first two rows and return next five rows.

USING ONLY FETCH (ERROR)

SELECT *
FROM @Customers
ORDER BY custid
FETCH NEXT 5 ROWS ONLY;

OFFSET_FETCH_04

OFFSET is mandatory.