
Automatically create indexes on all Foreign Keys (PostgreSQL)

Postgres does not automatically create indexes on foreign key columns.

The following query generates a CREATE INDEX statement for every foreign key that has no matching index:

select 'create index ' || conname || '_idx on ' || conrelid ||
       ' (' || array_to_string(column_name_list, ',') || ');' as create_script
from (select distinct
             conrelid,
             conname,
             array_agg(attname) as column_name_list,
             array_agg(attnum) as column_list,
             nspname
      from pg_attribute
      join (select conrelid::regclass,
                   conname,
                   unnest(conkey) as column_index,
                   nspname
            from (select distinct
                         conrelid, conname, conkey, nspname
                  from pg_constraint
                  join pg_class on pg_class.oid = pg_constraint.conrelid
                  join pg_namespace on pg_namespace.oid = pg_class.relnamespace
                  where contype = 'f'
                    and nspname !~ '^pg_'
                    and nspname <> 'information_schema'
                 ) fkey
           ) fkey
        on fkey.conrelid = pg_attribute.attrelid
       and fkey.column_index = pg_attribute.attnum
      group by conrelid, conname, nspname
     ) candidate_index
join pg_class on pg_class.oid = candidate_index.conrelid
left join pg_index on pg_index.indrelid = conrelid
                  and indkey::text = array_to_string(column_list, ' ')
where nspname = 'your_schema'
  and indexrelid is null
order by relname;
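For illustration, if a hypothetical orders table had an unindexed foreign key constraint named orders_customer_id_fkey on its customer_id column, the query would emit a statement along these lines:

-- hypothetical output; review names and index types before running it
create index orders_customer_id_fkey_idx on orders (customer_id);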


Function to generate INSERT statements from a table (PostgreSQL)

The following function returns INSERT statements for every row of a given table:

CREATE OR REPLACE FUNCTION public.generate_inserts(varSchema text, varTable text)
RETURNS TABLE(resultado text) AS $$
DECLARE
    CODE TEXT;
BEGIN
    -- Build a dynamic SELECT that concatenates one INSERT statement per row of the target table
    CODE :=
    (SELECT
        'SELECT ''INSERT INTO '
        || table_schema || '.'
        || table_name || ' ('
        || replace(replace(array_agg(column_name::text)::text,'{',''),'}','') || ') VALUES (''||'
        || replace(replace(replace(array_agg('quote_nullable(' || column_name::text || ')')::text,'{',''),'}',''),',',' || '','' || ')
        || ' || '');'' '
        || 'FROM ' || table_schema || '.' || table_name || ';'
    FROM information_schema.columns
    WHERE table_schema = varSchema
    AND table_name = varTable
    GROUP BY table_schema, table_name);

    -- Run the generated SELECT and return its rows (one INSERT statement per source row)
    RETURN QUERY EXECUTE CODE;
END;
$$ LANGUAGE plpgsql;

Calling the function:

SELECT public.generate_inserts('YourSchema','YourTable');
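For a hypothetical table public.customer with columns id and name, each row returned by the function is a ready-to-run statement, for example:

-- hypothetical output
INSERT INTO public.customer (id,name) VALUES (1,'Adam');
INSERT INTO public.customer (id,name) VALUES (2,'Bob');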

Using Charindex + Cross Apply to split delimiter-separated values (SQL Server)

Let’s suppose that we have some delimiter-separated values, like this:

StateList
-----------------------------------------
Florida,Indiana,Kansas,Georgia,Alabama
New York,California,Nevada,Texas,Montana

We want to separate the delimited values into five columns, like this:

S1         S2           S3       S4        S5
---------  -----------  -------  --------  --------
Florida    Indiana      Kansas   Georgia   Alabama
New York   California   Nevada   Texas     Montana

We can solve this using the following code:

DECLARE @States TABLE
(StateList VARCHAR(50));


DECLARE
 @delimiter CHAR(1);

SET @delimiter = ',';

INSERT INTO @States VALUES
('Florida,Indiana,Kansas,Georgia,Alabama'),
('New York,California,Nevada,Texas,Montana');

SELECT * FROM @States;

SELECT
SUBSTRING(StateList,1,T1.Position-1) S1,
SUBSTRING(StateList,T1.Position+1,T2.Position-T1.Position-1) S2,
SUBSTRING(StateList,T2.Position+1,T3.Position-T2.Position-1) S3,
SUBSTRING(StateList,T3.Position+1,T4.Position-T3.Position-1) S4,
SUBSTRING(StateList,T4.Position+1,LEN(StateList)) S5
FROM @States
--1st delimiter position
CROSS APPLY (SELECT (CHARINDEX(@delimiter,StateList))) as T1(Position)
--2nd delimiter position
CROSS APPLY (SELECT (CHARINDEX(@delimiter, StateList, T1.Position+1))) as T2(Position)
--3rd delimiter position
CROSS APPLY (SELECT (CHARINDEX(@delimiter, StateList, T2.Position+1))) as T3(Position)
--4th delimiter position
CROSS APPLY (SELECT (CHARINDEX(@delimiter, StateList, T3.Position+1))) as T4(Position);

Incremental Data Load with SSIS and Change Data Capture (CDC)

We can easily perform incremental data loads in SSIS using Change Data Capture (CDC).
CDC is a SQL Server tracking feature, introduced in SQL Server 2008, that records the insert, update and delete activity performed on user tables.

In the following example, we will see how to develop an SSIS incremental data load package using Change Data Capture (CDC).

* I recommend some basic knowledge of SSIS to better follow this article.

Creating OLTP, Staging and OLAP databases and tables

USE master;
GO
 
CREATE DATABASE OLTP;
GO
 
USE OLTP;
 
CREATE TABLE dbo.Customer
(
CustomerId int identity(1,1) primary key,
CustomerName varchar(50)
);
 
INSERT INTO dbo.Customer (CustomerName)
VALUES
('Adam'),
('Bob'),
('Joseph'),
('Roger'),
('Samuel');
 
CREATE DATABASE Staging;
GO
 
USE Staging;
 
CREATE TABLE dbo.Customer_insert
(
CustomerId int,
CustomerName varchar(50)
);
CREATE TABLE dbo.Customer_update
(
CustomerId int,
CustomerName varchar(50)
);
CREATE TABLE dbo.Customer_delete
(
CustomerId int,
CustomerName varchar(50)
);
 
CREATE DATABASE OLAP;
GO
 
USE OLAP;
 
CREATE TABLE dbo.Customer_DW
(
CustomerId int primary key,
CustomerName varchar(50)
);

 

Enabling Change Data Capture (CDC) for the OLTP database and Customer table

  • First we need to enable CDC for the database and then enable it for each table we want to track.
  • SQL Server Agent must be running.
  • The user must have sysadmin privileges to enable CDC for databases.
  • The user must be db_owner to enable CDC for tables.
USE OLTP;
GO
 
EXEC sp_cdc_enable_db;
GO
 
EXEC sys.sp_cdc_enable_table
@source_schema = N'dbo',
@source_name = N'Customer',
@role_name = NULL,
@supports_net_changes = 1;
GO
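Once CDC is enabled, the captured changes can also be inspected directly with T-SQL, which is handy for verifying the setup before building the SSIS packages. A minimal sketch, assuming the default capture instance name dbo_Customer (schema name plus table name):

USE OLTP;
GO

-- Confirm that CDC is enabled for the database and the table
SELECT name, is_cdc_enabled FROM sys.databases WHERE name = 'OLTP';
SELECT name, is_tracked_by_cdc FROM sys.tables WHERE name = 'Customer';

-- Read the net changes captured so far for the dbo_Customer capture instance
DECLARE @from_lsn binary(10) = sys.fn_cdc_get_min_lsn('dbo_Customer');
DECLARE @to_lsn binary(10) = sys.fn_cdc_get_max_lsn();

SELECT * FROM cdc.fn_cdc_get_net_changes_dbo_Customer(@from_lsn, @to_lsn, 'all');
GO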



SSIS – Initial Data Load package

Since the OLAP database has no rows yet, we need to perform a one-time load to populate it.

The initial load package has three tasks:

  1. A CDC Control Task to mark the initial load start.
  2. A Data Flow Task to transfer the data from the OLTP database to the OLAP database.
  3. A CDC Control Task to mark the initial load end.

The following figures show how to configure the Control Flow and each task:

Control Flow:

[Figure: Control Flow with three tasks]

1. CDC Control Task (mark initial load start) properties:

[Figure: CDC Control Task (mark initial load start) properties]

2. Data Flow Task (load customer):

[Figure: Data Flow Task (load customer)]

The Data Flow consists of two components: an ADO NET Source and an ADO NET Destination.

3. CDC Control Task (mark initial load end) properties:

[Figure: CDC Control Task (mark initial load end) properties]

Executing the initial load package 

Now the initial load package is ready to be executed.

Let's check the source and destination tables before executing it:

USE OLTP;
GO
 
SELECT * FROM Customer;
GO
 
USE OLAP;
GO
 
SELECT * FROM Customer_DW;
GO
[Figure: Result sets before execution]

  • Now we need to execute the SSIS initial load package to load data from the Customer table (OLTP) to the Customer_DW (OLAP).
  • Press F5 to execute the package.
Check the results:
[Figure: Control Flow tab]

Let's check the tables after the execution:
USE OLTP;
GO
 
SELECT * FROM Customer;
GO
 
USE OLAP;
GO
 
SELECT * FROM Customer_DW;
GO
[Figure: Result sets after execution]

The initial data load worked fine! The next step is to perform the incremental data load.

SSIS – Incremental Data Load package

This package will perform incremental loads of the change data to the OLAP database.

The incremental data load package has five tasks:

  1. An Execute SQL Task to truncate the staging tables.
  2. A CDC Control Task to get the processing range of LSNs (Log Sequence Numbers) containing the changes to be loaded.
  3. A Data Flow Task to load the staging tables.
  4. An Execute SQL Task to load change data from the Staging to the OLAP database.
  5. A CDC Control Task to mark the processed range and define where the next processing range will begin.

The following figures show how to configure the Control Flow and each task:

Control Flow:

[Figure: Control Flow with five tasks]

1. Execute SQL Task (clean staging tables) properties:

[Figure: Execute SQL Task (clean staging tables) properties]

SQL Statement:

TRUNCATE TABLE Customer_delete;
TRUNCATE TABLE Customer_insert;
TRUNCATE TABLE Customer_update;

2. CDC Control Task (get processing range) properties:

[Figure: CDC Control Task (get processing range) properties]

3. Data Flow Task (load staging tables):

[Figure: Data Flow Task (load staging tables)]

4. Execute SQL Task (load customer_dw) properties:

[Figure: Execute SQL Task (load customer_dw) properties]

SQL Statement:

INSERT INTO Customer_DW SELECT * FROM Staging..Customer_insert;
DELETE FROM Customer_DW WHERE CustomerId IN (SELECT CustomerId FROM Staging..Customer_delete);
UPDATE c1 SET c1.CustomerName = c2.CustomerName FROM Customer_DW c1 INNER JOIN Staging..Customer_update c2 ON c1.CustomerId = c2.CustomerId;

5. CDC Control Task (mark processed range) properties:

[Figure: CDC Control Task (mark processed range) properties]

Executing the incremental data load package

Before executing this package, let's make some data changes in the OLTP database:

USE OLTP;
GO
 
DELETE FROM Customer WHERE CustomerId = 4;
 
INSERT INTO Customer (CustomerName) VALUES ('Martin');
 
UPDATE Customer SET CustomerName = 'Jason' WHERE CustomerId = 1;

 

Now we can execute the following scripts to check that the OLTP and OLAP databases are different:

USE OLTP;
GO
 
SELECT * FROM Customer;
GO
 
USE OLAP;
GO
 
SELECT * FROM Customer_DW;
GO
[Figure: Result sets before the incremental data load, showing differences between the databases]

  • Now we can execute the SSIS incremental data load package to load data from the Customer table (OLTP) to the Customer_DW (OLAP).
  • Right click the package in the solution explorer and then click “Set as StartUp Object”.
  • Press F5 to execute the package.
Check the results:
[Figure: Control Flow tab]

[Figure: Data Flow tab]

Let's check the tables after the execution:
USE OLTP;
GO
 
SELECT * FROM Customer;
GO
 
USE OLAP;
GO
 
SELECT * FROM Customer_DW;
GO
[Figure: Result sets after the incremental data load execution]

We can see that the tables are synchronized now!

I hope you enjoyed this article.

Please feel free to leave any comments, share and like it!

Thank you very much.

Rafael Juca.

 

SQL Server – Using MERGE to insert, update or delete data

Using the MERGE statement, you can synchronize two tables. In other words, you can insert, update or delete data in a target table based on a source table using a single statement.

EXAMPLE

DECLARE @TennisRackets_Source table(
id int NOT NULL PRIMARY KEY,
name varchar(40) NOT NULL,
price money NOT NULL);
 
DECLARE @TennisRackets_Target table(
id int NOT NULL PRIMARY KEY,
name varchar(40) NOT NULL,
price money NOT NULL);
 
INSERT INTO @TennisRackets_Source VALUES
(1, 'Head Prestige Pro', 150),
(2, 'Head Prestige MP', 130),
(3, 'Head Prestige S', 120),
(4, 'Head Speed Rev Pro', 145),
(5, 'Head Speed MP A', 135),
(6, 'Head Speed S', 125);
 
INSERT INTO @TennisRackets_Target VALUES
(1, 'Head Prestige Pro', 150),
(2, 'Head Prestige MP', 140),
(3, 'Head Prestige S', 120),
(4, 'Head Speed Rev Pro', 135),
(7, 'Head Radical MP', 110);

COMPARING DATA IN SOURCE AND TARGET TABLES

SELECT * FROM @TennisRackets_Source;
SELECT * FROM @TennisRackets_Target;

[Figure: Result sets of the source and target tables]

  • The rows with ids = 1, 2, 3, 4 are matched (source id = target id). The target rows will be updated with the respective source rows.
  • The rows with ids = 5, 6 are not matched by a target id. These rows will be inserted into the target table.
  • The row with id = 7 is not matched by a source id. This row will be deleted from the target table.

USING THE MERGE STATEMENT TO SYNCHRONIZE THE TABLES

MERGE INTO @TennisRackets_Target AS TGT
USING @TennisRackets_Source AS SRC
   ON SRC.id = TGT.id
WHEN MATCHED THEN UPDATE
   SET TGT.name = SRC.name,
       TGT.price = SRC.price
WHEN NOT MATCHED THEN INSERT
   VALUES (SRC.id, SRC.name, SRC.price)
WHEN NOT MATCHED BY SOURCE THEN DELETE;

RESULT SETS AFTER THE EXECUTION

SELECT * FROM @TennisRackets_Source;
SELECT * FROM @TennisRackets_Target;

[Figure: Result sets after the MERGE execution]

Observe that the rows are identical after the MERGE statement execution.
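If you want to see exactly which action MERGE took for each row, the OUTPUT clause with $action can be appended to the statement. A minimal sketch based on the same example (the column aliases are only for illustration):

MERGE INTO @TennisRackets_Target AS TGT
USING @TennisRackets_Source AS SRC
   ON SRC.id = TGT.id
WHEN MATCHED THEN UPDATE
   SET TGT.name = SRC.name,
       TGT.price = SRC.price
WHEN NOT MATCHED THEN INSERT
   VALUES (SRC.id, SRC.name, SRC.price)
WHEN NOT MATCHED BY SOURCE THEN DELETE
OUTPUT $action AS merge_action,     -- 'INSERT', 'UPDATE' or 'DELETE'
       inserted.id AS inserted_id,  -- NULL for deleted rows
       deleted.id AS deleted_id;    -- NULL for inserted rows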

Using predicates to avoid unnecessary updates

We can also use predicates to avoid updates when there is no difference between target and source rows.

Look at the following example:

MERGE INTO @TennisRackets_Target AS TGT
USING @TennisRackets_Source AS SRC
   ON SRC.id = TGT.id
WHEN MATCHED AND (SRC.name <> TGT.name OR SRC.price <> TGT.price) THEN UPDATE
   SET TGT.name = SRC.name,
       TGT.price = SRC.price
WHEN NOT MATCHED THEN INSERT
   VALUES (SRC.id, SRC.name, SRC.price)
WHEN NOT MATCHED BY SOURCE THEN DELETE;

Now the target table will be updated only when the source id matches the target id and at least one of the nonkey columns (name or price) has a different value.

SQL Server – Window Offset Functions (LAG, LEAD, FIRST_VALUE, LAST_VALUE)

SQL Server 2012 introduced window offset functions. These functions can be divided into two groups:

  • Functions that return a value from a row at a specified offset from the current row (LAG and LEAD). NOTE: The default offset is 1; in other words, the previous row for LAG and the next row for LEAD.
  • Functions that allow us to return a value from the first or last row in a certain window frame (FIRST_VALUE and LAST_VALUE functions).

EXAMPLES

DECLARE @AnnualSales table (
state CHAR(2),
year varchar(4),
total money);

INSERT INTO @AnnualSales (state,year,total)
VALUES
('CA','2010',20540),
('CA','2011',15700),
('CA','2012',5980),
('CA','2013',12800),
('CA','2014',13600),
('CA','2015',14200),
('NY','2010',7850),
('NY','2011',11830),
('NY','2012',18150),
('NY','2013',21430),
('NY','2014',19250),
('NY','2015',16300);
 
LAG – default arguments
 
SELECT
state,
year,
total,
LAG(total) OVER (PARTITION BY state ORDER BY year) AS 'LAG (year-1)'
FROM @AnnualSales
ORDER BY state, year;
 
[Figure: LAG result set with default offset]

The LAG offset is 1 by default. When there is no value for the lag, NULL is returned.

LAG – defining arguments
 
SELECT
state,
year,
total,
LAG(total,2,0) OVER (PARTITION BY state ORDER BY year) AS 'LAG (year-2)'
FROM @AnnualSales
ORDER BY state, year;
 
[Figure: LAG result set with offset 2]

The LAG offset is set to 2. When there is no value for the lag, the specified default value 0 is returned instead of NULL.

LEAD – default arguments
 
SELECT
state,
year,
total,
LEAD(total) OVER (PARTITION BY state ORDER BY year) AS 'LEAD (year+1)'
FROM @AnnualSales
ORDER BY state, year;
 
[Figure: LEAD result set with default offset]

The LEAD offset is 1 by default. When there is no value for the lead, NULL is returned.

LEAD – defining arguments
 
SELECT
state,
year,
total,
LEAD(total,2,0) OVER (PARTITION BY state ORDER BY year) AS 'LEAD (year+2)'
FROM @AnnualSales
ORDER BY state, year;
 
[Figure: LEAD result set with offset 2]

The LEAD offset is set to 2. When there is no value for the lead, the specified default value 0 is returned instead of NULL.

FIRST_VALUE and LAST_VALUE

SELECT
state,
year,
total,
FIRST_VALUE(total) OVER (PARTITION BY state ORDER BY year
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS 'FIRST_VALUE',
LAST_VALUE(total) OVER (PARTITION BY state ORDER BY year
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS 'LAST_VALUE'
FROM @AnnualSales
ORDER BY state, year;

[Figure: FIRST_VALUE and LAST_VALUE result set]

Within the CA partition, the first value is 20540.00 and the last is 14200.00; within NY, the first is 7850.00 and the last is 16300.00.
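Note that the explicit ROWS frame on LAST_VALUE matters. With only ORDER BY, the default frame ends at the current row, so LAST_VALUE would simply return the current row's total. A minimal sketch of the pitfall, using the same @AnnualSales variable:

SELECT
state,
year,
total,
-- Default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW,
-- so this returns the current row's total instead of the partition's last value
LAST_VALUE(total) OVER (PARTITION BY state ORDER BY year) AS 'LAST_VALUE (default frame)'
FROM @AnnualSales
ORDER BY state, year;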

 

SQL Server – Using TRY/CATCH

In the following examples, we will see how to implement error handling in SQL Server using TRY/CATCH blocks.

CREATE TABLE dbo.Products(
productid INT CONSTRAINT PK_Products_productid PRIMARY KEY,
productname VARCHAR(30));
 
CREATE TABLE dbo.Orders(
orderid INT CONSTRAINT PK_Orders_orderid PRIMARY KEY,
orderdate DATETIME);
 
CREATE TABLE dbo.OrderDetails(
orderid INT CONSTRAINT FK_OrderDetails_Order FOREIGN KEY REFERENCES dbo.Orders,
productid INT CONSTRAINT FK_OrderDetails_Product FOREIGN KEY REFERENCES dbo.Products);
 
INSERT INTO dbo.Products VALUES
(1, 'Product A'),
(2, 'Product B'),
(3, 'Product C');
 
INSERT INTO dbo.Orders VALUES
(1, CURRENT_TIMESTAMP);
 
INSERT INTO dbo.OrderDetails VALUES
(1, 1),
(1, 2),
(1, 3);
 

Violation of Primary Key

BEGIN TRY
   INSERT INTO dbo.Products VALUES (3, 'Product D');
END TRY
BEGIN CATCH
   SELECT
   'ERROR NUMBER: ' + CAST(ERROR_NUMBER() as varchar(8000)) +
   ' ERROR MESSAGE: ' + ERROR_MESSAGE() AS 'Error description';
END CATCH

ERROR NUMBER: 2627 ERROR MESSAGE: Violation of PRIMARY KEY constraint 'PK_Products_productid'. Cannot insert duplicate key in object 'dbo.Products'. The duplicate key value is (3).

Violation of Foreign Key

BEGIN TRY
   INSERT INTO dbo.OrderDetails (orderid,productid) VALUES (1,4)
END TRY
BEGIN CATCH
   SELECT
   'ERROR NUMBER: ' + CAST(ERROR_NUMBER() as varchar(8000)) +
   ' ERROR MESSAGE: ' + ERROR_MESSAGE() AS 'Error description';
END CATCH

ERROR NUMBER: 547 ERROR MESSAGE: The INSERT statement conflicted with the FOREIGN KEY constraint "FK_OrderDetails_Product". The conflict occurred in database "TSQL2012", table "dbo.Products", column 'productid'.
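Besides ERROR_NUMBER() and ERROR_MESSAGE(), the CATCH block also exposes ERROR_SEVERITY(), ERROR_STATE(), ERROR_LINE() and ERROR_PROCEDURE(). A minimal sketch that returns them as separate columns, reusing the foreign key violation above:

BEGIN TRY
   INSERT INTO dbo.OrderDetails (orderid, productid) VALUES (1, 4);
END TRY
BEGIN CATCH
   SELECT
   ERROR_NUMBER()    AS error_number,
   ERROR_SEVERITY()  AS error_severity,
   ERROR_STATE()     AS error_state,
   ERROR_LINE()      AS error_line,
   ERROR_PROCEDURE() AS error_procedure,  -- NULL when the error is not raised inside a procedure or trigger
   ERROR_MESSAGE()   AS error_message;
END CATCH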

Dropping sample tables

DROP TABLE dbo.OrderDetails;
DROP TABLE dbo.Products;
DROP TABLE dbo.Orders;