Finding duplicate rows with MS SQL Server

MS SQL Server does not have a built-in function for finding duplicate rows, so the standard way of doing this, is through counting.

Applies to:

  • Microsoft SQL Server 7
  • Microsoft SQL Server 2000
  • Microsoft SQL Server 2005
  • Microsoft SQL Server 2008

Data used

  1. SELECT sales_date AS 'Date', sales_qty AS 'Qty'
  2. FROM sales;
Date                    Qty
----------------------- ---------------------------------------
2005-10-06 10:01:01.000 10.00
2006-10-06 10:01:01.000 10.00
2005-11-06 10:01:01.000 9.00
2005-12-06 11:01:01.000 8.00
2006-01-06 12:01:01.000 7.00
2006-02-06 13:01:01.000 6.00
2006-03-06 14:01:01.000 5.00
2006-04-06 15:01:01.000 4.00
2006-05-06 16:01:01.000 3.00
2006-06-06 17:01:01.000 2.00
2006-07-06 19:01:01.000 1.00
2004-10-06 10:01:01.000 10.00
2005-11-06 10:01:01.000 9.00
2005-12-06 11:01:01.000 8.00
2006-01-06 12:01:01.000 7.00
2006-02-06 13:01:01.000 6.00
2005-03-06 14:01:01.000 5.00
2006-04-06 15:01:01.000 4.00
2006-05-06 16:01:01.000 3.00
2007-06-06 17:01:01.000 2.00
2006-07-06 19:01:01.000 1.00
2005-10-06 10:01:01.000 10.00
2005-11-06 10:01:01.000 9.00
2005-12-06 11:01:01.000 8.00
2006-01-06 12:01:01.000 7.00
2004-02-06 13:01:01.000 6.00
2004-03-06 14:01:01.000 5.00
2005-04-06 15:01:01.000 4.00
2005-05-06 16:01:01.000 3.00
2006-06-06 17:01:01.000 2.00
2006-07-06 19:01:01.000 1.00
2006-03-06 14:01:01.000 5.00
2006-04-06 15:01:01.000 4.00
2006-05-06 16:01:01.000 3.00
2006-06-06 17:01:01.000 2.00
2006-07-06 19:01:01.000 1.00
2005-10-06 10:01:01.000 10.00
2005-11-06 10:01:01.000 9.00
2005-12-06 11:01:01.000 8.00
2006-01-06 12:01:01.000 7.00
2006-05-06 16:01:01.000 3.00
2006-06-06 17:01:01.000 2.00
2006-07-06 19:01:01.000 1.00
2005-10-06 10:01:01.000 10.00
2007-11-06 10:01:01.000 9.00

(45 row(s) affected)

Duplicate sales

Find all sales that have the same quantity for the same date. They are very likely duplicate of each others.

  1. SELECT sales_date AS 'Date', sales_qty AS 'Qty', COUNT(*) AS 'Num'
  2. FROM sales
  3. GROUP BY sales_date, sales_qty;
Date                    Qty                                     Num
----------------------- --------------------------------------- -----------
2006-07-06 19:01:01.000 1.00                                    5
2006-06-06 17:01:01.000 2.00                                    4
2007-06-06 17:01:01.000 2.00                                    1
2005-05-06 16:01:01.000 3.00                                    1
2006-05-06 16:01:01.000 3.00                                    4
2005-04-06 15:01:01.000 4.00                                    1
2006-04-06 15:01:01.000 4.00                                    3
2004-03-06 14:01:01.000 5.00                                    1
2005-03-06 14:01:01.000 5.00                                    1
2006-03-06 14:01:01.000 5.00                                    2
2004-02-06 13:01:01.000 6.00                                    1
2006-02-06 13:01:01.000 6.00                                    2
2006-01-06 12:01:01.000 7.00                                    4
2005-12-06 11:01:01.000 8.00                                    4
2005-11-06 10:01:01.000 9.00                                    4
2007-11-06 10:01:01.000 9.00                                    1
2004-10-06 10:01:01.000 10.00                                   1
2005-10-06 10:01:01.000 10.00                                   4
2006-10-06 10:01:01.000 10.00                                   1

(19 row(s) affected)
  • This shows all sales, including both the duplicate ones, and the one that are not duplicate.
  • Total 19 rows.

Only duplicate sales

  1. SELECT sales_date AS 'Date', sales_qty AS 'Qty', COUNT(*) AS 'Num'
  2. FROM sales
  3. GROUP BY sales_date, sales_qty
  4. HAVING (COUNT(*) > 1);
Date                    Qty                                     Num
----------------------- --------------------------------------- -----------
2006-07-06 19:01:01.000 1.00                                    5
2006-06-06 17:01:01.000 2.00                                    4
2006-05-06 16:01:01.000 3.00                                    4
2006-04-06 15:01:01.000 4.00                                    3
2006-03-06 14:01:01.000 5.00                                    2
2006-02-06 13:01:01.000 6.00                                    2
2006-01-06 12:01:01.000 7.00                                    4
2005-12-06 11:01:01.000 8.00                                    4
2005-11-06 10:01:01.000 9.00                                    4
2005-10-06 10:01:01.000 10.00                                   4

(10 row(s) affected)
  • Note that this time, there are only 10 rows that are potential duplicates.
  • These are potential duplicates. Why because the logic of what make a duplicate sale is flawed. Same date, same quantity does not guarantee that the sale is a duplicate. More info is needed.