Finding duplicate rows with MS SQL Server
MS SQL Server does not have a built-in function for finding duplicate rows, so the standard way of doing this, is through counting.
Applies to:
- Microsoft SQL Server 7
- Microsoft SQL Server 2000
- Microsoft SQL Server 2005
- Microsoft SQL Server 2008
Data used
- SELECT sales_date AS 'Date', sales_qty AS 'Qty'
- FROM sales;
Date Qty ----------------------- --------------------------------------- 2005-10-06 10:01:01.000 10.00 2006-10-06 10:01:01.000 10.00 2005-11-06 10:01:01.000 9.00 2005-12-06 11:01:01.000 8.00 2006-01-06 12:01:01.000 7.00 2006-02-06 13:01:01.000 6.00 2006-03-06 14:01:01.000 5.00 2006-04-06 15:01:01.000 4.00 2006-05-06 16:01:01.000 3.00 2006-06-06 17:01:01.000 2.00 2006-07-06 19:01:01.000 1.00 2004-10-06 10:01:01.000 10.00 2005-11-06 10:01:01.000 9.00 2005-12-06 11:01:01.000 8.00 2006-01-06 12:01:01.000 7.00 2006-02-06 13:01:01.000 6.00 2005-03-06 14:01:01.000 5.00 2006-04-06 15:01:01.000 4.00 2006-05-06 16:01:01.000 3.00 2007-06-06 17:01:01.000 2.00 2006-07-06 19:01:01.000 1.00 2005-10-06 10:01:01.000 10.00 2005-11-06 10:01:01.000 9.00 2005-12-06 11:01:01.000 8.00 2006-01-06 12:01:01.000 7.00 2004-02-06 13:01:01.000 6.00 2004-03-06 14:01:01.000 5.00 2005-04-06 15:01:01.000 4.00 2005-05-06 16:01:01.000 3.00 2006-06-06 17:01:01.000 2.00 2006-07-06 19:01:01.000 1.00 2006-03-06 14:01:01.000 5.00 2006-04-06 15:01:01.000 4.00 2006-05-06 16:01:01.000 3.00 2006-06-06 17:01:01.000 2.00 2006-07-06 19:01:01.000 1.00 2005-10-06 10:01:01.000 10.00 2005-11-06 10:01:01.000 9.00 2005-12-06 11:01:01.000 8.00 2006-01-06 12:01:01.000 7.00 2006-05-06 16:01:01.000 3.00 2006-06-06 17:01:01.000 2.00 2006-07-06 19:01:01.000 1.00 2005-10-06 10:01:01.000 10.00 2007-11-06 10:01:01.000 9.00 (45 row(s) affected)
Duplicate sales
Find all sales that have the same quantity for the same date. They are very likely duplicate of each others.
- SELECT sales_date AS 'Date', sales_qty AS 'Qty', COUNT(*) AS 'Num'
- FROM sales
- GROUP BY sales_date, sales_qty;
Date Qty Num ----------------------- --------------------------------------- ----------- 2006-07-06 19:01:01.000 1.00 5 2006-06-06 17:01:01.000 2.00 4 2007-06-06 17:01:01.000 2.00 1 2005-05-06 16:01:01.000 3.00 1 2006-05-06 16:01:01.000 3.00 4 2005-04-06 15:01:01.000 4.00 1 2006-04-06 15:01:01.000 4.00 3 2004-03-06 14:01:01.000 5.00 1 2005-03-06 14:01:01.000 5.00 1 2006-03-06 14:01:01.000 5.00 2 2004-02-06 13:01:01.000 6.00 1 2006-02-06 13:01:01.000 6.00 2 2006-01-06 12:01:01.000 7.00 4 2005-12-06 11:01:01.000 8.00 4 2005-11-06 10:01:01.000 9.00 4 2007-11-06 10:01:01.000 9.00 1 2004-10-06 10:01:01.000 10.00 1 2005-10-06 10:01:01.000 10.00 4 2006-10-06 10:01:01.000 10.00 1 (19 row(s) affected)
- This shows all sales, including both the duplicate ones, and the one that are not duplicate.
- Total 19 rows.
Only duplicate sales
- SELECT sales_date AS 'Date', sales_qty AS 'Qty', COUNT(*) AS 'Num'
- FROM sales
- GROUP BY sales_date, sales_qty
- HAVING (COUNT(*) > 1);
Date Qty Num ----------------------- --------------------------------------- ----------- 2006-07-06 19:01:01.000 1.00 5 2006-06-06 17:01:01.000 2.00 4 2006-05-06 16:01:01.000 3.00 4 2006-04-06 15:01:01.000 4.00 3 2006-03-06 14:01:01.000 5.00 2 2006-02-06 13:01:01.000 6.00 2 2006-01-06 12:01:01.000 7.00 4 2005-12-06 11:01:01.000 8.00 4 2005-11-06 10:01:01.000 9.00 4 2005-10-06 10:01:01.000 10.00 4 (10 row(s) affected)
- Note that this time, there are only 10 rows that are potential duplicates.
- These are potential duplicates. Why because the logic of what make a duplicate sale is flawed. Same date, same quantity does not guarantee that the sale is a duplicate. More info is needed.













