How to get the number of duplicate rows I have in a table.

Answer

  1. Get the duplicates
  2. Count them

Data used

  1. USE sql911;
  2. go
  3.  
  4. SELECT sales_date, sales_price, emp_short
  5. FROM sales
  6. ORDER BY sales_date;
  7. go
sales_date              sales_price                             emp_short
----------------------- --------------------------------------- ----------
2004-03-06 14:01:01.000 11.75                                   000110
2004-10-06 10:01:01.000 11.75                                   000310
2005-03-06 14:01:01.000 11.75                                   000190
2005-04-06 15:01:01.000 11.75                                   000010
2005-05-06 16:01:01.000 11.75                                   000190
2005-10-06 10:01:01.000 11.75                                   000030
2005-10-06 10:01:01.000 11.75                                   000310
2005-11-06 10:01:01.000 11.75                                   000110
2005-11-06 10:01:01.000 11.75                                   000010
2005-12-06 11:01:01.000 11.75                                   000010
2005-12-06 11:01:01.000 11.75                                   000190
2006-01-06 12:01:01.000 11.75                                   000030
2006-01-06 12:01:01.000 11.75                                   000310
2006-02-06 13:01:01.000 11.75                                   000110
2006-02-06 13:01:01.000 11.75                                   000190
2006-03-06 14:01:01.000 11.75                                   000110
2006-03-06 14:01:01.000 11.75                                   000310
2006-04-06 15:01:01.000 11.75                                   000010
2006-04-06 15:01:01.000 11.75                                   000030
2006-05-06 16:01:01.000 11.75                                   000010
2006-06-06 17:01:01.000 11.75                                   000190
2006-06-06 17:01:01.000 11.75                                   000110
2006-06-06 17:01:01.000 11.75                                   000030
2006-07-06 00:00:00.000 11.75                                   000190
2006-07-06 19:01:01.000 11.75                                   000310
2006-07-06 19:01:01.000 11.75                                   000190
2006-07-06 19:01:01.000 11.75                                   000010
2006-10-06 10:01:01.000 11.75                                   000110
2006-10-06 10:01:01.000 11.75                                   000110
2006-10-06 10:01:01.000 11.75                                   000190
2006-10-06 10:01:01.000 11.75                                   000190
2006-10-06 10:01:01.000 11.75                                   000310
2006-10-06 10:01:01.000 11.75                                   000310
2006-10-06 10:01:01.000 11.75                                   000030
2007-04-06 00:00:00.000 11.75                                   000010
2007-05-01 00:00:00.000 11.75                                   000030
2007-06-06 17:01:01.000 11.75                                   000010
2007-07-06 00:00:00.000 11.75                                   000190
2007-11-06 10:01:01.000 11.75                                   000030

(39 row(s) affected)

What are the duplicate rows

  • Duplicate rows are rows that have the same data and exist more than once. !!Duh!! But it's true.
  • See Finding duplicate rows with MS SQL Server for detailed explanation on the logic behind finding duplicate rows.
  1. SELECT sales_date, sales_price, emp_short
  2. FROM sales
  3. GROUP BY sales_date, sales_price, emp_short
  4. HAVING COUNT(*) > 1
  5. go
sales_date              sales_price                             emp_short
----------------------- --------------------------------------- ----------
2006-10-06 10:01:01.000 11.75                                   000110
2006-10-06 10:01:01.000 11.75                                   000190
2006-10-06 10:01:01.000 11.75                                   000310

(3 row(s) affected)
  • We can see that we have 3 different set of duplicates. By just eye balling [since it's such a small sample] and by reading the last line: (3 row(s) affected).

Finding just how many rows are duplicates

  1. SELECT 'Duplicate sets: ' + CAST(COUNT(*) AS VARCHAR(8)) AS 'Number of Duplicates'
  2. FROM (
  3.   SELECT sales_date, sales_price, emp_short
  4.   FROM sales
  5.   GROUP BY sales_date, sales_price, emp_short
  6.   HAVING COUNT(*)>1
  7. ) AS ABC
Number of Duplicates
------------------------
Duplicate sets: 3

(1 row(s) affected)

We need to use an inline view because of how SQL Server evaluates the SELECT statement:

  1. FROM (LeftTable)
  2. ON (condition)
  3. (join type) JOIN (RightTable)
  4. WHERE (condition)
  5. GROUP BY (list)
  6. WITH (CUBE | ROLLUP)
  7. HAVING (condition)
  8. SELECT
  9. DISTINCT
  10. ORDER BY (list)
  11. TOP (list)