How to get the number of duplicate rows I have in a table.
Answer
- Get the duplicates
- Count them
Data used
- USE sql911;
- go
- SELECT sales_date, sales_price, emp_short
- FROM sales
- ORDER BY sales_date;
- go
sales_date sales_price emp_short ----------------------- --------------------------------------- ---------- 2004-03-06 14:01:01.000 11.75 000110 2004-10-06 10:01:01.000 11.75 000310 2005-03-06 14:01:01.000 11.75 000190 2005-04-06 15:01:01.000 11.75 000010 2005-05-06 16:01:01.000 11.75 000190 2005-10-06 10:01:01.000 11.75 000030 2005-10-06 10:01:01.000 11.75 000310 2005-11-06 10:01:01.000 11.75 000110 2005-11-06 10:01:01.000 11.75 000010 2005-12-06 11:01:01.000 11.75 000010 2005-12-06 11:01:01.000 11.75 000190 2006-01-06 12:01:01.000 11.75 000030 2006-01-06 12:01:01.000 11.75 000310 2006-02-06 13:01:01.000 11.75 000110 2006-02-06 13:01:01.000 11.75 000190 2006-03-06 14:01:01.000 11.75 000110 2006-03-06 14:01:01.000 11.75 000310 2006-04-06 15:01:01.000 11.75 000010 2006-04-06 15:01:01.000 11.75 000030 2006-05-06 16:01:01.000 11.75 000010 2006-06-06 17:01:01.000 11.75 000190 2006-06-06 17:01:01.000 11.75 000110 2006-06-06 17:01:01.000 11.75 000030 2006-07-06 00:00:00.000 11.75 000190 2006-07-06 19:01:01.000 11.75 000310 2006-07-06 19:01:01.000 11.75 000190 2006-07-06 19:01:01.000 11.75 000010 2006-10-06 10:01:01.000 11.75 000110 2006-10-06 10:01:01.000 11.75 000110 2006-10-06 10:01:01.000 11.75 000190 2006-10-06 10:01:01.000 11.75 000190 2006-10-06 10:01:01.000 11.75 000310 2006-10-06 10:01:01.000 11.75 000310 2006-10-06 10:01:01.000 11.75 000030 2007-04-06 00:00:00.000 11.75 000010 2007-05-01 00:00:00.000 11.75 000030 2007-06-06 17:01:01.000 11.75 000010 2007-07-06 00:00:00.000 11.75 000190 2007-11-06 10:01:01.000 11.75 000030 (39 row(s) affected)
What are the duplicate rows
- Duplicate rows are rows that have the same data and exist more than once. !!Duh!! But it's true.
- See Finding duplicate rows with MS SQL Server for detailed explanation on the logic behind finding duplicate rows.
- SELECT sales_date, sales_price, emp_short
- FROM sales
- GROUP BY sales_date, sales_price, emp_short
- HAVING COUNT(*) > 1
- go
sales_date sales_price emp_short ----------------------- --------------------------------------- ---------- 2006-10-06 10:01:01.000 11.75 000110 2006-10-06 10:01:01.000 11.75 000190 2006-10-06 10:01:01.000 11.75 000310 (3 row(s) affected)
- We can see that we have 3 different set of duplicates. By just eye balling [since it's such a small sample] and by reading the last line: (3 row(s) affected).
Finding just how many rows are duplicates
- SELECT 'Duplicate sets: ' + CAST(COUNT(*) AS VARCHAR(8)) AS 'Number of Duplicates'
- FROM (
- SELECT sales_date, sales_price, emp_short
- FROM sales
- GROUP BY sales_date, sales_price, emp_short
- HAVING COUNT(*)>1
- ) AS ABC
Number of Duplicates ------------------------ Duplicate sets: 3 (1 row(s) affected)
We need to use an inline view because of how SQL Server evaluates the SELECT statement:
- FROM (LeftTable)
- ON (condition)
- (join type) JOIN (RightTable)
- WHERE (condition)
- GROUP BY (list)
- WITH (CUBE | ROLLUP)
- HAVING (condition)
- SELECT
- DISTINCT
- ORDER BY (list)
- TOP (list)
- See: In which order does SQL Server process SQL statements for further explanations on how MS SQL Server processes a SELECT statement.
- See: Inline views with MS SQL Server for how to use inline views.

