How to delete duplicate rows in batch with MS SQL Server

  • Duplicates usually occurs because of wrong logic or from a bug in the implementation.
  • Often when you have duplicate, you will have hundreds, and thousands.
  • It's not always possible to delete manually the duplicate rows.

To delete manually duplicate rows see: How to manually delete duplicate rows

Data used

use sql911;
go

select emp_short, 
       sales_date, 
       sales_qty as 'Sold'
from sales
order by emp_short, sales_date;
go
emp_short  sales_date              Sold
---------- ----------------------- ----------
000010     2005-04-06 15:01:01.000 4.00
000010     2005-11-06 10:01:01.000 9.00
000010     2005-12-06 11:01:01.000 8.00
000010     2005-12-06 11:01:01.000 8.00
000010     2005-12-06 11:01:01.000 8.00
000010     2006-04-06 15:01:01.000 4.00
000010     2006-05-06 16:01:01.000 3.00
000010     2006-05-06 16:01:01.000 3.00
000010     2006-07-06 19:01:01.000 1.00
000010     2007-04-06 00:00:00.000 4.00
000010     2007-04-06 00:00:00.000 4.00
000010     2007-04-06 00:00:00.000 4.00
000010     2007-06-06 17:01:01.000 2.00
000030     2004-02-06 13:01:01.000 6.00
000030     2005-10-06 10:01:01.000 10.00
000030     2006-01-06 12:01:01.000 7.00
000030     2006-04-06 15:01:01.000 4.00
000030     2006-04-06 15:01:01.000 4.00
000030     2006-06-06 17:01:01.000 2.00
000030     2007-05-01 00:00:00.000 5.00
000030     2007-05-01 00:00:00.000 5.00
000030     2007-05-01 00:00:00.000 5.00
000030     2007-05-01 00:00:00.000 5.00
000030     2007-05-01 00:00:00.000 5.00
000030     2007-05-01 00:00:00.000 5.00
000030     2007-05-01 00:00:00.000 5.00
000030     2007-11-06 10:01:01.000 9.00
000110     2004-03-06 14:01:01.000 5.00
000110     2005-11-06 10:01:01.000 9.00
000110     2006-02-06 13:01:01.000 6.00
000110     2006-03-06 14:01:01.000 5.00
000110     2006-05-06 16:01:01.000 3.00
000110     2006-06-06 17:01:01.000 2.00
000110     2006-07-06 19:01:01.000 1.00
000110     2006-07-06 19:01:01.000 1.00
000190     2005-03-06 14:01:01.000 5.00
000190     2005-05-06 16:01:01.000 3.00
000190     2005-12-06 11:01:01.000 8.00
000190     2006-01-06 12:01:01.000 7.00
000190     2006-01-06 12:01:01.000 7.00
000190     2006-02-06 13:01:01.000 6.00
000190     2006-05-06 16:01:01.000 3.00
000190     2006-06-06 17:01:01.000 2.00
000190     2006-06-06 17:01:01.000 2.00
000190     2006-07-06 00:00:00.000 1.00
000190     2006-07-06 19:01:01.000 1.00
000190     2007-07-06 00:00:00.000 1.00
000190     2007-07-06 00:00:00.000 1.00
000190     2007-07-06 00:00:00.000 1.00
000310     2004-10-06 10:01:01.000 10.00
000310     2005-10-06 10:01:01.000 10.00
000310     2005-10-06 10:01:01.000 10.00
000310     2005-10-06 10:01:01.000 10.00
000310     2005-11-06 10:01:01.000 9.00
000310     2005-11-06 10:01:01.000 9.00
000310     2006-01-06 12:01:01.000 7.00
000310     2006-03-06 14:01:01.000 5.00
000310     2006-07-06 19:01:01.000 1.00
000310     2006-10-06 10:01:01.000 10.00

(59 row(s) affected)

Finding the duplicate rows

  • The duplicate rows occur more than once.
  • It's kind of self-evident, but that's how we find the duplicates.

We want to find the duplicate sales. What makes a duplicate sale?

  • The same rep
  • The same date
  • The same quantity
use sql911;
go

select emp_short, 
       sales_date, 
       sum(sales_qty) as 'Totals', 
       count(emp_short) as 'Duplicates'
from sales
group by emp_short, sales_date, sales_qty
having count(emp_short) > 1
order by emp_short;
go
emp_short  sales_date              Totals      Duplicates
---------- ----------------------- ----------- -----------
000010     2005-12-06 11:01:01.000 24.00       3
000010     2006-05-06 16:01:01.000 6.00        2
000010     2007-04-06 00:00:00.000 12.00       3
000030     2006-04-06 15:01:01.000 8.00        2
000030     2007-05-01 00:00:00.000 35.00       7
000110     2006-07-06 19:01:01.000 2.00        2
000190     2006-01-06 12:01:01.000 14.00       2
000190     2006-06-06 17:01:01.000 4.00        2
000190     2007-07-06 00:00:00.000 3.00        3
000310     2005-10-06 10:01:01.000 30.00       3
000310     2005-11-06 10:01:01.000 18.00       2

(11 row(s) affected)
  • You must group by all the columns that make the duplicate.

Deleting the duplicate rows in batch

Copy the duplicate key values into a temporary table

select emp_short, sales_date, sales_qty, count(*)
into #tempsaleskey
from sales
group by emp_short, sales_date, sales_qty
having count(*) > 1;
go
(11 row(s) affected)

Copy the duplicate rows into a temporary table

This is the step that will eliminated all the duplicate rows.

select distinct s1.*
into #tempsalesdata
from sales s1, #tempsaleskey t1
where s1.emp_short = t1.emp_short and 
      s1.sales_date = t1.sales_date and
      s1.sales_qty = t1.sales_qty;
go
(11 row(s) affected)

Verify that you have correct data

  • Optional, but as Ronald Reagan said: Trust but verify.
select emp_short, sales_date, sales_qty, count(*) as 'Counter'
from #tempsalesdata
group by emp_short, sales_date, sales_qty;
go
emp_short  sales_date              sales_qty    Counter
---------- ----------------------- ------------ -----------
000010     2005-12-06 11:01:01.000 8.00         1
000010     2006-05-06 16:01:01.000 3.00         1
000010     2007-04-06 00:00:00.000 4.00         1
000030     2006-04-06 15:01:01.000 4.00         1
000030     2007-05-01 00:00:00.000 5.00         1
000110     2006-07-06 19:01:01.000 1.00         1
000190     2006-01-06 12:01:01.000 7.00         1
000190     2006-06-06 17:01:01.000 2.00         1
000190     2007-07-06 00:00:00.000 1.00         1
000310     2005-10-06 10:01:01.000 10.00        1
000310     2005-11-06 10:01:01.000 9.00         1

(11 row(s) affected)

Delete the duplicates from the original table

DELETE sales
FROM sales s1, #tempsaleskey t1
WHERE s1.emp_short = t1.emp_short and
      s1.sales_date = t1.sales_date and
      s1.sales_qty = t1.sales_qty;
go
(31 row(s) affected)

Put the unique rows back in the original table

INSERT sales SELECT * FROM #tempsalesdata;
go
(11 row(s) affected)

Verification

SELECT emp_short, 
       sales_date, 
       sales_qty AS 'Sold'
FROM sales
ORDER BY emp_short, sales_date;
go
emp_short  sales_date              Sold
---------- ----------------------- ---------------------------------------
000010     2005-04-06 15:01:01.000 4.00
000010     2005-11-06 10:01:01.000 9.00
000010     2005-12-06 11:01:01.000 8.00
000010     2006-04-06 15:01:01.000 4.00
000010     2006-05-06 16:01:01.000 3.00
000010     2006-07-06 19:01:01.000 1.00
000010     2007-04-06 00:00:00.000 4.00
000010     2007-06-06 17:01:01.000 2.00
000030     2004-02-06 13:01:01.000 6.00
000030     2005-10-06 10:01:01.000 10.00
000030     2006-01-06 12:01:01.000 7.00
000030     2006-04-06 15:01:01.000 4.00
000030     2006-06-06 17:01:01.000 2.00
000030     2007-05-01 00:00:00.000 5.00
000030     2007-11-06 10:01:01.000 9.00
000110     2004-03-06 14:01:01.000 5.00
000110     2005-11-06 10:01:01.000 9.00
000110     2006-02-06 13:01:01.000 6.00
000110     2006-03-06 14:01:01.000 5.00
000110     2006-05-06 16:01:01.000 3.00
000110     2006-06-06 17:01:01.000 2.00
000110     2006-07-06 19:01:01.000 1.00
000190     2005-03-06 14:01:01.000 5.00
000190     2005-05-06 16:01:01.000 3.00
000190     2005-12-06 11:01:01.000 8.00
000190     2006-01-06 12:01:01.000 7.00
000190     2006-02-06 13:01:01.000 6.00
000190     2006-05-06 16:01:01.000 3.00
000190     2006-06-06 17:01:01.000 2.00
000190     2006-07-06 00:00:00.000 1.00
000190     2006-07-06 19:01:01.000 1.00
000190     2007-07-06 00:00:00.000 1.00
000310     2004-10-06 10:01:01.000 10.00
000310     2005-10-06 10:01:01.000 10.00
000310     2005-11-06 10:01:01.000 9.00
000310     2006-01-06 12:01:01.000 7.00
000310     2006-03-06 14:01:01.000 5.00
000310     2006-07-06 19:01:01.000 1.00
000310     2006-10-06 10:01:01.000 10.00

(39 row(s) affected)