Union queries are resource hogs
Union queries are one of the most used and abused way of hogging down the SQL Server. Unions can be used either as a DISTINCT, ALL, or an OR unions.
Data used
- USE sql911;
- go
- SELECT EMPNO,PROJNO, EMPTIME AS 'Time Spent',
- EMSTDATE AS 'Start Date', EMENDATE AS 'End Date'
- FROM emp_work
- WHERE projno in ('ad3111','ad3112');
- go
EMPNO PROJNO Time Spent Start Date End Date ------ ------ --------------------------------------- ----------------------- ----------------------- 000230 AD3111 1.00 2002-01-01 00:00:00.000 2002-03-15 00:00:00.000 000230 AD3111 0.50 2002-03-15 00:00:00.000 2002-04-15 00:00:00.000 000230 AD3111 0.50 2002-03-15 00:00:00.000 2002-10-15 00:00:00.000 000230 AD3111 0.50 2002-04-15 00:00:00.000 2002-10-15 00:00:00.000 000230 AD3111 1.00 2002-10-15 00:00:00.000 2003-01-01 00:00:00.000 000240 AD3111 1.00 2002-02-15 00:00:00.000 2002-09-15 00:00:00.000 000240 AD3111 1.00 2002-09-15 00:00:00.000 2003-01-01 00:00:00.000 000250 AD3112 1.00 2002-01-01 00:00:00.000 2002-02-01 00:00:00.000 000250 AD3112 0.50 2002-02-01 00:00:00.000 2002-03-15 00:00:00.000 000250 AD3112 0.50 2004-12-01 00:00:00.000 2005-01-01 00:00:00.000 000250 AD3112 1.00 2003-01-01 00:00:00.000 2003-02-01 00:00:00.000 000250 AD3112 0.50 2006-02-01 00:00:00.000 2006-03-15 00:00:00.000 000250 AD3112 1.00 2002-03-15 00:00:00.000 2002-08-15 00:00:00.000 000250 AD3112 0.25 2005-08-15 00:00:00.000 2005-10-15 00:00:00.000 000250 AD3112 0.25 2002-08-15 00:00:00.000 2002-10-15 00:00:00.000 000250 AD3112 0.50 2002-10-15 00:00:00.000 2002-12-01 00:00:00.000 000250 AD3112 0.50 2002-08-15 00:00:00.000 2003-01-01 00:00:00.000 (17 row(s) affected)
Here is the execution plan for the plain listing:

Execution Plan for the plain query
Union
Here we will get the data for project: AD3111 and AD3112
- SELECT EMPNO,PROJNO, EMPTIME AS 'Time Spent',
- EMSTDATE AS 'Start Date', EMENDATE AS 'End Date'
- FROM emp_work
- WHERE projno = 'ad3111'
- UNION all
- SELECT EMPNO,PROJNO, EMPTIME AS 'Time Spent',
- EMSTDATE AS 'Start Date', EMENDATE AS 'End Date'
- FROM emp_work
- WHERE projno = 'ad3112';
- go
EMPNO PROJNO Time Spent Start Date End Date ------ ------ --------------------------------------- ----------------------- ----------------------- 000230 AD3111 0.50 2002-03-15 00:00:00.000 2002-04-15 00:00:00.000 000230 AD3111 0.50 2002-03-15 00:00:00.000 2002-10-15 00:00:00.000 000230 AD3111 0.50 2002-04-15 00:00:00.000 2002-10-15 00:00:00.000 000230 AD3111 1.00 2002-01-01 00:00:00.000 2002-03-15 00:00:00.000 000230 AD3111 1.00 2002-10-15 00:00:00.000 2003-01-01 00:00:00.000 000240 AD3111 1.00 2002-02-15 00:00:00.000 2002-09-15 00:00:00.000 000240 AD3111 1.00 2002-09-15 00:00:00.000 2003-01-01 00:00:00.000 000250 AD3112 0.25 2002-08-15 00:00:00.000 2002-10-15 00:00:00.000 000250 AD3112 0.25 2005-08-15 00:00:00.000 2005-10-15 00:00:00.000 000250 AD3112 0.50 2002-02-01 00:00:00.000 2002-03-15 00:00:00.000 000250 AD3112 0.50 2002-08-15 00:00:00.000 2003-01-01 00:00:00.000 000250 AD3112 0.50 2002-10-15 00:00:00.000 2002-12-01 00:00:00.000 000250 AD3112 0.50 2004-12-01 00:00:00.000 2005-01-01 00:00:00.000 000250 AD3112 0.50 2006-02-01 00:00:00.000 2006-03-15 00:00:00.000 000250 AD3112 1.00 2002-01-01 00:00:00.000 2002-02-01 00:00:00.000 000250 AD3112 1.00 2002-03-15 00:00:00.000 2002-08-15 00:00:00.000 000250 AD3112 1.00 2003-01-01 00:00:00.000 2003-02-01 00:00:00.000 (17 row(s) affected)
Which is the correct result
Here is the execution plan for the plain union and the details of the sort on the execution plan.

Execution Plan for query with the union

Execution Plan for query with the union with cost for the sort
- The sort is a DISTINCT SORT.
- The sort uses: 63% of the cost.
- Note the estmated I/O
The definition of UNION in SQL is to:
- sort the result.
- execute a SELECT DISTINCT on the result set.
- display the results from the SELECT DISTINCT.
Union All
Here is the same query result but this time with a UNION ALL.
- SELECT EMPNO,PROJNO, EMPTIME AS 'Time Spent',
- EMSTDATE AS 'Start Date', EMENDATE AS 'End Date'
- FROM emp_work
- WHERE projno = 'ad3111'
- UNION all
- SELECT EMPNO,PROJNO, EMPTIME AS 'Time Spent',
- EMSTDATE AS 'Start Date', EMENDATE AS 'End Date'
- FROM emp_work
- WHERE projno = 'ad3112';
- go
EMPNO PROJNO Time Spent Start Date End Date ------ ------ --------------------------------------- ----------------------- ----------------------- 000230 AD3111 1.00 2002-01-01 00:00:00.000 2002-03-15 00:00:00.000 000230 AD3111 0.50 2002-03-15 00:00:00.000 2002-04-15 00:00:00.000 000230 AD3111 0.50 2002-03-15 00:00:00.000 2002-10-15 00:00:00.000 000230 AD3111 0.50 2002-04-15 00:00:00.000 2002-10-15 00:00:00.000 000230 AD3111 1.00 2002-10-15 00:00:00.000 2003-01-01 00:00:00.000 000240 AD3111 1.00 2002-02-15 00:00:00.000 2002-09-15 00:00:00.000 000240 AD3111 1.00 2002-09-15 00:00:00.000 2003-01-01 00:00:00.000 000250 AD3112 1.00 2002-01-01 00:00:00.000 2002-02-01 00:00:00.000 000250 AD3112 0.50 2002-02-01 00:00:00.000 2002-03-15 00:00:00.000 000250 AD3112 0.50 2004-12-01 00:00:00.000 2005-01-01 00:00:00.000 000250 AD3112 1.00 2003-01-01 00:00:00.000 2003-02-01 00:00:00.000 000250 AD3112 0.50 2006-02-01 00:00:00.000 2006-03-15 00:00:00.000 000250 AD3112 1.00 2002-03-15 00:00:00.000 2002-08-15 00:00:00.000 000250 AD3112 0.25 2005-08-15 00:00:00.000 2005-10-15 00:00:00.000 000250 AD3112 0.25 2002-08-15 00:00:00.000 2002-10-15 00:00:00.000 000250 AD3112 0.50 2002-10-15 00:00:00.000 2002-12-01 00:00:00.000 000250 AD3112 0.50 2002-08-15 00:00:00.000 2003-01-01 00:00:00.000 (17 row(s) affected)
And here is the execution with the I/O Costs.

Execution Plan for query with the union all

Execution Plan for query with the union all with cost
- The UNION does imply a select distinct.
- The UNION ALL does NOT use a select distinct.
- When you are sure that there will be no duplicate, do NOT use the union, but a union all.
- The UNION is often used when older section of data have been archived into separate tables (in the hope of improving speed), therefore you are sure that the data is not a duplicate one.
Please note that there is some cheating in the costs:
- I am using small tables.
- I query the same table twice.
- SQL Server does optimize by realizing that it's the same table, therefore reread the data from RAM.

