How to get averages without high and low values with MSSQL
The normal way of calculate averages is very simple with SQL Server, it's to use the AVG() function.
The problem is: how exclude both the lowest and the highest values to reduce the effect of skew from the extreme values.
The answer is: Microsoft SQL Server, like DB2 and Oracle, provides a windowing of the average.
Data used
- USE sql911;
- go
- SELECT emp_short AS 'Employee',
- sales_date AS 'Sold',
- sales_qty AS ' Qty',
- sales_price AS 'Price'
- FROM sales
- ORDER BY emp_short, sales_date;
- go
Employee Sold Qty Price ---------- ----------------------- --------------------------------------- --------------------------------------- 000010 2005-04-06 15:01:01.000 4.00 11.75 000010 2005-11-06 10:01:01.000 9.00 11.75 000010 2005-12-06 11:01:01.000 8.00 11.75 000010 2006-04-06 15:01:01.000 4.00 11.75 000010 2006-05-06 16:01:01.000 3.00 11.75 000010 2006-07-06 19:01:01.000 1.00 11.75 000010 2007-04-06 00:00:00.000 4.00 11.75 000010 2007-05-03 09:00:00.000 3.00 11.75 000010 2007-05-05 04:00:00.000 1.00 11.75 000010 2007-05-07 11:00:00.000 2.00 11.75 000010 2007-05-08 05:00:00.000 1.00 10.00 000010 2007-05-08 07:00:00.000 2.00 10.00 000010 2007-05-08 11:00:00.000 3.00 10.00 000010 2007-05-08 15:00:00.000 1.50 11.75 000010 2007-06-06 17:01:01.000 2.00 11.75 000030 2005-10-06 10:01:01.000 10.00 11.75 000030 2006-01-06 12:01:01.000 7.00 11.75 000030 2006-04-06 15:01:01.000 4.00 11.75 000030 2006-06-06 17:01:01.000 2.00 11.75 000030 2006-10-06 10:01:01.000 10.00 11.75 000030 2007-05-01 00:00:00.000 5.00 11.75 000030 2007-05-07 11:30:00.000 2.00 10.50 000030 2007-05-08 13:30:00.000 1.00 10.00 000030 2007-11-06 10:01:01.000 9.00 11.75 000110 2004-03-06 14:01:01.000 NULL NULL 000110 2005-11-06 10:01:01.000 9.00 11.75 000110 2006-02-06 13:01:01.000 6.00 11.75 000110 2006-03-06 14:01:01.000 5.00 11.75 000110 2006-06-06 17:01:01.000 NULL NULL 000110 2006-10-06 10:01:01.000 10.00 11.75 000110 2006-10-06 10:01:01.000 3.00 11.75 000110 2007-05-09 14:00:00.000 1.25 10.00 000190 2005-03-06 14:01:01.000 5.00 11.75 000190 2005-05-06 16:01:01.000 3.00 11.75 000190 2005-12-06 11:01:01.000 8.00 11.75 000190 2006-02-06 13:01:01.000 NULL NULL 000190 2006-06-06 17:01:01.000 2.00 11.75 000190 2006-07-06 00:00:00.000 1.00 11.75 000190 2006-07-06 19:01:01.000 1.00 11.75 000190 2006-10-06 10:01:01.000 10.00 11.75 000190 2006-10-06 10:01:01.000 10.00 11.75 000190 2007-05-10 14:00:00.000 5.00 10.00 000190 2007-05-11 14:15:00.000 4.00 10.00 000190 2007-05-12 15:00:00.000 2.00 10.00 000190 2007-05-13 16:00:00.000 1.00 10.00 000190 2007-07-06 00:00:00.000 1.00 11.75 000310 2004-10-06 10:01:01.000 NULL NULL 000310 2005-10-06 10:01:01.000 10.00 11.75 000310 2006-01-06 12:01:01.000 7.00 11.75 000310 2006-03-06 14:01:01.000 5.00 11.75 000310 2006-07-06 19:01:01.000 1.00 11.75 000310 2006-10-06 10:01:01.000 10.00 11.75 000310 2006-10-06 10:01:01.000 10.00 11.75 000310 2007-05-07 00:00:00.000 2.00 10.00 000310 2007-05-08 05:00:00.000 1.00 10.00 000310 2007-05-11 13:00:00.000 3.50 10.00 000310 2007-05-14 11:00:00.000 3.00 10.00 (57 row(s) affected)
Average without the low and high values with MSSQL
Including the highest and the lowest values will skew the average, just like ignoring null values will raise the value of average.
This involves 2 things:
- An inline view
- A avg (..) over (partition...)
- SELECT emp_short AS 'Employee',
- AVG(COALESCE(sales_qty,0)) AS 'Avg qty sold',
- MIN(min_qty) AS 'Min qty sold',
- MIN(max_qty) AS 'Max qty sold'
- FROM (
- SELECT emp_short,
- sales_qty,
- MIN(COALESCE(sales_qty,0)) OVER (partition BY emp_short) AS min_qty,
- MAX(COALESCE(sales_qty,0)) OVER (partition BY emp_short) AS max_qty
- FROM sales
- ) abc
- WHERE sales_qty not in (min_qty, max_qty)
- GROUP BY emp_short;
- go
Employee Avg qty sold Min qty sold Max qty sold ---------- --------------------------------------- --------------------------------------- --------------------------------------- 000010 3.318181 1.00 9.00 000030 4.833333 1.00 10.00 000110 4.850000 0.00 10.00 000190 3.000000 0.00 10.00 000310 3.214285 0.00 10.00 (5 row(s) affected)
- min(coalesce(sales_qty,0)) over (partition by emp_short) as min_qty calculates the average quantity for each employee and resets it for the next employee. That's the windowing function of the min.
- max(coalesce(sales_qty,0)) over (partition by emp_short) as max_qty calculates the average quantity for each employee and resets it for the next employee. That's the windowing function of the max.
- The coalesce is because averages like all other summary functions ignore null values. We need to take these null value in consideration. Null means that there was no sale/the sale did not materialize. In the real world, this will affect the averages.
- The excluding of the lowest and the highest value is done by the: where sales_qty not in (min_qty, max_qty)
- It's critical to realize that the windowing functions are applied AFTER the where clause is evaluated.
- The selection criteria MUST NOT be in the inline view, but on the outside query.
Limitations
The drawback of this logic is what happens if an employee has:
- Day 1: Qty: NULL
- Day 2: Qty: NULL
- Day 3: Qty: 1
- Day 4: Qty: 10
- Day 5: Qty: 10
- Day 6: Qty: 10
The simple average is: 5.16
The average, without the highest and lowest values, is: 1

