How to get averages without high and low values with MSSQL

The normal way of calculate averages is very simple with SQL Server, it's to use the AVG() function.

The problem is: how exclude both the lowest and the highest values to reduce the effect of skew from the extreme values.
The answer is: Microsoft SQL Server, like DB2 and Oracle, provides a windowing of the average.

Data used

  1. USE sql911;
  2. go
  3.  
  4. SELECT emp_short AS 'Employee',
  5.        sales_date AS 'Sold',
  6.        sales_qty AS ' Qty',
  7.        sales_price AS 'Price'
  8. FROM sales
  9. ORDER BY emp_short, sales_date;
  10. go
Employee   Sold                     Qty                                    Price
---------- ----------------------- --------------------------------------- ---------------------------------------
000010     2005-04-06 15:01:01.000 4.00                                    11.75
000010     2005-11-06 10:01:01.000 9.00                                    11.75
000010     2005-12-06 11:01:01.000 8.00                                    11.75
000010     2006-04-06 15:01:01.000 4.00                                    11.75
000010     2006-05-06 16:01:01.000 3.00                                    11.75
000010     2006-07-06 19:01:01.000 1.00                                    11.75
000010     2007-04-06 00:00:00.000 4.00                                    11.75
000010     2007-05-03 09:00:00.000 3.00                                    11.75
000010     2007-05-05 04:00:00.000 1.00                                    11.75
000010     2007-05-07 11:00:00.000 2.00                                    11.75
000010     2007-05-08 05:00:00.000 1.00                                    10.00
000010     2007-05-08 07:00:00.000 2.00                                    10.00
000010     2007-05-08 11:00:00.000 3.00                                    10.00
000010     2007-05-08 15:00:00.000 1.50                                    11.75
000010     2007-06-06 17:01:01.000 2.00                                    11.75
000030     2005-10-06 10:01:01.000 10.00                                   11.75
000030     2006-01-06 12:01:01.000 7.00                                    11.75
000030     2006-04-06 15:01:01.000 4.00                                    11.75
000030     2006-06-06 17:01:01.000 2.00                                    11.75
000030     2006-10-06 10:01:01.000 10.00                                   11.75
000030     2007-05-01 00:00:00.000 5.00                                    11.75
000030     2007-05-07 11:30:00.000 2.00                                    10.50
000030     2007-05-08 13:30:00.000 1.00                                    10.00
000030     2007-11-06 10:01:01.000 9.00                                    11.75
000110     2004-03-06 14:01:01.000 NULL                                    NULL
000110     2005-11-06 10:01:01.000 9.00                                    11.75
000110     2006-02-06 13:01:01.000 6.00                                    11.75
000110     2006-03-06 14:01:01.000 5.00                                    11.75
000110     2006-06-06 17:01:01.000 NULL                                    NULL
000110     2006-10-06 10:01:01.000 10.00                                   11.75
000110     2006-10-06 10:01:01.000 3.00                                    11.75
000110     2007-05-09 14:00:00.000 1.25                                    10.00
000190     2005-03-06 14:01:01.000 5.00                                    11.75
000190     2005-05-06 16:01:01.000 3.00                                    11.75
000190     2005-12-06 11:01:01.000 8.00                                    11.75
000190     2006-02-06 13:01:01.000 NULL                                    NULL
000190     2006-06-06 17:01:01.000 2.00                                    11.75
000190     2006-07-06 00:00:00.000 1.00                                    11.75
000190     2006-07-06 19:01:01.000 1.00                                    11.75
000190     2006-10-06 10:01:01.000 10.00                                   11.75
000190     2006-10-06 10:01:01.000 10.00                                   11.75
000190     2007-05-10 14:00:00.000 5.00                                    10.00
000190     2007-05-11 14:15:00.000 4.00                                    10.00
000190     2007-05-12 15:00:00.000 2.00                                    10.00
000190     2007-05-13 16:00:00.000 1.00                                    10.00
000190     2007-07-06 00:00:00.000 1.00                                    11.75
000310     2004-10-06 10:01:01.000 NULL                                    NULL
000310     2005-10-06 10:01:01.000 10.00                                   11.75
000310     2006-01-06 12:01:01.000 7.00                                    11.75
000310     2006-03-06 14:01:01.000 5.00                                    11.75
000310     2006-07-06 19:01:01.000 1.00                                    11.75
000310     2006-10-06 10:01:01.000 10.00                                   11.75
000310     2006-10-06 10:01:01.000 10.00                                   11.75
000310     2007-05-07 00:00:00.000 2.00                                    10.00
000310     2007-05-08 05:00:00.000 1.00                                    10.00
000310     2007-05-11 13:00:00.000 3.50                                    10.00
000310     2007-05-14 11:00:00.000 3.00                                    10.00

(57 row(s) affected)

Average without the low and high values with MSSQL

Including the highest and the lowest values will skew the average, just like ignoring null values will raise the value of average.

This involves 2 things:

  1. An inline view
  2. A avg (..) over (partition...)
  1. SELECT emp_short AS 'Employee',
  2.        AVG(COALESCE(sales_qty,0)) AS 'Avg qty sold',
  3.        MIN(min_qty) AS 'Min qty sold',
  4.        MIN(max_qty) AS 'Max qty sold'
  5. FROM (
  6.    SELECT emp_short,
  7.           sales_qty,
  8.           MIN(COALESCE(sales_qty,0)) OVER (partition BY emp_short) AS min_qty,
  9.           MAX(COALESCE(sales_qty,0)) OVER (partition BY emp_short) AS max_qty
  10.    FROM sales
  11.    ) abc
  12. WHERE sales_qty not in (min_qty, max_qty)
  13. GROUP BY emp_short;
  14. go
Employee   Avg qty sold                            Min qty sold                            Max qty sold
---------- --------------------------------------- --------------------------------------- ---------------------------------------
000010     3.318181                                1.00                                    9.00
000030     4.833333                                1.00                                    10.00
000110     4.850000                                0.00                                    10.00
000190     3.000000                                0.00                                    10.00
000310     3.214285                                0.00                                    10.00

(5 row(s) affected)
  • min(coalesce(sales_qty,0)) over (partition by emp_short) as min_qty calculates the average quantity for each employee and resets it for the next employee. That's the windowing function of the min.
  • max(coalesce(sales_qty,0)) over (partition by emp_short) as max_qty calculates the average quantity for each employee and resets it for the next employee. That's the windowing function of the max.
  • The coalesce is because averages like all other summary functions ignore null values. We need to take these null value in consideration. Null means that there was no sale/the sale did not materialize. In the real world, this will affect the averages.
  • The excluding of the lowest and the highest value is done by the: where sales_qty not in (min_qty, max_qty)
  • It's critical to realize that the windowing functions are applied AFTER the where clause is evaluated.
  • The selection criteria MUST NOT be in the inline view, but on the outside query.

Limitations

The drawback of this logic is what happens if an employee has:

  1. Day 1: Qty: NULL
  2. Day 2: Qty: NULL
  3. Day 3: Qty: 1
  4. Day 4: Qty: 10
  5. Day 5: Qty: 10
  6. Day 6: Qty: 10

The simple average is: 5.16
The average, without the highest and lowest values, is: 1