They don't spell it out in the HyperStat Online Statistics Textbook.

As far as I can see, the gap comes from one term vanishing for the median that doesn't necessarily vanish for the mean: when the median is itself one of the numbers, subtracting it from itself gives zero, whether you're calculating the absolute or the squared deviation. The rest comes from the difference that squaring the remaining values makes (I guess that's what results in the mean minimizing the sum of SQUARED deviations).

I don't intuit yet why that gap keeps the sum of absolute deviations minimized for the median but not the sum of squared deviations. Is there some further equation or proof that spells this out? (I think there must be.) It would make sense if there are fewer numbers in a set highly deviant from the mean than from the median, but I don't see why that would be the case.

Like I said, I'd be making stupid posts in the process of trying to understand this stuff.

"

# Deviations from the Mean and Median

The numbers 1, 2, 3, 7, 8, 9, 12 have a mean of 6 and median of 7.

**The mean minimizes sum of squared deviations.**

The sum of the squared deviations from the mean is:

(1-6)² + (2-6)² + (3-6)² + (7-6)² + (8-6)² + (9-6)² + (12-6)² = 25 + 16 + 9 + 1 + 4 + 9 + 36 = 100.

From the median:

(1-7)² + (2-7)² + (3-7)² + (7-7)² + (8-7)² + (9-7)² + (12-7)² = 36 + 25 + 16 + 0 + 1 + 4 + 25 = 107.

**The median minimizes sum of absolute deviations.**

The sum of the absolute values of the deviations from the mean is:

5 + 4 + 3 + 1 + 2 + 3 + 6 = 24.

From the median:

6 + 5 + 4 + 0 + 1 + 2 + 5 = 23."
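The quoted arithmetic is easy to recompute. Here is a minimal Python sketch that does so; the helper names `sum_sq_dev` and `sum_abs_dev` are made up for illustration and do not come from HyperStat.

```python
from statistics import mean, median

data = [1, 2, 3, 7, 8, 9, 12]

def sum_sq_dev(xs, c):
    """Sum of squared deviations of xs from the point c."""
    return sum((x - c) ** 2 for x in xs)

def sum_abs_dev(xs, c):
    """Sum of absolute deviations of xs from the point c."""
    return sum(abs(x - c) for x in xs)

m = mean(data)      # arithmetic mean of the example numbers
med = median(data)  # middle value of the example numbers

# Squared deviations: smaller around the mean than around the median.
print(sum_sq_dev(data, m), sum_sq_dev(data, med))

# Absolute deviations: smaller around the median than around the mean.
print(sum_abs_dev(data, m), sum_abs_dev(data, med))
```

For these seven numbers the squared sums come out to 100 (mean) and 107 (median), and the absolute sums to 24 (mean) and 23 (median), matching the quoted figures.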

Here's my intuition:

The mean weights the outliers more heavily. Keeping the other data constant, moving the furthest outlier even further won't change the median but will change the mean. Squaring differences also weights outliers more heavily; that's how quadratic growth works.

You could come up with a formula for the deviations for some number other than the mean. If the mean is m the other number is (m+d) where d can be positive or negative. If you sum up the squared deviations I expect you'll end up with a d^2 term, which has a minimum where d=0. I'll have to break out the pencil & paper later to actually work that out.
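The pencil-and-paper step can be previewed numerically. This is only a sketch, assuming the example numbers 1, 2, 3, 7, 8, 9, 12 (mean 6): it checks that moving from the mean m to (m+d) adds exactly N*d^2 to the sum of squared deviations. The cross term drops out because the deviations from the mean sum to zero. The function name `sum_sq_dev` is illustrative.

```python
data = [1, 2, 3, 7, 8, 9, 12]
n = len(data)

# The mean of this particular data set is exactly 6, so integer
# division is safe here; in general the mean need not be an integer.
assert sum(data) % n == 0
m = sum(data) // n

def sum_sq_dev(xs, c):
    """Sum of squared deviations of xs from the point c."""
    return sum((x - c) ** 2 for x in xs)

base = sum_sq_dev(data, m)  # sum of squared deviations at the mean

for d in (-2, -1, 0, 1, 2):
    extra = sum_sq_dev(data, m + d) - base
    # The increase depends only on d^2, so it is smallest at d = 0.
    assert extra == n * d ** 2
    print(d, extra)
```

Taking d = 1 lands on the median, 7, and the extra 7 * 1^2 = 7 accounts for the gap between 100 and 107 in the quoted example.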

Now instead let's say the median is m and we're thinking about absolute (not squared) deviations. There are n numbers to the left and n to the right of the median. Moving to (m+d) changes the sum of deviations by (n*d) for one side minus (n*d) for the other (they cancel out), plus |d| if the set size is odd and the median is in the set. That's for marginal changes: if d gets large enough that (m+d) moves past another number, then you no longer have an equal number (n) on both sides.
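The marginal-change argument can be checked the same way. A sketch, again assuming the example numbers 1, 2, 3, 7, 8, 9, 12 (median 7, three numbers on each side): as long as the shift d does not carry the point past a neighboring number, the two sides cancel and only the median's own deviation |d| is added.

```python
data = [1, 2, 3, 7, 8, 9, 12]
med = 7  # median of the example data, with 3 numbers on each side

def sum_abs_dev(xs, c):
    """Sum of absolute deviations of xs from the point c."""
    return sum(abs(x - c) for x in xs)

base = sum_abs_dev(data, med)  # sum of absolute deviations at the median

# The nearest neighbors of 7 are 3 and 8, so shifts between -4 and +1
# do not cross another data point. These shifts stay inside that range.
for d in (-1.0, -0.5, 0.0, 0.5, 1.0):
    extra = sum_abs_dev(data, med + d) - base
    # The left and right sides cancel; only |d| from the median remains.
    assert extra == abs(d)
    print(d, extra)
```

Since the added amount is |d|, which is never negative, any small move away from the median can only increase the sum of absolute deviations.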

Posted by: TGGP | August 14, 2009 at 04:37 PM

The sum of squared deviations from some number M can be expressed as:

Sum(i=1,N,(Xi-M)^2) =

Sum(i=1,N,Xi^2-2MXi+M^2) =

NM^2 - 2M*Sum(i=1,N,Xi) + Sum(i=1,N,Xi^2)

This is a quadratic in M, the variable we are concerned with. We take the derivative with respect to M and set it to zero.

2NM - 2*Sum(i=1,N,Xi) = 0

M = Sum(i=1,N,Xi) / N

So to minimize the sum of squared differences, M must be the arithmetic mean.

Posted by: TGGP | August 14, 2009 at 10:38 PM

Still working sporadically to get an intuitive grasp, but my time is shorter today. I'll post commentary when I do (or at least have had time to think through it again).

Posted by: Hopefully Anonymous | August 14, 2009 at 10:52 PM

Not there yet, but here's some interesting related reading on the mechanics of "best fitting" a line to an (x, y) data set.

Posted by: Hopefully Anonymous | August 14, 2009 at 11:24 PM

More intuitive?

Sum of the squared differences:

(a - x)^2 + (b - x)^2 + (c - x)^2 + …

Min when the derivative with respect to x is zero:

0 = 2(a - x)(-1) + 2(b - x)(-1) + 2(c - x)(-1) + …

:. 0 = (a - x) + (b - x) + (c - x) + …

:. x + x + x + … = a + b + c + …

:. x = (a + b + c + …) / N, where N is the number of terms

Posted by: Katja Grace | August 16, 2009 at 08:18 PM

Katja,

Thanks, I'll take a look when I get a chance.

Posted by: Hopefully Anonymous | August 16, 2009 at 09:01 PM