August 14, 2009

Comments

TGGP

Here's my intuition:
The mean weights outliers more heavily than the median does. Keeping the other data constant, moving the furthest outlier even further won't change the median but will change the mean. Squaring the differences also weights outliers more heavily; that's how quadratic growth works.
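A quick way to see the outlier point, as a minimal Python sketch. The sample values are made up for illustration and are not from the post.

```python
from statistics import mean, median

# Hypothetical sample: 100 plays the role of the furthest outlier.
data = [1, 2, 3, 4, 100]
# Same sample with only the outlier pushed further out.
pushed = [1, 2, 3, 4, 1000]

# The median ignores how far out the largest value sits.
print("median:", median(data), "->", median(pushed))
# The mean moves with the outlier.
print("mean:  ", mean(data), "->", mean(pushed))
```

The middle value stays put because only the ordering of the data matters to the median, while every value enters the mean in proportion to its size.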
You could come up with a formula for the deviations for some number other than the mean. If the mean is m the other number is (m+d) where d can be positive or negative. If you sum up the squared deviations I expect you'll end up with a d^2 term, which has a minimum where d=0. I'll have to break out the pencil & paper later to actually work that out.
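Before breaking out the pencil and paper, the d^2 guess can be checked numerically. The sketch below assumes a small made-up sample and a helper name (`sum_sq_dev`) of my own choosing; it compares the sum of squared deviations about (m+d) with the sum about m plus N*d^2.

```python
def sum_sq_dev(xs, center):
    """Sum of squared deviations of the values in xs from center."""
    return sum((x - center) ** 2 for x in xs)

xs = [2.0, 3.0, 7.0, 8.0]   # hypothetical sample
n = len(xs)
m = sum(xs) / n             # arithmetic mean of the sample

for d in (-2.0, -0.5, 0.0, 0.5, 2.0):
    shifted = sum_sq_dev(xs, m + d)          # deviations about (m + d)
    predicted = sum_sq_dev(xs, m) + n * d ** 2  # deviations about m, plus N*d^2
    print(d, shifted, predicted)
```

If the two columns agree for every d, the only d-dependent part is N*d^2, which is smallest at d=0.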

Now instead let's say the median is m and we're thinking about absolute (not squared) deviations. There are n numbers to the left and n to the right of the median. Moving from m to (m+d) changes the sum of absolute deviations by (n*d) on one side minus (n*d) on the other (they cancel out), plus |d| if the set size is odd and the median is in the set. So a small move away from the median can't make the sum any smaller. That's for marginal changes; if d gets large enough that it moves past another number then you no longer have an equal number (n) on both sides.
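The same bookkeeping can be tried on numbers. This is only a sketch with a made-up odd-sized sample (so the median is in the set), and `sum_abs_dev` is a helper name I'm assuming for illustration. The shifts are kept small enough that (m+d) never passes a neighbouring data point.

```python
from statistics import median

def sum_abs_dev(xs, center):
    """Sum of absolute deviations of the values in xs from center."""
    return sum(abs(x - center) for x in xs)

xs = [1, 4, 6, 9, 20]        # hypothetical sample; neighbours of the median are 4 and 9
m = median(xs)
base = sum_abs_dev(xs, m)    # sum of absolute deviations about the median

for d in (-1, -0.5, 0.5, 1):
    # The two sides cancel, so the change should come from the median itself.
    change = sum_abs_dev(xs, m + d) - base
    print(d, change)
```

For each small shift the change in the sum should equal the size of the shift, which is the leftover term contributed by the median itself.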

TGGP

The sum of squared deviations from some number M can be expressed as:
Sum(i=1,N,(Xi-M)^2) =
Sum(i=1,N,Xi^2-2MXi+M^2) =
NM^2 - 2M*Sum(i=1,N,Xi) + Sum(i=1,N,Xi^2)

This is a quadratic in M, the variable we are concerned with. We take the derivative with respect to M and set it equal to zero. Because the coefficient of M^2 is N, which is positive, the point where the derivative is zero is a minimum.

2NM - 2*Sum(i=1,N,Xi) = 0
M = Sum(i=1,N,Xi) / N
So to minimize the sum of squared differences, M must be the arithmetic mean.
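As a sanity check on the calculus, here is a small sketch that estimates the slope of the sum of squared deviations numerically instead of symbolically. The sample, the step size `h`, and the function names are assumptions made for illustration.

```python
def sum_sq_dev(xs, m):
    """Sum of squared deviations of the values in xs from m."""
    return sum((x - m) ** 2 for x in xs)

def slope(xs, m, h=1e-3):
    """Central-difference estimate of the derivative with respect to m."""
    return (sum_sq_dev(xs, m + h) - sum_sq_dev(xs, m - h)) / (2 * h)

xs = [3.0, 5.0, 10.0]          # hypothetical sample
avg = sum(xs) / len(xs)        # arithmetic mean

print(slope(xs, avg - 1))      # negative: the sum is still falling
print(slope(xs, avg))          # approximately zero at the mean
print(slope(xs, avg + 1))      # positive: the sum is rising again
```

A slope that goes from negative, through zero at the mean, to positive is what a minimum at the mean looks like.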

Hopefully Anonymous

Still working sporadically to get an intuitive grasp, but my time is shorter today. I'll post commentary when I do (or at least have had time to think through it again).

Hopefully Anonymous

Not there yet, but I found an interesting related reading on the mechanics of "best fitting" a line to an (x, y) data set.

Katja Grace

More intuitive?

Sum of the squared differences:
(a - x)^2 + (b - x)^2 + (c - x)^2 + …

Min when the derivative with respect to x is zero:
0 = 2(a - x)(-1) + 2(b - x)(-1) + 2(c - x)(-1) + …
:. 0 = a - x + b - x + c - x + …
:. x + x + x + … = a + b + c + …
:. x = (a + b + c + …) / N, where N is the number of values

Hopefully Anonymous

Katja,
Thanks, I'll take a look when I get a chance.
