Monday, September 2, 2013

The donkey was me

Following up on yesterday's analysis...

I showed a series of graphs yesterday, and my only condition in choosing the data was that "the numbers [must be] up pretty high, not down near zero." I imposed that condition to make the result in the fourth graph of the series obvious: to show that the incorrect sequence of operations magnifies the big number, making it ridiculously large and obviously an error.

It worked out well, for that series of graphs.
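Here is a small sketch of why the "up pretty high" condition matters. It assumes, purely for illustration, that the incorrect sequence is one that skips subtracting the series average before scaling; the post doesn't spell out which ordering was wrong. The function names and all the data values are made up.

```python
from statistics import mean, stdev

def rescale_correct(a, b):
    """Map series a onto the average and spread of series b: center, scale, shift."""
    ma, sa = mean(a), stdev(a)
    mb, sb = mean(b), stdev(b)
    return [(x - ma) / sa * sb + mb for x in a]

def rescale_skip_centering(a, b):
    """HYPOTHETICAL wrong ordering: scale without first subtracting a's average."""
    sa, sb = stdev(a), stdev(b)
    mb = mean(b)
    return [x / sa * sb + mb for x in a]

# Made-up illustrative data, not real figures.
near_zero = [-1.0, 0.5, 1.0, -0.5, 0.0]   # oscillates around zero
high = [76.0, 78.0, 80.0, 77.0, 79.0]     # "up pretty high", like a utilization rate
target = [2.0, 6.0, 4.0, 8.0, 5.0]        # the series we want to compare against

for name, a in [("near zero", near_zero), ("high", high)]:
    good = rescale_correct(a, target)
    bad = rescale_skip_centering(a, target)
    print(name, "offset from skipping centering:", round(bad[0] - good[0], 2))
```

The error term is mean(a) / stdev(a) * stdev(b). For a series that averages zero it vanishes, so the mistake is invisible; for a series sitting up near 78 it pushes every point far above the target series.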

For today's post I thought I'd take Lars's spreadsheet, the one with my extra columns in it, and leave his calculations and mine untouched, but replace one of his data series with one that has the numbers up pretty high, not down near zero. I thought the result would show a similar ridiculous, obvious error. That's not what happened.

I used the same Capacity Utilization numbers as in yesterday's graphs. But what with subtracting the average value and whatever else he does in his calc, both Lars's calc and my version put the "processed" Capacity Utilization numbers in about the same place our Market Indicator numbers occupied in yesterday's Graph #5: in the right place, or nearly so.

So here's the story, then: I am really pleased that I figured out the four-step sequence:
1. Subtract the series average from the Series A data.
2. Divide the resulting data by the Series A standard deviation.
3. Multiply the resulting data by the Series B standard deviation. And
4. To the resulting data add the Series B average.
I think this is a magnificently clever way to make two datasets visually comparable. And I have Lars to thank for it.
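The four steps above can be sketched in Python roughly like this. The function name is hypothetical, and using the sample standard deviation (statistics.stdev, rather than the population version pstdev) is an assumption; whichever one you pick, the rescaled series ends up matching Series B's average and spread as long as you measure both series the same way.

```python
from statistics import mean, stdev

def match_scale(series_a, series_b):
    """Rescale series_a so it has series_b's average and standard deviation."""
    avg_a, sd_a = mean(series_a), stdev(series_a)
    avg_b, sd_b = mean(series_b), stdev(series_b)
    result = []
    for x in series_a:
        z = x - avg_a             # step 1: subtract the Series A average
        z = z / sd_a              # step 2: divide by the Series A standard deviation
        z = z * sd_b              # step 3: multiply by the Series B standard deviation
        result.append(z + avg_b)  # step 4: add the Series B average
    return result

# Made-up data: A sits up high, B is smaller and more spread out.
a = [76.0, 78.0, 80.0, 77.0, 79.0]
b = [2.0, 6.0, 4.0, 8.0, 5.0]
scaled = match_scale(a, b)
print(mean(scaled), stdev(scaled))  # approximately B's average and standard deviation
```

The shape of Series A is untouched; only its level and spread change, which is what lets the two lines share one chart.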

But that's all I got from his spreadsheet. Whatever else he did escapes me. Apparently it's not a mistake. It's just some complex calculation that I couldn't disentangle.


Jazzbumpa said...

Remember, Cap Ut is a series of always positive numbers. As it happens, you can see a clear trend.

Lars is looking at YoY % Change. For most time series, that is going to oscillate near 0. As such, it's already partially detrended. Subtracting the data set average completes the detrending. [Though I'm not sure how effective that would be if the YoY data set had a strong positive or negative bias. For some data sets you might need a rolling average of some duration.]
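A rough Python sketch of the detrending described above: take the year-over-year % change, then subtract the data set average. The function names are hypothetical, the series is made up, and the 24-period window in the rolling-average variant is an arbitrary assumption, since the comment only says "some duration."

```python
from statistics import mean

def yoy_pct_change(values, periods=12):
    """Year-over-year % change, assuming `periods` observations per year (12 = monthly)."""
    return [100.0 * (values[i] / values[i - periods] - 1.0)
            for i in range(periods, len(values))]

def center(values):
    """Subtract the data set average so the series oscillates around zero."""
    avg = mean(values)
    return [v - avg for v in values]

def center_rolling(values, window=24):
    """Subtract a trailing rolling average instead (window length is an assumption)."""
    out = []
    for i in range(window - 1, len(values)):
        avg = mean(values[i - window + 1 : i + 1])
        out.append(values[i] - avg)
    return out

# Made-up monthly levels: about 3% growth a year, plus a small bump every 6 months.
levels = [100.0 * 1.03 ** (m / 12) + (1.0 if m % 6 == 0 else 0.0) for m in range(60)]
growth = yoy_pct_change(levels)             # hovers near 3, not near 0
detrended = center(growth)                  # now oscillates around 0
detrended_rolling = center_rolling(growth)  # rolling-average alternative
```

The levels series trends steadily upward, the YoY series is flat but biased (around 3), and subtracting its average removes that bias.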

Dividing by the Std Dev normalizes the data into Std Dev units. This does two things: it makes data sets comparable even when they have a) different spread tendencies and b) different absolute magnitudes.

So it puts things on a common basis.
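Steps 1 and 2 together are what statisticians call a z-score. A minimal Python sketch, with a hypothetical function name and made-up data: two series with very different magnitudes come out on the same common basis.

```python
from statistics import mean, stdev

def to_std_units(values):
    """Express each value as its distance from the average, in standard deviations."""
    avg, sd = mean(values), stdev(values)
    return [(v - avg) / sd for v in values]

# Two made-up series: same pattern, wildly different magnitudes and spreads.
small = [0.1, 0.3, 0.2, 0.5, 0.4]
large = [1000.0, 3000.0, 2000.0, 5000.0, 4000.0]

print(to_std_units(small))
print(to_std_units(large))
```

Because `large` is just `small` times 10,000, both lines print essentially the same numbers (up to floating-point rounding), each with an average of 0 and a standard deviation of 1.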

I had to do this to wrap my head around it.


The Arthurian said...

Nice. Thanks, Jazz.

For some reason I just really got into looking at that spreadsheet.