What do these depict?


Here are two others. Different data source, different point in time, but what are they?


The plots come in pairs: each coloured plot belongs together with one black plot. They are not sports related, and they are taken from real-world data. The colours are relevant and constitute a hint in their own right.
19 People had this to say...
Someone dicking around with PCA
Great idea for a puzzle!
I’ve no idea what they’re plots of. Give us a clue.
Ok, very willing to guess but could we get some tips/clues at least?
@PCA: Well, yes. It’s one first step in a project I have to compare dimension reduction algorithms on datasets from one specific domain.
@dysfunctor: The coloration in the first plot will help somewhat.
@Cetin: more clues will come as time goes by.
We’re in baseball season – well these are from baseball.
My guess at the first plot: blue indicates the location where a ball stopped after a strike or a ball. Red indicates the final location (or first bounce) of balls that got a run or someone on base.
Second plot: if “south” on the plot is home plate, then this is a plot of where the ball was when someone was declared out. This explains the large number of points near 1st, 2nd, and 3rd base. Also, if someone is tagged out, then that explains the number of points between 1st, 2nd, and 3rd bases.
Good guess, Steve I, but wrong domain.
Next hint: the first diagram is a dimension reduction of 442 vectors in an 1187-dimensional space, and the second diagram reduces 1187 vectors in a 442-dimensional space.
From the colouring it looks like political parties to me.
The 1st pair might be members of congress and votes held?
The 2nd shows the right mix of colours for the parties in the UK parliament.
And that brings us to the right subject!
Voting records form a vector, once we choose numerical values for the votes. Done right, each MP or representative is a point in some very high dimensional vector space – so we can use dimension reduction methods to get a geometrical view of the politics.
In the US, the picture is extremely linear. The left-right axis of the two dominant parties is almost the only feature relevant in the point set. In fact, the vertical axis in the red/blue plot is about two orders of magnitude less relevant than the horizontal axis.
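For anyone who wants to try this at home, here is a minimal sketch of the idea using PCA via the SVD. The vote encoding (+1 for aye, -1 for no, 0 for absent), the helper names `encode_votes` and `pca_project`, and the synthetic two-bloc data are all assumptions made for illustration; this is not the actual data or code behind the plots.

```python
import numpy as np

# Hypothetical encoding (an assumption): aye = +1, no = -1, absent = 0.
VOTE_VALUES = {"aye": 1.0, "no": -1.0, "absent": 0.0}

def encode_votes(records):
    """Turn per-member lists of vote labels into a (members x divisions) matrix."""
    return np.array([[VOTE_VALUES[v] for v in row] for row in records])

def pca_project(X, k=2):
    """Project the rows of X onto their first k principal components."""
    Xc = X - X.mean(axis=0)                  # centre each column (division)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    coords = U[:, :k] * s[:k]                # row coordinates on the top-k axes
    explained = s**2 / np.sum(s**2)          # share of variance per component
    return coords, explained

# Synthetic stand-in data: two blocs that mostly vote along party lines.
rng = np.random.default_rng(0)
n_members, n_divisions = 40, 100
party = np.repeat([1.0, -1.0], n_members // 2)      # bloc membership
line = rng.choice([1.0, -1.0], size=n_divisions)    # first bloc's line per division
votes = np.outer(party, line)
rebels = rng.random(votes.shape) < 0.05             # roughly 5% rebel votes
votes[rebels] *= -1

coords, explained = pca_project(votes)
print("variance share of first two axes:", explained[:2])
```

On data like this the first axis carries almost all of the variance, which is the same kind of one-dimensional picture described above. Comparing `explained[0]` with `explained[1]` is one way to put a number on how much less relevant the second axis is.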
Now, though, what are the black plots?
Sweden?
John: No, though I’m working on getting hold of Swedish data. A Swedish plot would have been appropriately colorized.
Remember I mentioned that the black plots in some sense belonged paired up with the party plots.
France!
Dimitry: Nope.
The black plots are not merely New Countries, they are qualitatively different.
The black plots are some kind of Fourier transform of the coloured plots? But I’m having trouble imagining what that might mean visually.
Malcolm: I don’t actually know enough about Fourier transforms to tell whether this is the case. There is, however, a sense of duality between the black points and the coloured ones.
Is it the views of their constituents?
Nope.
It is the corresponding plots for the issues in the vector space spanned by the MPs.
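A rough sketch of that duality, assuming the votes sit in a matrix with one row per member and one column per vote: the coloured plot reduces the rows, and the black plot reduces the rows of the transpose. The matrix below is random filler with the 442 by 1187 shape from the earlier hint, not real data, and `pca_2d` is a hypothetical helper.

```python
import numpy as np

def pca_2d(X):
    """Project the rows of X onto their first two principal components."""
    Xc = X - X.mean(axis=0)                      # centre each column
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :2] * s[:2]

# Random stand-in for a vote matrix (assumption): 442 members, 1187 votes.
rng = np.random.default_rng(1)
votes = rng.choice([-1.0, 0.0, 1.0], size=(442, 1187))

member_points = pca_2d(votes)      # one 2D point per member (coloured plot)
issue_points = pca_2d(votes.T)     # one 2D point per vote/issue (black plot)

print(member_points.shape, issue_points.shape)
```

The only difference between the two calls is the transpose: in the second one each issue is a vector whose coordinates are the members' votes on it.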
Hi Michi, I am unable to read the full article of your post http://blog.mikael.johanssons.org/archive/2009/05/1-manifolds-and-curves/. The figure and the remainder of the article after the figure do not show up. Can you take a look, or send a copy of the full article to my email? Thanks!
Hi Michi,
This is really interesting. Can you tell me where/how you got your source data? Also, if you have time, a little more information on what you’re doing with the data would be great!
Thanks
Chris (@crntaylor on twitter)
The UK data was downloaded from The Public Whip. Embarrassingly, I don’t remember where I grabbed the Canadian data, but I think it might have been from How’d they vote.
The Canadian parliament now has their data up online, queryable at least.
I’ve a running idea about writing a paper, comparing different analysis techniques on political data – PCA vs. non-linear methods such as LLMAP or Laplacian eigenmaps. Not much is happening with the project right now – but that’s why I try to aggregate and analyze voting data.