I will be speaking on my research into topological data analysis for political data sets at the University of Edinburgh, at 4:10pm on 15th October, in Room 6206, James Clerk Maxwell Building.
Data Analysis on Political Data
Data analysis has played a growing role in politics for many years now; analyzing polling data to predict the outcomes of elections is perhaps the best-known application.
A different approach that has gained traction recently is to analyze the voting behaviour of elected representatives, both to understand the inner workings of parliaments and to check that representatives vote the way they promised. Sites like GovTrack and VoteView bring machine learning and data analysis tools to citizens, and visualize the groupings and voting behaviour within legislatures.
One key step in making parliamentary data accessible to data analysis is recognizing that the data set is inherently geometric: the sequence of votes cast in a parliamentary session can be read as the coordinates of a very high-dimensional vector, say +1 for Yea, -1 for Nay, and 0 otherwise. Each member of parliament is then represented by a vector, and members who tend to vote alike end up close to each other in the resulting vector space.
We can apply dimension reduction techniques to these vectors to find essential structures within parliaments. As a first result, this typically uncovers the party structure of parliament:
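As a minimal sketch of the encoding described above (in Python, which the original text does not specify), the snippet below turns each member's ballots into a vector and compares members by Euclidean distance. The helper names, the choice of Euclidean distance, and the toy ballots are assumptions made for illustration.

```python
import math

# Encoding from the text: +1 for Yea, -1 for Nay, 0 for anything else
VOTE_VALUE = {"Yea": 1.0, "Nay": -1.0}

def vote_vector(ballots):
    """Map a member's sequence of ballots to a vector (absent, present, etc. -> 0)."""
    return [VOTE_VALUE.get(b, 0.0) for b in ballots]

def distance(u, v):
    """Euclidean distance between two members' vote vectors (an assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Made-up session with four roll calls, for illustration only
members = {
    "A": ["Yea", "Yea", "Nay", "Yea"],
    "B": ["Yea", "Yea", "Nay", "Present"],
    "C": ["Nay", "Nay", "Yea", "Nay"],
}
vectors = {name: vote_vector(b) for name, b in members.items()}
print(distance(vectors["A"], vectors["B"]))  # 1.0
print(distance(vectors["A"], vectors["C"]))  # 4.0
```

Members A and B disagree on only one roll call and sit close together; C votes against A every time and lands at the opposite corner.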
This image shows members of the US House of Representatives, positioned in 2d by the PCA coordinates of their voting records. The spectrum from Democrats (blue) to Republicans (red) is clearly visible.
This plot shows the UK House of Commons during the period 2001-2005, again with 2d positions generated by PCA dimension reduction of the voting data. Colours indicate political parties: Labour (red), Tory (blue), and LibDem (yellow) are the dominant parties, with smaller parties in other colours.
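A minimal PCA sketch, assuming NumPy and a small made-up vote matrix (rows are members, columns are roll calls, entries +1/-1/0). It computes the principal components from a singular value decomposition of the mean-centred data; this is the standard textbook construction, not necessarily the exact pipeline behind the plots above.

```python
import numpy as np

def pca_2d(X):
    """Project rows of X (members x roll calls) onto the first two principal components."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)  # PCA operates on mean-centred data
    # Rows of Vt are the principal directions, ordered by decreasing singular value
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ Vt[:2].T     # 2d coordinates, one row per member

# Made-up data: two blocs voting on opposite sides of most roll calls
X = [
    [ 1,  1, -1,  1,  1],
    [ 1,  1, -1,  1,  0],
    [ 1,  0, -1,  1,  1],
    [-1, -1,  1, -1, -1],
    [-1, -1,  1,  0, -1],
]
coords = pca_2d(X)
print(coords.shape)  # (5, 2)
```

The first coordinate separates the two blocs (its overall sign is arbitrary, since SVD directions are only defined up to sign), which is the same kind of party split the plots above show.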
Topological Data Analysis
Topological Data Analysis is a relatively new field of research, in which algebraic and combinatorial topology are used to refine geometric methods. Topology encodes «closeness» and provides a robust way to capture qualitative features of data sets.
In particular, for the political data we use a technique called Mapper. This technique, developed in the Applied Topology group at Stanford, produces easy-to-analyze topological models from point clouds: the points are divided into overlapping slices according to the values of measurement functions, and the points within each slice are then clustered into closely connected groups. Whenever groups overlap, they are connected, producing a simplified model of the space suggested by the data points.
Applying this to the voting data, we can discover sub-groups in the US House of Representatives that were not visible in a PCA dimension reduction. Moreover, we can measure political unrest by how fragmented parliament is into larger or smaller interest groups.
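The Mapper construction can be sketched in plain Python as follows. This is a simplified illustration, not the Stanford implementation or Ayasdi's Iris: the measurement (filter) function is any function from a point to a real number, its range is covered by equal-length overlapping intervals, and clustering within each interval uses a simple single-linkage rule (points within a threshold `eps` of each other, directly or through a chain, share a cluster). The parameter names `n_intervals`, `overlap` and `eps` are our own.

```python
import math
from itertools import combinations

def single_linkage_clusters(points, indices, eps):
    """Connected components of the graph linking points at distance <= eps."""
    unvisited = set(indices)
    clusters = []
    while unvisited:
        start = unvisited.pop()
        component, stack = {start}, [start]
        while stack:
            i = stack.pop()
            near = [j for j in unvisited if math.dist(points[i], points[j]) <= eps]
            for j in near:
                unvisited.remove(j)
                component.add(j)
                stack.append(j)
        clusters.append(frozenset(component))
    return clusters

def mapper(points, filter_fn, n_intervals=4, overlap=0.25, eps=1.5):
    """Return (nodes, edges) of a Mapper graph; each node is a cluster of point indices."""
    values = [filter_fn(p) for p in points]
    lo, hi = min(values), max(values)
    length = (hi - lo) / n_intervals or 1.0  # avoid zero-length intervals
    nodes = []
    for k in range(n_intervals):
        # Widen each interval by overlap * length on both sides so neighbours overlap
        a = lo + k * length - overlap * length
        b = lo + (k + 1) * length + overlap * length
        in_slice = [i for i, v in enumerate(values) if a <= v <= b]
        nodes.extend(single_linkage_clusters(points, in_slice, eps))
    # Connect two clusters whenever they share at least one data point
    edges = {(u, v) for u, v in combinations(range(len(nodes)), 2) if nodes[u] & nodes[v]}
    return nodes, edges

# Made-up data: two well-separated "blocs", each spread along the x-axis
points = [(0, 0), (1, 0), (2, 0), (0, 10), (1, 10), (2, 10)]
nodes, edges = mapper(points, filter_fn=lambda p: p[0], n_intervals=2, overlap=0.25, eps=1.5)
print(len(nodes), len(edges))  # 4 2
```

Here the filter splits the points into two overlapping slices, each slice yields one cluster per bloc, and the overlap links each bloc's two clusters, so the graph falls apart into two separate components.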
Mapper analyses of House of Representatives voting records, using Iris from Ayasdi.
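One way to turn "fragmentation" into a number, offered only as a hypothetical sketch and not as the measure behind the analyses above, is to take the connected components of a Mapper graph and ask how much of parliament lies outside the largest one. The input format (clusters as sets of member indices, edges as pairs of cluster indices) and the `fragmentation` score are assumptions.

```python
def mapper_components(nodes, edges):
    """Merge Mapper clusters along edges (union-find); return member sets, largest first."""
    parent = list(range(len(nodes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for u, v in edges:
        parent[find(u)] = find(v)

    groups = {}
    for i, node in enumerate(nodes):
        groups.setdefault(find(i), set()).update(node)
    return sorted(groups.values(), key=len, reverse=True)

def fragmentation(nodes, edges):
    """Hypothetical score: 1 - (size of largest group / number of members covered)."""
    comps = mapper_components(nodes, edges)
    if not comps:
        return 0.0
    covered = set().union(*comps)
    return 1 - len(comps[0]) / len(covered)

# Made-up Mapper graph: two chains of overlapping clusters, not linked to each other
nodes = [frozenset({0, 1}), frozenset({1, 2}), frozenset({3, 4}), frozenset({4, 5})]
edges = {(0, 1), (2, 3)}
print([len(c) for c in mapper_components(nodes, edges)])  # [3, 3]
print(fragmentation(nodes, edges))  # 0.5
```

A score of 0 means every member sits in one connected group; with k equally sized groups the score is 1 - 1/k, so it approaches 1 as parliament splits into many comparable factions.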