<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Michi's blog &#187; R</title>
	<atom:link href="http://blog.mikael.johanssons.org/archive/category/computer/programming/r/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.mikael.johanssons.org</link>
	<description>Because my LiveJournal is too silly</description>
	<lastBuildDate>Sat, 12 Nov 2011 15:09:36 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Mapping zipcodes in R</title>
		<link>http://blog.mikael.johanssons.org/archive/2009/05/mapping-zipcodes-in-r/</link>
		<comments>http://blog.mikael.johanssons.org/archive/2009/05/mapping-zipcodes-in-r/#comments</comments>
		<pubDate>Wed, 13 May 2009 14:51:51 +0000</pubDate>
		<dc:creator>Michi</dc:creator>
				<category><![CDATA[Computer]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://blog.mikael.johanssons.org/?p=209</guid>
		<description><![CDATA[I started fiddling around with R again, and ended up playing with a zipcode database. So, first I downloaded the zipcode database at Mapping Hacks, and unpacked the zipfile in my working directory. Then, I loaded the data into R &#62; zips &#60;- read.table(&#34;zipcode.csv&#34;,sep=&#34;,&#34;,quote=&#34;\&#34;&#34;,header=TRUE) &#62; names(zips) [1] &#34;zip&#34; &#160; &#160; &#160; &#34;city&#34; &#160; &#160; &#160;&#34;state&#34; [...]]]></description>
			<content:encoded><![CDATA[<p>I started fiddling around with R again, and ended up playing with a zipcode database.</p>
<p>So, first I downloaded the zipcode database at <a href="http://www.mappinghacks.com/data">Mapping Hacks</a>, and unpacked the zipfile in my working directory. </p>
<p>Then, I loaded the data into R</p>
<div class="dean_ch" style="white-space: wrap;">
&gt; zips &lt;- read.table(&quot;zipcode.csv&quot;,sep=&quot;,&quot;,quote=&quot;\&quot;&quot;,header=TRUE)<br />
&gt; names(zips)<br />
[1] &quot;zip&quot; &nbsp; &nbsp; &nbsp; &quot;city&quot; &nbsp; &nbsp; &nbsp;&quot;state&quot; &nbsp; &nbsp; &quot;latitude&quot; &nbsp;&quot;longitude&quot;<br />
[6] &quot;timezone&quot; &nbsp;&quot;dst&quot; &nbsp; &nbsp; &nbsp;<br />
&nbsp;</div>
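<p>One thing worth knowing here: with the defaults, R turns the quoted zip column into a number, so a code like 00501 is stored as 501. That is exactly what makes the arithmetic further down possible. A minimal sketch of the difference, using two made-up rows in the same column layout (the rows and the variable names are mine, not taken from the real file):</p>
<div class="dean_ch" style="white-space: wrap;">
# Two illustrative rows laid out like zipcode.csv (not real file contents)<br />
csv_lines &lt;- c('zip,city,state,latitude,longitude,timezone,dst',<br />
&nbsp; '&quot;00501&quot;,&quot;Holtsville&quot;,&quot;NY&quot;,40.81,-73.04,-5,1',<br />
&nbsp; '&quot;90210&quot;,&quot;Beverly Hills&quot;,&quot;CA&quot;,34.09,-118.41,-8,1')<br />
# Default: the zip column is converted to a number, leading zeros vanish<br />
as_number &lt;- read.csv(text = csv_lines)<br />
# Alternative: keep zip as a string, leading zeros survive<br />
as_string &lt;- read.csv(text = csv_lines, colClasses = c(zip = &quot;character&quot;))<br />
as_number$zip[1] &nbsp; # 501<br />
as_string$zip[1] &nbsp; # &quot;00501&quot;<br />
&nbsp;</div>
<p>For the plots below, the numeric version is the one we want.</p>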
<p>So, now I have an R data frame containing a lot of US cities, their geographical coordinates, and their zip codes. So we can start playing with the plot command! After rooting around a bit, I settled on the smallest plot symbol I could make R produce, by setting the option <span lang="R">pch=20</span> in the plot options. Hence, I ended up with a command basically like this:</p>
<div class="dean_ch" style="white-space: wrap;">
&gt; plot(zips$longitude,zips$latitude,type=&quot;p&quot;,col=((zips$zip/10000)%%10)+1,pch=20,axes=FALSE,xlab=&quot;&quot;,ylab=&quot;&quot;,cex=0.1)<br />
&nbsp;</div>
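<p>where the +1 after the modulus makes sure that points with digit 0 get plotted too (colour 0 is the background colour in R, so they would otherwise be invisible), and the cex parameter sets the point size to something small and pretty.<br />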
<a href="http://blog.mikael.johanssons.org/wp-content/usZip1.png"><img src="http://blog.mikael.johanssons.org/wp-content/usZip1.png" alt="First digit of the USPS zip code" width="400" /></a></p>
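<p>Strictly speaking, the expression in the plot command gives a fractional number (94305/10000 is 9.4305), and as far as I can tell R simply drops the fractional part when it uses the value as a colour index. If you want the digit itself, integer division does it exactly. A small sketch, where <span lang="R">zip_digit</span> is a helper name I made up for illustration:</p>
<div class="dean_ch" style="white-space: wrap;">
# Hypothetical helper: k-th digit (1 = leftmost) of a five-digit zip code<br />
zip_digit &lt;- function(zip, k) (zip %/% 10^(5 - k)) %% 10<br />
zip_digit(94305, 1:5) &nbsp; # 9 4 3 0 5<br />
zip_digit(501, 1) &nbsp; &nbsp; &nbsp; # 0, since 501 stands for 00501<br />
&nbsp;</div>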
<p>We can continue this, tweaking the divisor to extract all the other digits of the zip code, and we end up getting:</p>
<div class="dean_ch" style="white-space: wrap;">
&gt; plot(zips$longitude,zips$latitude,type=&quot;p&quot;,col=((zips$zip/1000)%%10)+1,pch=20,axes=FALSE,xlab=&quot;&quot;,ylab=&quot;&quot;,cex=0.1)<br />
&nbsp;</div>
<p>and the result<br />
<a href="http://blog.mikael.johanssons.org/wp-content/usZip2.png"><img src="http://blog.mikael.johanssons.org/wp-content/usZip2.png" alt="Second digit of the USPS zip code" width="400" /></a></p>
<div class="dean_ch" style="white-space: wrap;">
&gt; plot(zips$longitude,zips$latitude,type=&quot;p&quot;,col=((zips$zip/100)%%10)+1,pch=20,axes=FALSE,xlab=&quot;&quot;,ylab=&quot;&quot;,cex=0.1)<br />
&nbsp;</div>
<p>and the result<br />
<a href="http://blog.mikael.johanssons.org/wp-content/usZip3.png"><img src="http://blog.mikael.johanssons.org/wp-content/usZip3.png" alt="Third digit of the USPS zip code" width="400" /></a></p>
<div class="dean_ch" style="white-space: wrap;">
&gt; plot(zips$longitude,zips$latitude,type=&quot;p&quot;,col=((zips$zip/10)%%10)+1,pch=20,axes=FALSE,xlab=&quot;&quot;,ylab=&quot;&quot;,cex=0.1)<br />
&nbsp;</div>
<p>and the result<br />
<a href="http://blog.mikael.johanssons.org/wp-content/usZip4.png"><img src="http://blog.mikael.johanssons.org/wp-content/usZip4.png" alt="Fourth digit of the USPS zip code" width="400" /></a></p>
<p>and finally</p>
<div class="dean_ch" style="white-space: wrap;">
&gt; plot(zips$longitude,zips$latitude,type=&quot;p&quot;,col=((zips$zip/1)%%10)+1,pch=20,axes=FALSE,xlab=&quot;&quot;,ylab=&quot;&quot;,cex=0.1)<br />
&nbsp;</div>
<p>and the result<br />
<a href="http://blog.mikael.johanssons.org/wp-content/usZip5.png"><img src="http://blog.mikael.johanssons.org/wp-content/usZip5.png" alt="Fifth digit of the USPS zip code" width="400" /></a></p>
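<p>The five commands differ only in the divisor, so they can be folded into a loop. The sketch below does that on a stand-in data frame of random points (<span lang="R">fake</span> is made up so that the snippet runs without the zipcode file; swap in <span lang="R">zips</span> for the real thing). It also picks ten explicit colours with <span lang="R">rainbow(10)</span>, since R's default palette has only eight colours, so the colour indices 9 and 10 wrap around and share colours with indices 1 and 2.</p>
<div class="dean_ch" style="white-space: wrap;">
# Stand-in for the real zips data frame: random, made-up points<br />
set.seed(1)<br />
fake &lt;- data.frame(zip = sample(501:99950, 500),<br />
&nbsp; longitude = runif(500, -125, -67),<br />
&nbsp; latitude = runif(500, 25, 49))<br />
cols &lt;- rainbow(10) &nbsp; # one distinct colour per digit 0..9<br />
par(mfrow = c(2, 3), mar = c(1, 1, 2, 1))<br />
for (k in 1:5) {<br />
&nbsp; # k-th digit of the zip code, counted from the left<br />
&nbsp; digit &lt;- (fake$zip %/% 10^(5 - k)) %% 10<br />
&nbsp; plot(fake$longitude, fake$latitude, col = cols[digit + 1], pch = 20,<br />
&nbsp; &nbsp; axes = FALSE, xlab = &quot;&quot;, ylab = &quot;&quot;, cex = 0.5,<br />
&nbsp; &nbsp; main = paste(&quot;Digit&quot;, k))<br />
}<br />
&nbsp;</div>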
<p>And then, of course, we can zoom in on data too. So we can do things like extracting Californian zip codes</p>
<div class="dean_ch" style="white-space: wrap;">
&gt; cazips &lt;- zips[zips$state == &quot;CA&quot;,]<br />
&gt; plot(cazips$longitude,cazips$latitude,type=&quot;p&quot;,col=((cazips$zip/1000)%%10)+1,pch=20,axes=FALSE,xlab=&quot;&quot;,ylab=&quot;&quot;,cex=0.5)<br />
&nbsp;</div>
<p>to get<br />
<a href="http://blog.mikael.johanssons.org/wp-content/caZip2.png"><img src="http://blog.mikael.johanssons.org/wp-content/caZip2.png" alt="Second digit of the Californian USPS zip code" width="400" /></a><br />
or, we could extract New York zip codes:</p>
<div class="dean_ch" style="white-space: wrap;">
&gt; nyzips &lt;- zips[zips$state == &quot;NY&quot;,]<br />
&gt; plot(nyzips$longitude,nyzips$latitude,type=&quot;p&quot;,col=((nyzips$zip/100)%%10)+1,pch=20,axes=FALSE,xlab=&quot;&quot;,ylab=&quot;&quot;,cex=0.5)<br />
&nbsp;</div>
<p><a href="http://blog.mikael.johanssons.org/wp-content/nyZip3.png"><img src="http://blog.mikael.johanssons.org/wp-content/nyZip3.png" alt="Third digit of the New York USPS zip code" width="400" /></a><br />
or even extract, say, the zip codes starting with 10 or 11, covering New York City and its surroundings, and take a closer look:</p>
<div class="dean_ch" style="white-space: wrap;">
&gt; ny10zips &lt;- nyzips[nyzips$zip&lt;12000,]<br />
&gt; ny10zips &lt;- ny10zips[ny10zips$zip&gt;9999,]<br />
&gt; plot(ny10zips$longitude,ny10zips$latitude,type=&quot;p&quot;,col=((ny10zips$zip/100)%%10)+1,pch=20,axes=FALSE,xlab=&quot;&quot;,ylab=&quot;&quot;,cex=1.0)<br />
&nbsp;</div>
<p><a href="http://blog.mikael.johanssons.org/wp-content/ny10Zip3.png"><img src="http://blog.mikael.johanssons.org/wp-content/ny10Zip3.png" alt="Third digit of the USPS zip codes 10xxx and 11xxx" width="400" /></a></p>
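<p>The state filter and the two range filters can also be written as a single condition with <span lang="R">subset</span>. A quick sketch on a toy data frame (the name <span lang="R">toy</span> and its handful of rows are mine, just so the snippet runs without the full database):</p>
<div class="dean_ch" style="white-space: wrap;">
# Toy stand-in for the zips data frame, with only the columns we filter on<br />
toy &lt;- data.frame(zip = c(501, 10001, 11201, 12207, 14604, 90210),<br />
&nbsp; state = c(&quot;NY&quot;, &quot;NY&quot;, &quot;NY&quot;, &quot;NY&quot;, &quot;NY&quot;, &quot;CA&quot;))<br />
# New York zip codes of the form 10xxx or 11xxx, in one step<br />
ny10 &lt;- subset(toy, state == &quot;NY&quot; &amp; zip &gt;= 10000 &amp; zip &lt; 12000)<br />
ny10$zip &nbsp; # 10001 11201<br />
&nbsp;</div>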
]]></content:encoded>
			<wfw:commentRss>http://blog.mikael.johanssons.org/archive/2009/05/mapping-zipcodes-in-r/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>R and topological data analysis</title>
		<link>http://blog.mikael.johanssons.org/archive/2008/08/r-and-topological-data-analysis/</link>
		<comments>http://blog.mikael.johanssons.org/archive/2008/08/r-and-topological-data-analysis/#comments</comments>
		<pubDate>Sat, 23 Aug 2008 15:38:00 +0000</pubDate>
		<dc:creator>Michi</dc:creator>
				<category><![CDATA[Homology and Homotopy]]></category>
		<category><![CDATA[Mathematics]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[Topology]]></category>

		<guid isPermaLink="false">http://blog.mikael.johanssons.org/?p=179</guid>
		<description><![CDATA[This is extremely early playing around. It touches on things I&#8217;m going to be working with at Stanford, but at this point, I&#8217;m not even up on toy level. We&#8217;ll start by generating a dataset. Essentially, I&#8217;ll take the trifolium, sample points on the curve, and then perturb each point ever so slightly. idx &#60;- [...]]]></description>
			<content:encoded><![CDATA[<p>This is extremely early playing around. It touches on things I&#8217;m going to be working with at Stanford, but at this point, I&#8217;m not even up on toy level.</p>
<p>We&#8217;ll start by generating a dataset. Essentially, I&#8217;ll take the trifolium, sample points on the curve, and then perturb each point ever so slightly.</p>
<div class="dean_ch" style="white-space: wrap;">
idx &lt;- 1:2000<br />
theta &lt;- idx*2*pi/2000<br />
a &lt;- cos(3*theta)<br />
x &lt;- a*cos(theta)<br />
y &lt;- a*sin(theta)<br />
xper &lt;- rnorm(2000)<br />
yper &lt;- rnorm(2000)<br />
xd &lt;- x + xper/100<br />
yd &lt;- y + yper/100<br />
cd &lt;- cbind(xd,yd)<br />
&nbsp;</div>
<p>As a result, we get a dataset that looks like this:<br />
<img src="http://blog.mikael.johanssons.org/wp-content/uploads/2008/08/trifolium_data.png" alt="Trifolium data" /></p>
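<p>Since the noise is random, every run gives a slightly different cloud. If you want to get the same picture each time, the generation can be wrapped in a function with a fixed seed. This is only a sketch: the function name, the default seed and the <span lang="R">sd</span> argument are my own choices, with <span lang="R">sd = 0.01</span> playing the role of dividing the standard normal noise by 100.</p>
<div class="dean_ch" style="white-space: wrap;">
# Hypothetical helper: n points on the trifolium r = cos(3*theta), plus Gaussian noise<br />
noisy_trifolium &lt;- function(n = 2000, sd = 0.01, seed = 42) {<br />
&nbsp; set.seed(seed) &nbsp; # fixed seed: the same cloud on every run<br />
&nbsp; theta &lt;- (1:n) * 2 * pi / n<br />
&nbsp; r &lt;- cos(3 * theta)<br />
&nbsp; cbind(xd = r * cos(theta) + rnorm(n, sd = sd),<br />
&nbsp; &nbsp; yd = r * sin(theta) + rnorm(n, sd = sd))<br />
}<br />
cd &lt;- noisy_trifolium()<br />
dim(cd) &nbsp; # 2000 2<br />
&nbsp;</div>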
<p>So, let&#8217;s pick a sample from the dataset. What I&#8217;d really want to do now would be the witness complex construction, but I haven&#8217;t figured out enough about how R ticks to do quite that. So we&#8217;ll pick a sample and then build the 1-skeleton of the Rips-Vietoris complex using Euclidean distance between points. This means we&#8217;ll draw a graph on the dataset with an edge between two sample points whenever they are within 2&epsilon; of each other, that is, whenever the balls of radius &epsilon; around them overlap.</p>
<p>So we pick a sample from this dataset. Every 31st point might be good. (The number was arrived at by guessing wildly and drawing the resulting images until they looked pretty enough.)</p>
<div class="dean_ch" style="white-space: wrap;">
csamp &lt;- cd[seq(1,dim(cd)[1],31),]<br />
&nbsp;</div>
<p>We&#8217;d get, from this, the following result:<br />
<img src="http://blog.mikael.johanssons.org/wp-content/uploads/2008/08/trifolium_sample.png" alt="Trifolium sample" /></p>
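<p>To get a feeling for how thin this sample is, we can look at the row indices on their own. A tiny sketch (the variable names are mine):</p>
<div class="dean_ch" style="white-space: wrap;">
n &lt;- 2000 &nbsp; # number of rows in the full dataset<br />
keep &lt;- seq(1, n, 31) &nbsp; # rows 1, 32, 63, ...<br />
length(keep) &nbsp; # 65<br />
range(keep) &nbsp; # 1 1985<br />
&nbsp;</div>
<p>So the skeleton below is built on 65 of the 2000 points.</p>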
<p>Now, we&#8217;ll want to build the corresponding skeleton. Let&#8217;s do it for a few different &epsilon;s to demonstrate the difference.</p>
<div class="dean_ch" style="white-space: wrap;">
d &lt;- function(x,y,z,w) { sqrt((x-z)^2+(y-w)^2) }<br />
par(mfrow=c(2,2))<br />
eps &lt;- c(0.05,0.1,0.15,0.2)<br />
cols &lt;- c(&quot;cyan&quot;,&quot;green&quot;,&quot;yellow&quot;,&quot;red&quot;)<br />
for (ei in 1:length(eps)) {<br />
&nbsp; plot(cd,col=&quot;gray&quot;)<br />
&nbsp; points(csamp,col=&quot;blue&quot;)<br />
&nbsp; title(eps[ei])<br />
&nbsp; for (i in 1:dim(csamp)[1]) {<br />
&nbsp; &nbsp; for (j in 1:dim(csamp)[1]) {<br />
&nbsp; &nbsp; &nbsp; x &lt;- csamp[i,1]; y &lt;- csamp[i,2]<br />
&nbsp; &nbsp; &nbsp; z &lt;- csamp[j,1]; w &lt;- csamp[j,2]<br />
&nbsp; &nbsp; &nbsp; e &lt;- eps[ei]<br />
&nbsp; &nbsp; &nbsp; if(d(x,y,z,w) &lt; 2*e) {<br />
&nbsp; &nbsp; &nbsp; &nbsp; segments(x,y,z,w,col=cols[ei])<br />
&nbsp; &nbsp; &nbsp; }<br />
&nbsp; &nbsp; }<br />
&nbsp; }<br />
}<br />
&nbsp;</div>
<p>The result is:<br />
<img src="http://blog.mikael.johanssons.org/wp-content/uploads/2008/08/trifolium_skeleta.png" alt="Trifolium skeleta" /></p>
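<p>The double loop visits every pair of points twice and computes each distance by hand. A more compact way to get the same edges for one value of &epsilon; is to let <span lang="R">dist</span> compute all pairwise distances at once. A sketch, which regenerates the data with a seed of my choosing so that it runs on its own:</p>
<div class="dean_ch" style="white-space: wrap;">
# Regenerate the noisy trifolium and the sample (the seed is an arbitrary choice)<br />
set.seed(42)<br />
theta &lt;- (1:2000) * 2 * pi / 2000<br />
cd &lt;- cbind(cos(3 * theta) * cos(theta) + rnorm(2000) / 100,<br />
&nbsp; cos(3 * theta) * sin(theta) + rnorm(2000) / 100)<br />
csamp &lt;- cd[seq(1, nrow(cd), 31), ]<br />
D &lt;- as.matrix(dist(csamp)) &nbsp; # all pairwise Euclidean distances<br />
eps &lt;- 0.1<br />
# Keep each pair once (upper triangle); edge when the distance is below 2*eps<br />
edges &lt;- which(D &lt; 2 * eps &amp; upper.tri(D), arr.ind = TRUE)<br />
plot(cd, col = &quot;gray&quot;)<br />
points(csamp, col = &quot;blue&quot;)<br />
segments(csamp[edges[, 1], 1], csamp[edges[, 1], 2],<br />
&nbsp; csamp[edges[, 2], 1], csamp[edges[, 2], 2], col = &quot;green&quot;)<br />
nrow(edges) &nbsp; # number of edges in the 1-skeleton<br />
&nbsp;</div>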
<p>We notice that as the radius we observe grows, we connect all the loops, but by the time the loops are completely connected, there are also cross connections forming toward the middle. However, with any luck, these cross connections will be short-lived enough, in terms of the radii we use, that the homology classes we extract from the Rips-Vietoris complexes are noticeably more persistent.</p>
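<p>The very simplest piece of this can be done without any special software: counting the connected components of the 1-skeleton as &epsilon; grows. This says nothing about the loops, it is only a rough sketch of the idea of watching a feature appear and disappear with the radius. The function name and the toy points are made up for illustration.</p>
<div class="dean_ch" style="white-space: wrap;">
# Count connected components of the graph with an edge when distance &lt; 2*eps<br />
count_components &lt;- function(pts, eps) {<br />
&nbsp; adj &lt;- as.matrix(dist(pts)) &lt; 2 * eps<br />
&nbsp; n &lt;- nrow(adj)<br />
&nbsp; label &lt;- rep(0, n) &nbsp; # 0 means not yet visited<br />
&nbsp; comp &lt;- 0<br />
&nbsp; for (s in 1:n) {<br />
&nbsp; &nbsp; if (label[s] != 0) next<br />
&nbsp; &nbsp; comp &lt;- comp + 1<br />
&nbsp; &nbsp; label[s] &lt;- comp<br />
&nbsp; &nbsp; queue &lt;- s<br />
&nbsp; &nbsp; # breadth-first search from s, labelling everything reachable<br />
&nbsp; &nbsp; while (length(queue) &gt; 0) {<br />
&nbsp; &nbsp; &nbsp; v &lt;- queue[1]; queue &lt;- queue[-1]<br />
&nbsp; &nbsp; &nbsp; nb &lt;- which(adj[v, ] &amp; label == 0)<br />
&nbsp; &nbsp; &nbsp; label[nb] &lt;- comp<br />
&nbsp; &nbsp; &nbsp; queue &lt;- c(queue, nb)<br />
&nbsp; &nbsp; }<br />
&nbsp; }<br />
&nbsp; comp<br />
}<br />
# Toy data: the four corners of a unit square plus one far-away point<br />
pts &lt;- rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(10, 10))<br />
sapply(c(0.4, 0.6, 8), count_components, pts = pts) &nbsp; # 5 2 1<br />
&nbsp;</div>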
<p>Doing the corresponding computation relies on me figuring out enough to use the Plex software suite, or writing my own, and thus will be the subject of a much later blog post. This one was mainly &#8220;Look &#8211; I use R&#8221; and &#8220;Look &#8211; pretty pictures&#8221;. Enjoy.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mikael.johanssons.org/archive/2008/08/r-and-topological-data-analysis/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

