Syntaxfree writes over at his blog about a silly little toy he wrote, using the PFP library, to generate random text.
Now, his text is unreadable. I mean, it’s even unpronounceable. Why? Because he’s looking at bigram distributions of letters.
Great, I thought, I’ll do him one better. Random text using bigram distributions on words must surely be a LOT better than random text using bigram distributions on letters. At least the words come out readable, and they may even come out in a decent order.
So I sat down with his code, and hacked, tweaked, and monadized it into this:
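import Probability
import Data.Char
import Control.Monad

filename = "kjv.genesis"

-- Pair every word with its successor, after dropping punctuation and lowercasing.
bigram t = zip ws (tail ws) where
    ws = (words . map toLower . filter (\x -> isAlpha x || isSpace x)) t

-- Uniform distribution over every bigram occurrence in the text.
distro = uniform . bigram

goal = readFile filename
goalD = fmap distro goal

-- Draw one bigram and return its two words.
one = do
    gD <- goalD
    (a,b) <- pick gD
    return [a,b]

-- Draw n bigrams and join all their words into one string.
many n = fmap unwords $ fmap concat $ sequence $ take n $ repeat one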
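To see what the bigram step actually feeds into the distribution, here is a minimal sketch that needs nothing beyond base. The sentence is just an illustrative stand-in for the corpus, demo is a name made up for this example, and drop 1 replaces tail only to avoid the partial function.

```haskell
import Data.Char (isAlpha, isSpace, toLower)

-- Same pipeline as bigram above: keep only letters and whitespace,
-- lowercase everything, split into words, pair each word with its successor.
bigram :: String -> [(String, String)]
bigram t = zip ws (drop 1 ws)
  where
    ws = (words . map toLower . filter (\x -> isAlpha x || isSpace x)) t

-- Illustrative input, not read from kjv.genesis.
demo :: [(String, String)]
demo = bigram "For twenty's sake, he said."
-- [("for","twentys"),("twentys","sake"),("sake","he"),("he","said")]
```

Note that the filter drops the apostrophe without splitting the word, which is where a token like "twentys" comes from.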
So, I have some corpus: I just pulled the King James Genesis to have some sort of body of text to work with, and saved it into the kjv.genesis file. Then, I can pop over to my beloved GHCi and execute many with the number of bigrams I want.
The first execution will take a while, since it has to, y’know, digest the actual text, calculate distributions, and set everything up.
Subsequent executions also take quite some time, and I’m not at all certain why. An explanation would be nice, if someone has it.
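One likely culprit, judging only from the code as posted: goalD is an IO action, not a cached value, so every run of one re-reads kjv.genesis and rebuilds the whole distribution, and many n does that n times over. Below is a rough sketch of reading the file once and sampling from the in-memory bigram list. It does not use PFP at all: lcg, sample, and generate are names invented for this sketch, and the little linear congruential generator is only a stand-in for a real random source so that the example stays within base.

```haskell
import Data.Char (isAlpha, isSpace, toLower)

-- Same tokenisation as the post's bigram function.
bigram :: String -> [(String, String)]
bigram t = zip ws (drop 1 ws)
  where
    ws = (words . map toLower . filter (\x -> isAlpha x || isSpace x)) t

-- Hypothetical stand-in for a random source: a small linear congruential
-- generator with a 31-bit state (no overflow on a 64-bit Int).
lcg :: Int -> Int
lcg s = (s * 1103515245 + 12345) `mod` 2147483648

-- Pure sampler: draw n bigrams uniformly, with replacement, from a list
-- that was built once, and flatten them into text.
sample :: Int -> Int -> [(String, String)] -> String
sample seed n bs
  | null bs   = ""
  | otherwise = unwords (concatMap pair (take n states))
  where
    len    = length bs
    states = drop 1 (iterate lcg seed)
    -- Use the higher bits of the state; the low bits of an LCG cycle quickly.
    pair s = let (a, b) = bs !! ((s `div` 65536) `mod` len) in [a, b]

-- Read and tokenise the corpus a single time, then sample from memory.
generate :: FilePath -> Int -> Int -> IO String
generate path seed n = do
  text <- readFile path
  return (sample seed n (bigram text))
```

In GHCi, something like generate "kjv.genesis" 42 30 >>= putStrLn would then touch the file once per call instead of once per bigram. Indexing with !! is still linear in the list length, so an array would be the next thing to try if this is not fast enough.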
And for some sample poetry, I give you
house and it to circumcised all cool of these things daughters from ye go dreamed for stead and in unto god of for wives done to i give god shall bowed himself and shem tower of small and said lord he was called his the days these are thither therefore cainan and and he rachel and hear my sons born
when isaac dry land unto him isaac and have i children struggled lord hath had eaten goods which morning was ye shall by the all the me this naphtali and years old was her to see of the his brother out of names after they feed thy mothers of ephron god said he put him into after his the tent
unto us i will to him i will the presence hands to his right name was possession of days journey down in that thou perizzites and and he they for god made that is of the twentys sake jacob so itself after thou and of anah years and in isaac pillar of and the builded a of canaan noah and
4 People had this to say...
When I was a lad we didn’t have this Internets thing so I had to type in the text myself when I wrote my first Markov chain generator. Instead of the Bible I used this book which I typed in in its entirety. The output made about the same amount of sense however.
William S. Burroughs would have loved this. He did it by hand, with pages of text and ribbons of reel-to-reel audio tape and scissors.
Or he would have felt it obsolete. Random text is no fun when there’s no challenge to it.
//JJ
how long did it take to generate the text in subsequent runs?