 More silly random text

  • February 9th, 2007
  • 5:34 pm

Syntaxfree writes over at his blog about a silly little toy he wrote, using the PFP library, to generate random text.

Now, his text is unreadable. I mean, it’s even unpronounceable. Why? Because he’s looking at bigram distributions of letters.

Great, I thought, I’ll do him one better. Random text using bigram distributions on words must surely be a LOT better than random text using bigram distributions on letters. At least the words come out readable, and they may even come out in a decent order.

So I sat down with his code, and hacked, tweaked, and monadized it into this:

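module Test where

import Probability   -- PFP: provides uniform and pick
import Data.Char
import Control.Monad

-- Corpus file, read from the current directory
filename="kjv.genesis"

-- Keep only letters and whitespace, lowercase everything, split into
-- words, and pair each word with the word that follows it
bigram t = zip ws (tail ws) where
  ws = (words . map toLower . filter (\x -> isAlpha x || isSpace x)) t

-- Uniform distribution over the list of bigrams; a bigram that occurs
-- several times in the text appears several times in the list
distro = uniform . bigram

-- IO action that reads the corpus
goal = readFile filename

-- IO action that reads the corpus and builds the distribution
goalD = fmap distro goal

-- Pick one bigram and return its two words
one = do
  gD <- goalD
  (a,b) <- pick gD
  return [a,b]

-- Pick n bigrams independently and join all the words with spaces
many n = fmap unwords $ fmap concat $ sequence $ take n $ repeat one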
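
To make it easier to see what goes into the distribution, here is a minimal sketch of the same word-pair extraction using only the Prelude and Data.Char, with no PFP dependency. The input string and the names sample and flatten are made up for illustration; they are not part of the module above.

```haskell
import Data.Char (isAlpha, isSpace, toLower)

-- Same pipeline as bigram above: keep only letters and whitespace,
-- lowercase, split into words, pair each word with its successor.
-- (drop 1 is used instead of tail; the result is the same here.)
bigram :: String -> [(String, String)]
bigram t = zip ws (drop 1 ws)
  where
    ws = (words . map toLower . filter (\x -> isAlpha x || isSpace x)) t

-- Hypothetical sample input, not taken from the corpus file.
sample :: String
sample = "In the beginning, God created the heaven."

-- Flatten a list of pairs the way many does before handing the
-- words to unwords.
flatten :: [(String, String)] -> String
flatten = unwords . concatMap (\(a, b) -> [a, b])
```

Loaded into GHCi, bigram sample gives [("in","the"),("the","beginning"),("beginning","god"),("god","created"),("created","the"),("the","heaven")]. Note that many picks each pair independently of the previous one, so word order is only preserved inside a pair, not from one pair to the next.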
 

So, I need some corpus. I just pulled the King James Genesis to have some sort of body of text to work with, and saved it into the kjv.genesis file. Then I can pop over to my beloved GHCi and execute

Prelude> :l Test
[1 of 4] Compiling ListUtils        ( ListUtils.hs, interpreted )
[2 of 4] Compiling Show             ( Show.hs, interpreted )
[3 of 4] Compiling Probability      ( Probability.hs, interpreted )
[4 of 4] Compiling Test             ( Test.hs, interpreted )
Ok, modules loaded: Show, Test, Probability, ListUtils.
*Test> many 30
Loading package haskell98 ... linking ... done.
"house and it to circumcised all cool of these things daughters from ye go dreamed for stead and in unto god of for wives done to i give god shall bowed himself and shem tower of small and said lord he was called his the days these are thither therefore cainan and and he rachel and hear my sons born"
 

The first execution will take a while, since it has to, y’know, digest the actual text, calculate distributions, and set everything up.

Subsequent executions also take quite some time, and I’m not at all certain why. An explanation would be nice, if someone has it.
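
One plausible explanation, going only by the code above: goalD is an IO action rather than a cached value, and one runs it with gD <- goalD on every call. So many 30 would read and tokenise the whole file 30 times, on every execution. The sketch below shows the effect using base only. The names loadBigrams, pickFirst, oneReloading, manyLoadOnce and countLoads are assumptions made for illustration, and pickFirst is a deterministic stand-in, not PFP's pick.

```haskell
import Control.Monad (replicateM)
import Data.Char (isAlpha, isSpace, toLower)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

bigram :: String -> [(String, String)]
bigram t = zip ws (drop 1 ws)
  where
    ws = (words . map toLower . filter (\x -> isAlpha x || isSpace x)) t

-- Stand-in for goalD: "loads" a tiny corpus and counts how often it
-- is run. The real code would call readFile here.
loadBigrams :: IORef Int -> IO [(String, String)]
loadBigrams counter = do
  modifyIORef' counter (+ 1)
  return (bigram "and god said let there be light and there was light")

-- Stand-in for pick: always takes the first pair. Not random.
pickFirst :: [(String, String)] -> [String]
pickFirst ((a, b) : _) = [a, b]
pickFirst []           = []

-- Same shape as the original one: reloads the corpus on every call.
oneReloading :: IORef Int -> IO [String]
oneReloading counter = fmap pickFirst (loadBigrams counter)

-- Alternative shape: load once, then pick n times from the same value.
manyLoadOnce :: IORef Int -> Int -> IO String
manyLoadOnce counter n = do
  bs <- loadBigrams counter
  return (unwords (concat (replicate n (pickFirst bs))))

-- Number of corpus loads for n picks: (reloading, load-once).
countLoads :: Int -> IO (Int, Int)
countLoads n = do
  c1 <- newIORef 0
  _ <- replicateM n (oneReloading c1)
  c2 <- newIORef 0
  _ <- manyLoadOnce c2 n
  (,) <$> readIORef c1 <*> readIORef c2
```

In GHCi, countLoads 30 returns (30,1). Applied to the original module, this would mean binding gD <- goalD once inside many and calling pick gD n times from there. I have not timed that, so treat it as a guess at where at least part of the time goes.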

And for some sample poetry, I give you

house and it to circumcised all cool of these things daughters from ye go dreamed for stead and in unto god of for wives done to i give god shall bowed himself and shem tower of small and said lord he was called his the days these are thither therefore cainan and and he rachel and hear my sons born

when isaac dry land unto him isaac and have i children struggled lord hath had eaten goods which morning was ye shall by the all the me this naphtali and years old was her to see of the his brother out of names after they feed thy mothers of ephron god said he put him into after his the tent

unto us i will to him i will the presence hands to his right name was possession of days journey down in that thou perizzites and and he they for god made that is of the twentys sake jacob so itself after thou and of anah years and in isaac pillar of and the builded a of canaan noah and

4 People had this to say...

  • Dan P
  • February 9th, 2007
  • 22:22

When I was a lad we didn’t have this Internets thing so I had to type in the text myself when I wrote my first Markov chain generator. Instead of the Bible I used this book which I typed in in its entirety. The output made about the same amount of sense however.

  • Moira
  • February 28th, 2007
  • 18:57

William S. Burroughs would have loved this. He did it by hand, with pages of text and ribbons of reel-to-reel audio tape and scissors.

  • Johan
  • March 9th, 2007
  • 21:35

Or he would have felt it obsolete. Random text is no fun when there’s no challenge to it.

//JJ

  • kris
  • June 1st, 2007
  • 2:53

how long did it take to generate the text in subsequent runs?
