More silly random text

Syntaxfree writes over at his blog about a silly little toy he wrote, using the PFP library, to generate random text.

Now, his text is unreadable. I mean, it's even unpronounceable. Why? Because he's looking at bigram distributions of letters.

Great, I thought, I'll do him one better. Random text using bigram distributions on words must surely be a LOT better than random text using bigram distributions on letters. At least the words come out readable, and they may even come out in a decent order.

So I sat down with his code, and hacked, tweaked, and monadized it into this:

module Test where

import Probability
import Data.Char
import Control.Monad

filename="kjv.genesis"

bigram t = zip ws (tail ws) where
  ws = (words . map toLower . filter (\x -> isAlpha x || isSpace x)) t

distro = uniform . bigram

goal = readFile filename

goalD = fmap distro goal

one = do
  gD <- goalD
  (a,b) <- pick gD
  return [a,b]

many n = fmap (unwords . concat) (replicateM n one)
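
To make the word-pair step concrete, here is a small self-contained sketch of what `bigram` produces on a one-line input. It needs neither the PFP library nor the corpus file. The `render` helper and the `example` value are hypothetical names added for illustration only; `render` flattens a list of chosen pairs into a string the same way `many` joins them.

```haskell
import Data.Char (isAlpha, isSpace, toLower)

-- Same behaviour as `bigram` in the module above; `drop 1` is used in
-- place of `tail` only to avoid a partial function.
bigram :: String -> [(String, String)]
bigram t = zip ws (drop 1 ws)
  where
    -- Drop everything except letters and whitespace, lowercase, split.
    ws = (words . map toLower . filter (\x -> isAlpha x || isSpace x)) t

-- Hypothetical helper: flatten chosen pairs into one space-separated string.
render :: [(String, String)] -> String
render = unwords . concatMap (\(a, b) -> [a, b])

example :: [(String, String)]
example = bigram "In the beginning, God created"
-- [("in","the"),("the","beginning"),("beginning","god"),("god","created")]
```

Because the filter removes every character that is not a letter or whitespace, an apostrophe disappears without splitting the word, which is why a token like "twentys" shows up in the generated text. Note also that each pair is picked independently, so the second word of one pair has no connection to the first word of the next.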

So, I have some corpus -- I just pulled the King James Genesis to have some sort of body of text to work with, and saved it into the kjv.genesis file. Then, I can pop over to my beloved GHCi and execute:

Prelude> :l Test
[1 of 4] Compiling ListUtils        ( ListUtils.hs, interpreted )
[2 of 4] Compiling Show             ( Show.hs, interpreted )
[3 of 4] Compiling Probability      ( Probability.hs, interpreted )
[4 of 4] Compiling Test             ( Test.hs, interpreted )
Ok, modules loaded: Show, Test, Probability, ListUtils.
*Test> many 30
Loading package haskell98 ... linking ... done.
"house and it to circumcised all cool of these things daughters from ye go dreamed for stead and in unto god of for wives done to i give god shall bowed himself and shem tower of small and said lord he was called his the days these are thither therefore cainan and and he rachel and hear my sons born"

The first execution will take a while, since it has to, y'know, digest the actual text, calculate distributions, and set everything up.

Subsequent executions also take quite some time, and I'm not at all certain why. An explanation would be nice, if someone has it.
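
One plausible explanation, offered as a guess rather than a profiled result: `goalD` is an IO action, not a stored value, so every run of `one` reads the file and rebuilds the distribution again, and `many 30` runs `one` thirty times. GHCi does not cache the result of an IO action between evaluations, which would also explain why later calls are no faster than the first. The sketch below illustrates only that effect. It does not use the PFP library; `loadBigrams`, `oneReloading`, `oneShared`, and `countLoads` are hypothetical names, and a tiny inline corpus stands in for the file.

```haskell
import Control.Monad (replicateM)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Hypothetical stand-in for `goalD`: "loads" a tiny corpus and counts
-- how many times the load actually runs.
loadBigrams :: IORef Int -> IO [(String, String)]
loadBigrams counter = do
  modifyIORef' counter (+ 1)
  let ws = words "in the beginning god created the heaven and the earth"
  return (zip ws (drop 1 ws))

-- Same shape as `one` above: the load happens inside the action.
-- It always takes the first pair so the sketch stays deterministic.
oneReloading :: IORef Int -> IO [String]
oneReloading counter = do
  bs <- loadBigrams counter
  return (firstPair bs)

-- Alternative shape: the already-loaded pairs are passed in.
oneShared :: [(String, String)] -> IO [String]
oneShared bs = return (firstPair bs)

firstPair :: [(String, String)] -> [String]
firstPair ((a, b) : _) = [a, b]
firstPair []           = []

-- Number of loads performed when generating n pairs each way:
-- (loads with reloading, loads with sharing).
countLoads :: Int -> IO (Int, Int)
countLoads n = do
  c1 <- newIORef 0
  _  <- replicateM n (oneReloading c1)
  c2 <- newIORef 0
  bs <- loadBigrams c2
  _  <- replicateM n (oneShared bs)
  (,) <$> readIORef c1 <*> readIORef c2
```

`countLoads 30` returns `(30, 1)`: thirty loads when the load sits inside the action, one when the loaded list is passed in. Applied to the module above, that would mean running `goalD` once inside `many` and handing the resulting distribution to each pick. Whether picking from such a large uniform distribution is itself slow is a separate question that this sketch does not address.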

And for some sample poetry, I give you:

house and it to circumcised all cool of these things daughters from ye go dreamed for stead and in unto god of for wives done to i give god shall bowed himself and shem tower of small and said lord he was called his the days these are thither therefore cainan and and he rachel and hear my sons born

when isaac dry land unto him isaac and have i children struggled lord hath had eaten goods which morning was ye shall by the all the me this naphtali and years old was her to see of the his brother out of names after they feed thy mothers of ephron god said he put him into after his the tent

unto us i will to him i will the presence hands to his right name was possession of days journey down in that thou perizzites and and he they for god made that is of the twentys sake jacob so itself after thou and of anah years and in isaac pillar of and the builded a of canaan noah and
