Syntaxfree writes over at his blog about a silly little toy he wrote, using the PFP (Probabilistic Functional Programming) library, to generate random text.
Now, his text is unreadable. I mean, it's not even pronounceable. Why?
Because he's looking at bigram distributions of letters.
Great, I thought, I'll do him one better. Random text using bigram
distributions on words must surely be a LOT better than random text
using bigram distributions on letters. At least the words come out
readable, and they may even come out in a decent order.
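To make the letter/word distinction concrete, here is a small sketch of my own (not Syntaxfree's code, and not part of PFP): the same pairing function applied once to characters and once to words. `bigrams`, `letterBigrams`, and `wordBigrams` are illustrative names.

```haskell
-- Pair every element of a list with the element that follows it.
bigrams :: [a] -> [(a, a)]
bigrams xs = zip xs (tail xs)

-- Letter level: the units are single characters, so text generated from
-- these pairs is only plausible two letters at a time.
letterBigrams :: String -> [(Char, Char)]
letterBigrams = bigrams

-- Word level: the units are whole words, so every generated token is a
-- real word from the corpus.
wordBigrams :: String -> [(String, String)]
wordBigrams = bigrams . words
```

For example, `wordBigrams "in the beginning"` gives `[("in","the"),("the","beginning")]`, while `letterBigrams "in"` gives only `[('i','n')]`.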
So I sat down with his code and hacked, tweaked, and monadized it into
this:
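module Test where

import Probability
import Data.Char
import Control.Monad

filename = "kjv.genesis"

-- Keep only letters and whitespace, lowercase everything, split into
-- words, and pair each word with the word that follows it.
bigram t = zip ws (tail ws) where
  ws = (words . map toLower . filter (\x -> isAlpha x || isSpace x)) t

-- A uniform distribution over the list of bigrams, so a pair that
-- occurs k times is k times as likely to be picked.
distro = uniform . bigram

goal = readFile filename
goalD = fmap distro goal

-- Draw one random bigram and return its two words.
one = do
  gD <- goalD
  (a,b) <- pick gD
  return [a,b]

-- Draw n bigrams and join all of their words into one string.
many n = fmap (unwords . concat) (replicateM n one)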
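Note that `many` concatenates independently drawn pairs, so only the two words inside each pair were ever adjacent in the source; nothing links one pair to the next. If you want the words to come out in a more plausible order, a common variation is a first-order Markov chain: pick each next word from the words that followed the current one. Below is a minimal sketch of that idea using Data.Map from the containers package instead of PFP; `tokens`, `successors`, `walk`, and the `choose` random-index argument are all hypothetical names.

```haskell
import Data.Char (isAlpha, isSpace, toLower)
import qualified Data.Map as Map

-- Same clean-up as 'bigram' above: keep letters and whitespace,
-- lowercase, split into words.
tokens :: String -> [String]
tokens = words . map toLower . filter (\c -> isAlpha c || isSpace c)

-- Map each word to every word that follows it in the corpus. Repeats are
-- kept, so a uniform pick from the list is weighted by bigram count.
successors :: [String] -> Map.Map String [String]
successors ws = Map.fromListWith (++) [(a, [b]) | (a, b) <- zip ws (tail ws)]

-- Walk the chain for up to n words, starting from 'start'. 'choose k' is a
-- hypothetical random source that must return an index in [0, k-1].
-- The walk stops early at a word that has no recorded successor.
walk :: Monad m => (Int -> m Int) -> Map.Map String [String] -> String -> Int -> m [String]
walk choose table = go
  where
    go w n
      | n <= 0 = return []
      | otherwise = case Map.lookup w table of
          Nothing -> return [w]
          Just nexts -> do
            i <- choose (length nexts)
            rest <- go (nexts !! i) (n - 1)
            return (w : rest)
```

Assuming the `random` package is installed, something like `walk (\k -> randomRIO (0, k - 1)) (successors (tokens text)) "god" 30` would yield up to thirty chained words starting from "god".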
So, I have some corpus -- I just pulled the King James Genesis to have
some sort of body of text to work with, and saved it into the
kjv.genesis file. Then I can pop over to my beloved GHCi and execute:
Prelude> :l Test
[1 of 4] Compiling ListUtils ( ListUtils.hs, interpreted )
[2 of 4] Compiling Show ( Show.hs, interpreted )
[3 of 4] Compiling Probability ( Probability.hs, interpreted )
[4 of 4] Compiling Test ( Test.hs, interpreted )
Ok, modules loaded: Show, Test, Probability, ListUtils.
*Test> many 30
Loading package haskell98 ... linking ... done.
"house and it to circumcised all cool of these things daughters from ye go dreamed for stead and in unto god of for wives done to i give god shall bowed himself and shem tower of small and said lord he was called his the days these are thither therefore cainan and and he rachel and hear my sons born"
The first execution will take a while, since it has to, y'know, digest
the actual text, calculate distributions, and set everything up.
Subsequent executions also take quite some time, and I'm not at all
certain why. An explanation would be nice, if someone has it.
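One cause, going by the code above: `goal` and `goalD` are IO actions, not stored values. Nothing is cached between calls, so every run of `one` re-reads kjv.genesis and rebuilds the distribution, and `many 30` does that thirty times per call. That is likely enough to explain the slow subsequent runs. A minimal sketch of the alternative, assuming you are willing to step outside PFP: read the file once, keep the pairs in an array, and draw indices from it. `bigramTable`, `manyFrom`, and the `draw` argument are hypothetical names, not part of PFP.

```haskell
import Control.Monad (replicateM)
import Data.Array (Array, bounds, listArray, (!))
import Data.Char (isAlpha, isSpace, toLower)

-- Build the bigram table once, with the same clean-up as 'bigram' above.
-- An array gives constant-time lookup by index.
bigramTable :: String -> Array Int (String, String)
bigramTable t = listArray (0, length ps - 1) ps
  where
    ws = words . map toLower . filter (\c -> isAlpha c || isSpace c) $ t
    ps = zip ws (tail ws)

-- Draw n pairs uniformly from an already-built table, so repeated pairs
-- are proportionally more likely, as with a uniform pick over the list.
-- 'draw hi' is a hypothetical random source returning an index in [0, hi];
-- the table must be non-empty.
manyFrom :: Monad m => (Int -> m Int) -> Array Int (String, String) -> Int -> m String
manyFrom draw table n = do
  let (_, hi) = bounds table
  is <- replicateM n (draw hi)
  return (unwords (concatMap (\i -> let (a, b) = table ! i in [a, b]) is))
```

In GHCi you could then bind `table <- fmap bigramTable (readFile "kjv.genesis")` once and call `manyFrom (\hi -> randomRIO (0, hi)) table 30` as often as you like; only the first line touches the file. `randomRIO` comes from System.Random in the `random` package, which is an assumption about your setup.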
And for some sample poetry, I give you:
house and it to circumcised all cool of these things daughters from
ye go dreamed for stead and in unto god of for wives done to i give
god shall bowed himself and shem tower of small and said lord he was
called his the days these are thither therefore cainan and and he
rachel and hear my sons born
when isaac dry land unto him isaac and have i children struggled
lord hath had eaten goods which morning was ye shall by the all the
me this naphtali and years old was her to see of the his brother out
of names after they feed thy mothers of ephron god said he put him
into after his the tent
unto us i will to him i will the presence hands to his right name
was possession of days journey down in that thou perizzites and and
he they for god made that is of the twentys sake jacob so itself
after thou and of anah years and in isaac pillar of and the builded
a of canaan noah and