Monday, November 18, 2013

A Dada Analysis of Moby Dick

In my statistics class we're learning about the importance of true randomization in data analysis. While this is not a scientific analysis, I really liked the idea of true, computer-generated randomization in my analysis. I started out a little too ambitious:


Step 1, pick 5 random chapters:

19 99 29 50 15
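
For anyone who wants to repeat this step, here is a minimal Python sketch. It is not necessarily how the numbers above were generated: it assumes the draw is over the 135 numbered chapters, that no chapter is picked twice, and the helper name `pick_chapters` is made up for illustration. `random.SystemRandom` pulls from the operating system's entropy source, which is about as close to "true" randomness as the standard library gets.

```python
import random

TOTAL_CHAPTERS = 135  # numbered chapters in Moby-Dick (assumed range for the draw)

def pick_chapters(count, last=TOTAL_CHAPTERS, rng=None):
    """Return `count` distinct chapter numbers drawn from 1..last."""
    # SystemRandom uses OS entropy rather than a seeded pseudo-random generator.
    rng = rng or random.SystemRandom()
    return rng.sample(range(1, last + 1), count)

chapters = pick_chapters(5)
print(chapters)  # five distinct numbers between 1 and 135, different on each run
```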

Step 2, get the word count for each chapter and then randomly select about a paragraph's worth of words from that chapter. I decided 100 words is a good paragraph. The chapter title is not included.

Chapter 19 - 1200 words:

399 430 578 709 1045
837 407 170 23 315
364 828 359 1190 403
135 550 136 806 838
682 445 384 455 1098
197 988 736 923 131
357 331 638 1007 824
781 452 858 929 366
485 1054 1174 191 450
767 221 1026 1146 276
209 208 1066 644 605
444 815 640 428 391
544 526 360 544 720
934 796 406 744 817
141 699 305 1187 1080
1032 884 1039 980 1046
353 691 104 140 969
1023 544 639 582 323
676 588 675 364 477
528 495 346 726 649

Chapter 99 - 2407 words:

22 1011 2305 267 813
1765 469 1843 2313 1731
1967 862 2200 1164 1943
2131 1312 937 983 1351
384 912 964 1831 1165
1328 955 306 492 2116
2038 2212 81 50 1400
1252 2274 2055 764 27
1950 189 679 34 917
1353 1488 138 1262 2082
844 984 2102 615 2014
2059 1551 304 404 2237
217 1498 679 1926 934
1001 449 2353 463 1871
1192 773 618 557 1387
933 1407 2249 1318 415
2314 1570 181 1757 14
38 236 196 1423 958
63 1845 124 1276 1234
2282 1938 1198 1483 1027

Realization: There's no way of finding the 2305th word in the chapter that is not time-consuming, but I did see that my word counter also gives stats for "keyword density" (words that occur most frequently). New idea.
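
With the chapter saved as plain text, a short script could do the lookup instead. The sketch below is only an illustration under a few assumptions: words are whatever is separated by whitespace (which may not match the word counter's totals exactly), positions are 1-based, and the names `random_positions` and `words_at` are hypothetical. Positions are drawn with replacement, since the lists above contain repeats (544 shows up three times for Chapter 19).

```python
import random

def random_positions(word_count, how_many=100, seed=None):
    """Draw 1-based word positions, allowing the same position to repeat."""
    rng = random.Random(seed)
    return [rng.randint(1, word_count) for _ in range(how_many)]

def words_at(text, positions):
    """Return the words found at the given 1-based positions in `text`."""
    words = text.split()  # whitespace split; punctuation stays attached
    return [words[p - 1] for p in positions]

# Stand-in text, not the full chapter.
sample_text = "Call me Ishmael. Some years ago never mind how long precisely"
print(words_at(sample_text, [1, 3, 11]))  # ['Call', 'Ishmael.', 'precisely']

positions = random_positions(1200)  # e.g. 100 positions for a 1200-word chapter
```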

Step 1: I will select more chapters and take the 10 most dense keywords. Reselect random chapters but this time take 10. I select among numbers 1-137 to account for etymology and the epilogue, which I have assigned as 136 and 137 respectively. That will leave us with 100 words (10 from 10 chapters). Again, excluding chapter numbers and titles from selections.

42 4 37 72 3     77 12 6 15 117

Chapter 42:

white 46 (19%)
whiteness 26 (11%)
yet 14 (6%)
nor 11 (5%)
why 11 (5%)
things 10 (4%)
upon 10 (4%)
her 10 (4%)
man 9 (4%)
same 9 (4%)

Chapter 4:

me 19 (14%)
queequeg 13 (10%)
up 12 (9%)
arm 9 (7%)
bed 9 (7%)
over 8 (6%)
very 7 (5%)
could 6 (4%)
room 6 (4%)
lay 6 (4%)

Chapter 37:

ye 14 (20%)
me 12 (17%)
swerve 5 (7%)
iron 4 (6%)
will 3 (4%)
come 3 (4%)
ahab 2 (3%)
er 2 (3%)
ever 2 (3%)
like 2 (3%)

Chapter 72:

queequeg 16 (10%)
ginger 15 (10%)
whale 13 (8%)
poor 9 (6%)
now 8 (5%)
upon 7 (5%)
monkey 7 (5%)
rope 7 (5%)
hands 6 (4%)
harpooneer 6 (4%)

Chapter 3:

me 46 (8%)
harpooneer 35 (6%)
bed 31 (6%)
landlord 28 (5%)
up 24 (4%)
now 24 (4%)
room 20 (4%)
thought 18 (3%)
head 18 (3%)
said 18 (3%)

Chapter 77:

whale 11 (15%)
tun 7 (10%)
sperm 6 (8%)
heidelburgh 5 (7%)
case 4 (5%)
head 4 (5%)
forming 4 (5%)
know 3 (4%)
upper 3 (4%)
end 3 (4%)

Chapter 12:

queequeg 10 (12%)
father 5 (6%)
ship 5 (6%)
me 5 (6%)
among 4 (5%)
last 4 (5%)
might 4 (5%)
christians 4 (5%)
old 4 (5%)
now 4 (5%)

Chapter 6:

new 11 (13%)
bedford 9 (11%)
will 8 (10%)
town 6 (7%)
upon 5 (6%)
green 5 (6%)
see 5 (6%)
like 5 (6%)
streets 4 (5%)
country 4 (5%)

Chapter 15:

hussey 12 (9%)
us 10 (8%)
clam 10 (8%)
queequeg 9 (7%)
mrs 9 (7%)
chowder 8 (6%)
cod 7 (5%)
little 6 (5%)
supper 6 (5%)
pots 6 (5%)

Chapter 117:

her 12 (13%)
ship 9 (9%)
come 7 (7%)
down 6 (6%)
two 6 (6%)
filled 6 (6%)
ahab 5 (5%)
bachelor 5 (5%)
men 4 (4%)
whale 4 (4%)
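
A rough version of these keyword lists can be produced with a few lines of Python. This is a sketch, not the word counter used above: the stopword list is a small made-up one (the real tool clearly filters out words like "the", but its exact list is unknown), and it only reports raw counts, because how the percentages above were calculated isn't something I can reconstruct.

```python
import re
from collections import Counter

# Hypothetical stopword list; the tool's real list is unknown.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that",
             "was", "i", "his", "he", "as", "with", "for", "but", "at", "by"}

def top_keywords(text, n=10, stopwords=STOPWORDS):
    """Return the n most frequent non-stopword words as (word, count) pairs."""
    words = re.findall(r"[a-z]+", text.lower())  # lowercase, letters only
    counts = Counter(w for w in words if w not in stopwords)
    return counts.most_common(n)

sample = "The white whale, the white whiteness of the white whale."
print(top_keywords(sample, n=2))  # [('white', 3), ('whale', 2)]
```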

Step 4: I want to make this a little more fun. Make every 5th word bigger.
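
If the word list lives in a script rather than a word processor, the "every 5th word" rule is a one-liner. A small sketch, where the function name `size_words` and the two point sizes are placeholders and not the sizes actually printed:

```python
def size_words(words, step=5, normal=12, big=24):
    """Pair each word with a font size; every `step`-th word gets the big size."""
    return [(w, big if i % step == 0 else normal)
            for i, w in enumerate(words, start=1)]

chapter_42 = ["white", "whiteness", "yet", "nor", "why",
              "things", "upon", "her", "man", "same"]
sized = size_words(chapter_42)
print([w for w, size in sized if size == 24])  # ['why', 'same']
```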

Step 5: Take the words I've found and arrange them arbitrarily.

I printed out my words and placed them on top of a seagull picture I pulled from the top of the stack of magazine stuff I keep in a drawer. All I had for adhesive was glitter glue. At first I was tempted to remove certain words like hussey and sperm. I'm used to censoring myself in a classroom environment, but doing so is not necessary and would take away from the work. I also thought of limiting myself to a small selection of the words, but I instead made it a goal to use every word. This is the result.





To me, this is a lot more meaningful than something I might have made with a word cloud, as cool as those are. What do we get from this?

When I read it, I feel like I am having soup on a rainy day.
