Tolkien Aloud
As the previous post suggests, we do a lot of reading aloud in our house even though both children like to read on their own. My daughter may be 11-going-on-17, but if you are wise, you won't mess with her bedtime reading, and my son likewise thinks it doesn't matter what else has gone on that night, where we've gone or how tired he is: if he doesn't get reading, something is wrong. I hope we can hold on to this family tradition as they continue to grow up.
A while back I published a piece in the journal Silver Leaves about reading The Lord of the Rings aloud to a four-year-old. That was my daughter, and we read the books again when she was six. Since then she's wanted to read other things (unlike me, who would force my father to start right in again on The Hobbit as soon as we got to the end of Return of the King), and we've done a lot of fantasy and science fiction. Over the past two years we've read Susan Cooper's The Dark is Rising; Ursula Le Guin's Earthsea books (just the original trilogy. I'm not reading Tehanu to my daughter, well, ever); the Anne McCaffrey Harper Hall dragonrider books; the Lloyd Alexander Chronicles of Prydain (she loved these more than anything else); some of the Heinlein "juveniles"; and the first two Hitchhiker's Guide books by Douglas Adams.
My son wasn't interested in The Lord of the Rings books when he was four, though he did like The Hobbit and we read it a couple of times, but last year, after he turned seven, we started on Fellowship and are now to the very end of The Two Towers.
So I've read Tolkien's work aloud two and a half times, plus probably four times for The Hobbit, and have a pretty good comparison group of other writers. Tolkien is by far, massively by far, the easiest of the major fantasy and SF authors to read aloud and the one whose work gains the most from oral presentation (Lloyd Alexander would probably be a fairly distant second).
Now that's the kind of evaluative statement that needs something to back it up. But one of the problems with trying to defend such an evaluation is that we have no agreed-upon metric, and so people end up quoting particular passages, pointing, and saying "See! See how great that is!" But often passages that are great out loud are also great when read silently, so the argument is hard to make in detail.
I think, though, that I've come across one minor technique in Tolkien that really makes a difference, and I think this aspect of his work arises from his having read so much of The Lord of the Rings to the Inklings: you never, when reading Tolkien, are in any doubt about who is talking in dialogue. There is always some kind of information, either in the set-up, the dialogue itself or the description, so that you never have the experience of reading a block of text and then realizing "Wait! That's Eomer talking, not Gandalf."
In contrast we might look at Frank Herbert's Dune. I just finished reading this out loud to my daughter, and almost every night there would be some large passage of dialogue that I'd start reading, thinking it was one character, and then, after it was finished, you'd get a bit of description or a "said X" that showed that it was an entirely different character speaking.
I'm particularly sensitive to this because I do "voices" for most of the characters in a text, and so when you start a passage thinking that it's in Gurney Halleck's accent, and it turns out to be Duke Leto or Stilgar, you have to go back and re-read the whole thing in the correct accent. Many of these passages in Dune are too long to scan to the end and find out who is speaking without losing focus on the part being read and drifting.
But this disorientation never happens in Tolkien, even in minor works like Farmer Giles. It's always easy to read his texts aloud, not only because you know who is talking, but because the writing--even the description of landscape--has a rhythm to it, a rise and fall that keeps you from having to stay at one pitch and speed all the time. There are rushing passages, but then also slower, more graceful ones.
Perhaps this orality (both in terms of oral roots and ease of oral presentation) is another aspect of Tolkien's work that makes it appealing to such a wide range of readers and draws people back to re-read the books over the courses of their lives.
Friday, January 20, 2012
Tuesday, January 17, 2012
Actual Dialogue at the Drout House This Evening
Dad: Time for bedtime reading.
Daughter: Now we can start Life, the Universe, and Everything. It's going to be awesome.
Dad: Yes, well, about that, sweetie: the print in Life, the Universe, and Everything is really small.
Daughter: So?
Dad: I can't find my reading glasses. We'll have to read the Iliad until they turn up.
Daughter: This is so not fair...
Dad: Sing in me, O Muse, the wrath of Achilles...
Sunday, January 08, 2012
Tolkien and the Nobel Prize
Many of my friends are talking about the revelation that C.S. Lewis nominated Tolkien for the Nobel Prize for Literature and that JRRT was rejected in part because a jury member argued that The Lord of the Rings "has not in any way measured up to storytelling of the highest quality."
Many people are getting a good laugh at the "storytelling" critique, and deservedly so. Tom Shippey has long documented the ability of Tolkien's work to cause supposedly intelligent critics to make fools of themselves. There's a long list of names and examples in Author of the Century, and you can tell that Tom enjoys finding obvious contradictions between what the critics say is good writing in other contexts and how they judge Tolkien.
But although we can just laugh at close-minded, self-contradictory critics, it's also useful to try to figure out where people are coming from. Although I have never agreed with the idea that Tolkien's prose is bad, I think it is important not to deny that it is different from what mainstream literary taste and scholarship thought was good in the 1950s. Here my critical approach goes in a different direction than Tom's. He is saying "here's what you said is good literature according to your theory, and Tolkien fulfills every one of those qualities on the checklist, so you should admit that it's good literature." This is rhetorically very effective, and I'm grateful that Tom has done it, but I have less faith in mainstream theories of literary greatness. In the end, I think the contradictions do not show so much that Tolkien is great literature, but that the abstract theories of "what makes good literature" are pretty much useless.
The style of Tolkien's prose isn't bad. It's merely discontinuous (mostly) with the stylistic conventions that were in place at the middle of the twentieth century. Modernists were trying to make their prose seem new and different--the meta-instruction for all Modernist prose could be conceived of as: "never write a sentence that has previously existed. Try not even to use pre-existing phrases. If you must use a pre-existing collocation, only do so ironically." Tolkien was attempting to make his prose connected to long traditions in English. High-culture Modernists just don't understand Tolkien because he violates that fundamental convention. The irony is that Tolkien is discontinuous from Modernism in the same way that Modernism was attempting to become discontinuous from the pre-existing tradition.
Modernism wants a reader to feel that there is no tradition, no pre-existing set of conventions and cliches (though there is, as you can easily see by reading a bunch of second-tier mid-century Modernist works). Tolkien was quite deliberately linking with the traditions of English literature (particularly medieval literature), resurrecting popular poetic forms (i.e., no blank verse, almost no pentameter in his poems), and making his text appear as if it is part of a long-standing tradition. The aesthetics are completely different, and it's hard to see the Nobel committee being willing--or able--to get beyond their comfort zone in the Modernist style.
The great contribution of the "Theory Wars" was to cast doubt on the pronouncements of the literary establishment and even on the wisdom of taking that establishment very seriously. The drawback is that political interpretations end up colonizing all analysis of texts because politics is a lowest common denominator for criticism: you don't have to analyze in much detail if all you are talking about is politics and ideology. Political analysis is easy compared to aesthetic analysis when aesthetics are divorced from "this is what my friends and I like," an ideology that is, unfortunately, quite well enough established in literary studies to maintain its hegemony over the ever-shrinking field.
Friday, January 06, 2012
A Return to Blogging (?)
I started this blog in June of 2002, nine and a half years ago, and for the first few years it was great fun. The "blogosphere" itself was a lot of fun then, much like USENET back in the late 1980s: although a lot of people were participating, there was an intimacy to discussions and the trolls, spammers and hustlers hadn't yet taken over. Surprising people would reply to posts and the debates we got into could be quite interesting. I derived a lot of intellectual energy from Wormtalk.
But as time went on blogging became less fun. The advent of group blogs and monetized blogs and then group monetized blogs made individual blogs less personal. The movement from blogging to FB and Twitter reduced the number of interactions that happened on the blog itself, and the further development of coterie blogs with their hierarchies and cross-linked promotions further reduced the element of spontaneity that had been so much fun. Sometime in 2008 or 2009 I found myself dreading posting to Wormtalk and that, coupled with the effort that Anglo-Saxon Aloud required, saw me significantly reduce my posting of new material. Then came the economic crisis, when pretty much all my department-chair energy was spent trying to shuffle around resources so that no one lost a job (we were successful, but at great cost of time and effort). I thought I would get back to blogging when I was on sabbatical, but during that time I ended up teaching anyway (we gave up the funding for a replacement for me in order to keep a colleague in a different time period) and sabbatical turned out to be more work than regular teaching. Simultaneously Scott Nokes stopped regularly updating his Unlocked-Wordhoard, which had become a significant source of inspiration, and I got some push-back from people about blogging while a department chair. Then in August 2010 the "Terrible Events" happened to members of my extended family (P.S.: At that time I de-friended pretty much every academic I know on Facebook. It was nothing personal. I just didn't want family things spreading all over the place, and didn't trust myself not to slip in managing different FB privacy levels for different groups of people). So in 2011 I posted a total of 3 times. Wormtalk was effectively defunct.
But although I find that I dislike, a lot, many things about the internet circa 2012, I miss sharing ideas in the more immediate form of blogging (as opposed to journal articles that take two years to appear and four to see a response). So I am going to give Wormtalk another whirl and see if I can get back some of the immediacy, energy and pleasure that were so apparent in the "Golden Age of Blogging" from 2003-2007. I plan to talk about Anglo-Saxon, Tolkien, our Lexomics research, the job market, graduate school, teaching and learning. We'll see what happens.
Wednesday, April 27, 2011
NEH Supports Lexomics
We got the grant!
I guess third time was the charm. The National Endowment for the Humanities has fully funded our Lexomics project for the next two years (project total $178,000). We will be expanding lexomic analysis from just Old English (though we will be continuing this research) to medieval Latin, Middle English, and texts from the Harlem Renaissance, and we will be collaborating with Shawn Christian (Harlem Renaissance), Sarah Downey (Latin), Yvette Kisor (Old English, Beowulf) and Scott Kleinman (Old and Middle English, approaches to computational lemmatization). It's going to be an exciting two years.
Very soon we will have available on the lexomics website (lexomics.wheatoncollege.edu) a tool called "Divi-text," which allows people to upload any electronic text and cut it into chunks (in preparation for lexomic analysis). In the next year or so we will also complete the "dendro-grammer," which will enable researchers to produce their own dendrograms without having to learn how to use the statistical analysis software, R.
In July our team's first article in a major journal will appear:
Michael D.C. Drout, Michael J. Kahn, Mark D. LeBlanc and Christina Nelson, "Of Dendrogrammatology: Lexomic Methods for Analyzing the Relationships Among Old English Poems," JEGP 110 (2011): 301-36.
Some time after that another article from the research group will appear in Modern Philology:
Sarah Downey, Michael D.C. Drout, Michael J. Kahn and Mark D. LeBlanc, "'Books Tell Us': Lexomic and Traditional Evidence for the Sources of Guthlac A."
Currently work is ongoing on the Cynewulfian corpus (though some of that is in the JEGP article), Beowulf, Bede's Ecclesiastical History, the Old English translation of Orosius, King Horn, and Mule Bone by Zora Neale Hurston and Langston Hughes.
I will post links to software and papers.
Monday, February 21, 2011
Lexomics: An Explanation
I've been meaning to write this post for, I don't know, months, but life and being department Chair (those are two distinct states, like living and being a zombie) have gotten in the way. But now I've just learned that I'm going to be part of a working group at the Santa Fe Institute in March, so I need to get my ideas in order, and this blog is as good a place as any to do that.
The word "lexomics" describes an evolving set of methods for finding patterns in textual corpora. The term is taken from Bioinformatics, where it is used to describe the search for "words" and patterns in genomes (my colleague Betsey Dyer is apparently the first person to coin the word, an obvious adaptation of "genomics").
Our lexomic methods (which are described in much more detail in forthcoming papers in JEGP and Modern Philology, as well as in my new book, Tradition and Influence) use computer-assisted statistical methods to troll through the Dictionary of Old English corpus. Essentially, we cut texts into segments, count the words in each segment and compute their relative frequencies, and then compare these relative frequencies from segment to segment using a statistical method called hierarchical, agglomerative clustering. This method produces a branching diagram, called a dendrogram, that shows how similar (and thus how different) the segments are to one another.
Other scholars, such as John Burrows, have used similar methods to examine texts and even to attempt to determine the authorship of disputed or anonymous texts -- with varying degrees of success. These approaches are usually much more complex and sophisticated than ours: scholars sometimes remove the 50 or 100 most frequently used words, or they force all words to standard spellings and morphologies, or they lemmatize their texts. We simply dumped all the words into the hopper and started counting them (this was just a preliminary experiment, after all).
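The pipeline described above (segment, count, compute relative frequencies, cluster) can be sketched in a few lines of Python. This is only an illustration and not the research group's actual code: the function names, the toy word list, the Euclidean distance and the average-linkage rule are all assumptions made for the example, since the post does not say which distance or linkage was used.

```python
from collections import Counter
from itertools import combinations
import math


def segment(words, size):
    """Cut a list of words into consecutive chunks of `size` words (the last may be shorter)."""
    return [words[i:i + size] for i in range(0, len(words), size)]


def relative_frequencies(chunk):
    """Map each word in a chunk to its share of the chunk's total word count."""
    counts = Counter(chunk)
    return {word: n / len(chunk) for word, n in counts.items()}


def distance(freq_a, freq_b):
    """Euclidean distance between two relative-frequency profiles (an assumed metric)."""
    vocab = set(freq_a) | set(freq_b)
    return math.sqrt(sum((freq_a.get(w, 0.0) - freq_b.get(w, 0.0)) ** 2 for w in vocab))


def average_linkage(profiles, a, b):
    """Mean pairwise distance between the segments in cluster a and those in cluster b."""
    total = sum(distance(profiles[i], profiles[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def cluster(profiles):
    """Agglomerative clustering: repeatedly merge the two closest clusters.

    Returns the merges in order as (members_a, members_b, distance) tuples;
    read from first to last, they are the branches of a dendrogram, bottom up.
    """
    clusters = [(i,) for i in range(len(profiles))]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_linkage(profiles, *pair))
        merges.append((a, b, average_linkage(profiles, a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Toy stand-in for a text: placeholder "words", not real corpus data.
toy_words = "a a b a a b c c d c c d".split()
profiles = [relative_frequencies(chunk) for chunk in segment(toy_words, 3)]
merges = cluster(profiles)
for a, b, d in merges:
    print(a, "+", b, "at distance", round(d, 3))
```

In the toy run, segments 0 and 1 share one vocabulary and segments 2 and 3 another, so each pair joins first and the two pairs join last. A plotting library would be needed to draw the resulting merges as an actual dendrogram.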
But to our surprise, our methods seemed to "work" very effectively without any sophistication. For example, we were able to match one poem with the correct section of another poem to which it was related, and we could separate out the two well-known sections of a third poem with absolute accuracy (I'm only being oblique here because I think it would be impolite to scoop the massive JEGP article which will be out soon). It may be that by lumping together orthographic, morphological and other variants, we were able to detect patterns that were relatively subtle.
We also seem to be able to detect when a segment of a text has a different source than the main body of the text, which is particularly exciting for Anglo-Saxon studies because we have so many texts that are composites (so we have good controls) and others whose composite nature is controversial.
Thus far we get "good" results--in the sense that they are consistent with what we know from traditional methods--for Old English poetry and prose, Latin prose, Middle English poetry, and, intriguingly, some Modern English prose. We are hoping to refine the techniques by testing fully lemmatized texts (this is difficult, because lemmatizing is incredibly time consuming) and trying other, more sophisticated statistical methods. As you'll see in the two articles (and the book), we've been able to shed some light on the Cynewulfian corpus and on the structure of Guthlac A and its relationship to other texts. Soon we hope to be able to tell you some interesting things about Alfred's Orosius, King Horn, Bede's History, and the play Mule Bone.
The connection to my book is this: lexomics methods can detect and to some degree measure influence. In my new book, I argue that tradition is a special case of influence, and so detecting influence is in a way detecting certain kinds of traditions. This gives us an empirical way of looking at a topic that has tended to be approached in a very fuzzy way.
But--and this is perhaps the most important point in this capsule summary--lexomics does not work at all if you don't have a deep familiarity with the texts ("wearing the English Professor hat" I call it) and the critical problems associated with them. A dendrogram itself can tell you very little, but a dendrogram coupled to an understanding of the sources and structure of a poem has the ability to shed light upon--and even re-date--a complex text.
I'm hoping in my visit to the Santa Fe Institute to learn how others are approaching culture as a complex evolutionary system, and perhaps improve lexomics (and certainly offer it to them) as a tool for trying to untangle and trace a few strands of the massive cultural tapestry.
Monday, January 31, 2011
A Nice Little Trick Enabled by Lexomics (and Excel)
It's nice when people overhear conversations and then help you.
I was at the climbing gym (Rock Spot Climbing in Boston -- the best climbing gym anywhere) and, in between bouldering runs, was talking with my wife about how my research was coming. Somehow we got to talking about whether Excel could speed up some of my searching. A guy at the gym overheard and said he had been the Excel guru for a Psych research project and offered to help. What follows comes from that brief collaboration. By combining material on the Lexomics website with Excel, you can do some interesting searching in uncommon words in the corpus of Anglo-Saxon.
Let's say you are researching a particular Old English poem, say, Juliana. You want to look at the more uncommon words in this poem and see if they are shared with the rest of Cynewulf's poetry or with other texts in Anglo-Saxon.
Go to the Lexomics website, choose "tools," and then "word frequencies." Click "entire corpus" and then "get stats." Click on the HERE to download this as an Excel file. You now have a file with a list of every word in the Anglo-Saxon corpus ranked in order of frequency.
Copy the column of words and the column of word frequencies and paste them into a new spreadsheet as column A and column B.
Now go back to the lexomics website, go to "tools," "word frequencies," and choose the poem of interest. "Get stats" for that poem and download them by clicking on HERE. You now have an Excel file with a list of every word in the poem ranked in order. Copy the column with the words and paste it into column C in your spreadsheet.
Now you are ready to find those words that appear in your poem and only a few times in the rest of the corpus.
Go to cell D1 and enter the following formula:
=SUM(IF($A$1:$A$x = C1, IF($B$1:$B$x < n, 1)))
where x = the total number of words in column A and n = the low frequency threshold (for example, n = 5 if you want all words that appear fewer than 5 times).
*important* do not just press ENTER. Instead, press CTRL-SHIFT-ENTER, which enters it as an array formula.
Then copy the formula down the D column by clicking the box in the lower right corner of the cell and dragging down to the row of the last word in column C.
It will take a few moments for processing.
When processing is complete, you will have a 1 in every cell in D in which the word fulfills the criteria (appearing in your poem and fewer than n times in the corpus; with n = 5, that means between 1 and 4 times), and a 0 when the word does not.
You can search for these 1's manually or use "Conditional Formatting" to bold or color the rows with a 1 in column D.
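If you would rather not wait on the array formula, the same filter can be sketched in a few lines of Python. This is only a rough equivalent of the spreadsheet steps above, not anything provided by the Lexomics site: the function name flag_rare_words is my own, the words and counts are invented purely for illustration, and I am assuming each word appears only once in the corpus list.

```python
def flag_rare_words(corpus_counts, poem_words, threshold):
    """Mirror column D of the spreadsheet.

    corpus_counts: dict of word -> corpus frequency (columns A and B)
    poem_words:    list of words from the poem (column C)
    threshold:     the n in the formula; a word is flagged if it
                   occurs fewer than n times in the corpus
    """
    flags = []
    for word in poem_words:
        count = corpus_counts.get(word)  # None if the word is not in column A
        flags.append(1 if count is not None and count < threshold else 0)
    return flags


# Invented counts for illustration only; these are NOT real corpus figures.
corpus_counts = {"ond": 5000, "wuldor": 120, "hildewoma": 3, "yrremod": 1}
poem_words = ["ond", "hildewoma", "wuldor", "yrremod"]

flags = flag_rare_words(corpus_counts, poem_words, threshold=5)

# Keep only the poem words flagged with a 1, as you would after
# conditional formatting in Excel.
rare = [word for word, flag in zip(poem_words, flags) if flag == 1]
print(rare)
```

In practice you would fill corpus_counts and poem_words from the two downloaded files rather than typing them in, but the comparison itself is the same one the formula performs.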
Now you can search these words in the Dictionary of Old English concordance and see where else they appear. Look for patterns. Enjoy.
