Thursday, September 16, 2010

Abstract Submitted for ISAS 2011

(as soon as I dig out from a new batch of wretchedly annoying and boring department chair stuff, I will do a post explaining more about lexomics, but here's a preview of what I hope to present at the International Society of Anglo-Saxonists conference at Madison this summer)

Untangling the Cynewulfian Corpus with Lexomic and Traditional Methods


At ISAS 2009 our multi-disciplinary team (English, Statistics, Computer Science) demonstrated software that researchers can use to generate statistical profiles of Anglo-Saxon texts.  In this paper we present some of the results arrived at by using these tools, showing how, when combined with traditional approaches, lexomic methods can shed light on some long-standing problems in Anglo-Saxon studies.  Specifically, we use hierarchical agglomerative clustering to examine the relationships between the vocabulary of the signed poems of Cynewulf (Christ II, Juliana, Elene and The Fates of the Apostles) and that of other poems that some scholars have, over the years, thought to be by Cynewulf or in some way related to his work (Guthlac B, The Phoenix, Andreas, Christ I and Christ III).

Our software tools, whose development was supported by the National Endowment for the Humanities, allow us to cut poems into sections which we can then compare to each other in terms of vocabulary distribution.  In early work on Genesis, Daniel, Azarias and Christ III, we determined that the dendrograms, or tree-diagrams, created by hierarchical agglomerative cluster analysis could be used to identify sections of poems that are particularly similar to each other, such as lines 279-361 of Daniel and Azarias.  We also discovered that sections of poems cluster together by source: the chunks of Genesis B cluster together and separately from those with biblical sources; the section of Daniel influenced by liturgical texts and the portions of Christ III directly influenced by sermons of Caesarius of Arles were likewise separated.
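For readers who want a concrete picture of the procedure described above, here is a minimal sketch in Python. It is not the team's software: the chunk size, the Euclidean distance on relative word frequencies, the average-linkage rule, and all function names are assumptions chosen for illustration, and the "texts" are made-up placeholder tokens rather than Old English.

```python
from collections import Counter
from itertools import combinations
import math

def chunk(words, size):
    """Cut a list of tokens into consecutive chunks of `size` words."""
    return [words[i:i + size] for i in range(0, len(words), size)]

def relative_freqs(tokens):
    """Map each word to its share of the chunk (its relative frequency)."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: n / total for word, n in counts.items()}

def distance(p, q):
    """Euclidean distance between two frequency profiles (assumed metric)."""
    vocab = set(p) | set(q)
    return math.sqrt(sum((p.get(w, 0.0) - q.get(w, 0.0)) ** 2 for w in vocab))

def cluster(profiles):
    """Average-linkage agglomerative clustering (assumed linkage rule).

    `profiles` maps a chunk label to its frequency profile. Returns the
    merges in order as (cluster_a, cluster_b, distance) tuples; read from
    first to last, they describe the dendrogram from leaves to root.
    """
    clusters = [(label,) for label in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            d = sum(distance(profiles[x], profiles[y])
                    for x in a for y in b) / (len(a) * len(b))
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        merges.append((a, b, d))
    return merges

# Placeholder token streams standing in for two poems (not real data).
text_x = ("alpha beta alpha gamma " * 4).split()   # 16 tokens -> 2 chunks
text_y = ("delta epsilon delta beta " * 2).split() # 8 tokens  -> 1 chunk

profiles = {}
for name, text in (("X", text_x), ("Y", text_y)):
    for i, section in enumerate(chunk(text, 8), start=1):
        profiles[f"{name}{i}"] = relative_freqs(section)

merges = cluster(profiles)
for a, b, d in merges:
    print(a, "+", b, "at distance", round(d, 3))
```

In this toy run the two chunks of the first text share an identical profile, so they join first, and the chunk from the second text joins only at the last step. That is the pattern the dendrograms make visible: sections with similar vocabulary distributions sit on nearby branches.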

Recognizing the differences between sections of poems and the strong influence of sources enabled us to refine our techniques and avoid the pitfalls of previous attempts to identify digital “signatures” (effectively critiqued by Janet Bately).  Combining our more subtle lexomic methods with close reading and philological analysis allows us to determine that the signed Cynewulfian poems have significant similarities in vocabulary distribution (except in sections that are strongly influenced by direct paraphrase of a Latin source, such as much of Juliana).  This similarity does not extend to the unsigned poems, with the major exception of most of Guthlac B, which is indeed very similar in vocabulary distribution to the signed poems.  The combined lexomic and traditional evidence supports the long-held suspicion that this poem is also by Cynewulf (the poem is acaudate, so the lack of a runic signature is not dispositive).  We also note that Christ I and Christ III exhibit different vocabulary distributions from each other and from Christ II, providing additional reasons to reject the one-poem hypothesis put forth so forcefully by Albert S. Cook but questioned by more recent scholars.  We then discuss the similarities and differences among sections of the other putatively Cynewulfian poems and the implications of these relationships for the poems’ relative chronology.  We conclude by noting that although computer-based and advanced statistical methods can never provide definitive arguments, they can usefully augment traditional analysis.

Tools and software are available at http://lexomics.wheatoncollege.edu

2 comments:

STAG said...

hierarchical agglomerative cluster analysis

Now there's a mouthful.

Derek said...

Very cool! But, in thinking about the Christ cycle, doesn't Christ I already fall outside of a typical application of the tool due to its heavy use of liturgical materials? As in, we can't get a true positive, only a false negative due to the influence of the Latin antiphons?