Now I think I have a new one. And it's mathematical.

This summer our research group was working on the problem of thorn/eth distribution. We were having trouble visualizing the data. I don't know why, but suddenly into my mind popped the notion of a rolling average, something I think I'd learned way back in high school and which had shown up when I was being creative with budgets to avoid laying people off during the financial crisis: it turns out that the amount of money you are allowed to draw from an endowment's revenue stream is based on a rolling average of the returns in several previous quarters. This saved me during the crash, as we had a little more money right at the beginning--since the previous quarters were propping up the average--so we could at least give visiting and part-time faculty a year or two to try to find something else instead of just dropping them into a terrible economy (my sole accomplishment as department chair was that I didn't lay anyone off or fail to renew a contract).

So I started calculating the rolling ratio of θ (total number of þ divided by total number of þ plus total number of ð) through a text: choose a "window" of words or letters, add up all the thorns and eths in that window, calculate θ, and then move the window one unit to the right and re-calculate. The plots of the rolling ratios turn out to be very interesting. I'm just finishing up a paper now on what they might tell us about a work's textual history.
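For anyone who wants to try the rolling ratio, here is a minimal sketch in Python rather than the code we actually used. It assumes the window is measured in characters and that a window containing no interdentals at all is recorded as a gap (`None`). The function name `rolling_theta` is made up for this illustration.

```python
from typing import List, Optional


def rolling_theta(text: str, window: int) -> List[Optional[float]]:
    """Rolling theta = thorns / (thorns + eths) over a sliding window of characters.

    Assumption for this sketch: a window with no thorn or eth yields None,
    since the ratio is undefined there.
    """
    chars = text.lower()  # fold capital thorn and eth into the lowercase forms
    ratios: List[Optional[float]] = []
    # Slide the window one character to the right at each step.
    for start in range(len(chars) - window + 1):
        chunk = chars[start:start + window]
        thorns = chunk.count("þ")
        eths = chunk.count("ð")
        total = thorns + eths
        ratios.append(thorns / total if total else None)
    return ratios
```

A window measured in words would work the same way, with the slice taken over a list of words instead of a string.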

But I have been worried--following a chance remark by Janet Bately at ISAS Dublin--that all we were detecting with θ was the frequency of first-, second- or third-person plural present tense or plural imperative verbs. These forms end with an interdental, and there certainly seems to be a correlation between terminal interdentals and scribal use of ð (most famously by the B-scribe of *Beowulf*, but elsewhere as well). I wanted to know if θ was just a complicated proxy measurement for portions of the poem in the plural present tense or the imperative.

So we developed another measure, τ, which is the ratio of terminal interdentals (þ and ð) to the total number of interdentals in a passage. We calculated τ as a rolling ratio as well, and then compared the plots of τ and θ.
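As a rough illustration of τ, here is a sketch under some assumptions of my own: the text has already been split into words with punctuation stripped, "terminal" means the last letter of a word, and the window is measured in words. The names `tau` and `rolling_tau` are hypothetical.

```python
from typing import List, Optional

INTERDENTALS = "þð"


def tau(words: List[str]) -> Optional[float]:
    """Ratio of word-final interdentals to all interdentals in a passage.

    Assumes each word has had its punctuation stripped already.
    Returns None when the passage contains no interdentals.
    """
    lowered = [w.lower() for w in words if w]
    total = sum(w.count(c) for w in lowered for c in INTERDENTALS)
    terminal = sum(1 for w in lowered if w[-1] in INTERDENTALS)
    return terminal / total if total else None


def rolling_tau(words: List[str], window: int) -> List[Optional[float]]:
    """tau recomputed for each window of `window` words, shifted one word at a time."""
    return [tau(words[i:i + window]) for i in range(len(words) - window + 1)]
```

For the four words `["þæt", "cwæð", "oðþe", "hafað"]` there are five interdentals, two of them word-final, so τ for the whole passage is 0.4.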

Sometimes these plots appear to be negatively correlated with each other: when τ goes down, θ increases, but other times, not so much. And just looking at the graphs wasn't entirely satisfactory. So I calculated Pearson correlation coefficients between τ and θ. It turns out that these are pretty ambiguous when applied to whole texts, generally being on the order of .3 (1.0 would be perfect correlation and 0 would be no correlation at all). That wasn't entirely helpful: with an r of .3, tense and number *could* be contributing to θ, but other things (textual history) could be as well.

Then last night I was staring in frustration at the τ and θ graphs for the Old English *Genesis*, and it hit me: there was a visible correlation between τ and θ in *Genesis B*, but not in *Genesis A*. I quickly calculated the Pearson correlation coefficient for each poem and indeed, *Genesis B* is highly correlated, with an r of .69, while *Genesis A* is only weakly correlated.

And here's where both the "good trick" and my question of legitimacy come in. I realized that I could do the rolling window trick with the correlation coefficient. Calculate τ and θ, then choose a window length and calculate the correlation coefficient for that window. Then shift to the right and recalculate r. Plot the whole thing.

Except that the plot was hard to read, since you end up with both positive and negative correlations (a negative correlation just means that when one variable goes up, the other goes down; it's just as much a correlation as a positive one). So I had the idea of taking the absolute value of r and plotting that. When you do so, you get very interesting results. *Genesis B*, for example, jumps right out of the *Genesis* plot. So too do the canticle-sourced material in *Daniel* and the section of *Christ III* that's based on the sermon of Caesarius of Arles.
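The rolling |r| described above can be sketched roughly as follows. This is an illustration, not our actual code: it assumes the τ and θ series have already been aligned to the same length with any undefined points dropped or filled, and it records `None` for a window in which either series is constant, since r is undefined there. The names `pearson_r` and `rolling_abs_r` are made up for the example.

```python
import math
from typing import List, Optional, Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> Optional[float]:
    """Pearson correlation coefficient; None if either series has zero variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def rolling_abs_r(theta: Sequence[float], tau: Sequence[float],
                  window: int) -> List[Optional[float]]:
    """|r| between theta and tau in each window, shifted one point at a time."""
    out: List[Optional[float]] = []
    for i in range(len(theta) - window + 1):
        r = pearson_r(theta[i:i + window], tau[i:i + window])
        # Taking the absolute value folds negative correlations onto positive ones.
        out.append(None if r is None else abs(r))
    return out
```

Note that the window length here is a second, independent choice on top of the window already used to compute τ and θ themselves.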

My tentative conclusion: because not all scribes consistently followed the "terminal interdental to be represented by ð" rule, the correlation between τ and θ is actually useful data. Instead of simply invalidating θ, the correlation--and its absence--tells you something about the copying history of the text. My hunch is that it's the later scribes who produce segments with closely correlated τ and θ, so when we don't see the correlation, we can hypothesize that we're looking at a text that was written and copied earlier, one in which the inertia of the earlier forms is influencing the final copy.

But my worry is that a rolling Pearson's correlation coefficient is somehow statistically or mathematically illegitimate. You've got two rolling ratios (τ and θ), each of which over-samples many of the same data points (because the same point is going to influence multiple windows) and then you're doing the same kind of rolling comparison with over-sampling with the relatively complex Pearson formula. I'm worried that my lack of mathematical and statistical sophistication has led me to miss something that should cancel out something else. Unfortunately, it is finals week, so I can't meet with my friend and co-author the statistician for a while at least, so I just have to live with being both excited at a potential discovery and worried that at any moment the intellectual floor is going to collapse out from under it.