Thursday, January 22, 2009

Sheep DNA and Manuscripts (Again)

I meant to blog about this last week, but I had to get a lot of other stuff finished before the semester started.

Recently a research group at Johns Hopkins announced results from extracting DNA from manuscripts. Jonathan Jarrett blogged about it here.

This isn't my crazy Sheep DNA project, which got started back in 2005 after Scott McLemee sent around a meme on Inside Higher Ed and I responded.

I may have spread the idea on the internet in 2005, but it was not originally mine. Greg Rose came up with it in a conversation with me in the fall of 2001 (though of course he may have thought of it earlier). Other people seem to have had the idea independently, too. Supposedly (though I can no longer find the link), a group at Cambridge tried to sequence the DNA from some manuscripts in the Parker Library.

In 2007 I got some funding for the project, which let me get started and give a report here. With Prof. Barbara Brennessel (my co-author on the Anglo-Saxon medicine paper) and our student, Amanda Shorette, we started laying the foundation for the project. I taught Amanda paleography so that she could understand how humanistic researchers work, and she researched and then taught me about DNA extraction and PCR (Polymerase Chain Reaction) amplification of DNA.

Amanda did a lot of very good research to show what we could and could not do with ovine DNA. Then this fall another student, Jay Korzun, extracted DNA from a manuscript fragment. We haven't gone to press or publicized this yet because we have been consistently worried about cross-contamination and are trying to rule it out, but it seems we are basically at the same place as the Hopkins group.

Here's the problem, which, word has it, the Cambridge group also ran into: about 1000 years' worth of touching and rubbing along the edges of books leaves a lot of DNA cross-contamination. Some of it is human DNA that has rubbed off, which you can rule out by using different primers. The harder part is contamination between the various leaves bound together in the manuscript: if someone touches leaf 42r and then touches leaf 45v, particles of DNA from one can be transferred to the other. Again, rumor has it that the Cambridge group concluded that you would have to cut a few square centimeters out of the middle of each leaf to avoid contamination as much as possible. Librarians were not enthusiastic.

These problems don't necessarily make the project impossible, just very difficult. I have been working on a patent application for a process/device to get samples in a minimally destructive manner (i.e., by taking microscopic samples). There is no truly non-destructive way to get DNA out of a manuscript: swabbing doesn't get you enough, we learned. (Working on it as a patent isn't meant to keep anyone from using it; it's meant to keep anyone else from patenting it and then stopping me from using it.) But even if that works, we will still have a very, very messy data set.

But, as I've learned from the lexomics project, messy data can be dealt with if there is enough of it. So the key is to get data from a huge number of manuscript leaves and try to weed out the cross-contamination. To that end, we've concluded that we're going to need a bioinformatic approach: sequence as much as we can (the A, C, G, T bases) of the DNA we extract. Then all of those sequences, fragmentary or complete, go into a massive database to which other researchers can contribute. Eventually we will be able to start building some phylogenetic trees.
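For readers wondering what the first step from "a database of fragments" toward "phylogenetic trees" might look like, here is a toy sketch. It is purely illustrative, not our actual pipeline. It assumes (hypothetically) that every fragment has already been aligned to a common reference sequence, with `-` marking positions a fragment doesn't cover. It computes the p-distance (the proportion of differing bases at positions both fragments cover) for every pair of samples, the kind of distance matrix that distance-based tree methods such as UPGMA or neighbor-joining start from. Identical or near-identical pairs could mean the same animal, or could mean cross-contamination, which is exactly why lots of data is needed to tell them apart. The sample records and the `min_overlap` threshold are made up for illustration.

```python
from itertools import combinations
from typing import Optional

# Hypothetical records: (manuscript, leaf, aligned sequence).
# Assumes every fragment is already aligned to a common reference,
# with '-' marking positions the fragment does not cover.
SAMPLES = [
    ("MS A", "42r", "ACGTAC--GTAC"),
    ("MS A", "45v", "ACGTACTTGTAC"),
    ("MS B", "3r", "ACGAAC-TGTTC"),
]


def p_distance(a: str, b: str, min_overlap: int = 5) -> Optional[float]:
    """Proportion of differing sites among positions both fragments cover.

    Returns None when the fragments share fewer than min_overlap covered
    sites, since a distance computed from a handful of bases means little.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Keep only positions where neither fragment has a gap.
    shared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if len(shared) < min_overlap:
        return None
    diffs = sum(1 for x, y in shared if x != y)
    return diffs / len(shared)


def distance_table(samples):
    """Pairwise p-distances between all samples, keyed by (label, label)."""
    table = {}
    for (m1, l1, s1), (m2, l2, s2) in combinations(samples, 2):
        table[(f"{m1} {l1}", f"{m2} {l2}")] = p_distance(s1, s2)
    return table


if __name__ == "__main__":
    for pair, d in distance_table(SAMPLES).items():
        print(pair, d)
```

With these made-up fragments, leaves 42r and 45v come out identical (distance 0.0) over the bases they share, which on its own can't distinguish "same sheep" from "someone's thumb carried DNA from one leaf to the other."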

So, to those who've asked: no, I'm not upset at the Hopkins group, who almost certainly came to the idea independently. The only way this will ever give us interesting results (e.g., Fred the Sheep gave his life for the Beowulf manuscript and look, part of Fred is in the Blickling homily manuscript also; or Fred's cousin Violet is in this Malmesbury charter...) is if a very large number of researchers and groups gather data over a long period of time, so the more people working on it, the better.


Steve Muhlberger said...

Get ready for CSI: Wheaton and CSI:JH!!

John Cowan said...

If you publish, no one else can patent.