Innodata Spearheads Award-Winning Digitization Project

Library Archive Project Recognized for Innovative Use of Digital Technology


University of Virginia librarians reviewed hundreds of century-old writings while creating an exhibit a few years ago. The archive documented the Yellow Fever project, headed by Major Walter Reed, MD, in Cuba at the turn of the 20th century. The project was considered significant for proving that the disease was borne by mosquitoes and because it set the precedent for requiring consent forms for medical test volunteers.

The university previously received 147 boxes of materials on the Yellow Fever project as a bequest. Supplemented with other university-owned papers and Library of Congress content, the bequest materials represented the core of the Yellow Fever exhibit that opened to the public in 1997. The collection’s 13,007 pages of handwritten letters and envelopes, news clippings and books with handwritten notes scribbled in the margins were crumbling and fading.

The library needed to digitally preserve this important archive, including the transcription of all handwritten material, so it would be easily available to scholars online.


The library quickly determined that XML with Text Encoding Initiative (TEI) markup would be the best format, since it would allow for richer, more flexible searches and data presentation. This format would also interface seamlessly with the library’s Electronic Text Center collections.
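To illustrate why structured markup allows more flexible searches than plain text, here is a minimal Python sketch. It assumes a tiny invented fragment that borrows common TEI element names (persName, placeName, div); the sample content and the helper find_divs_mentioning are hypothetical and do not reflect the library's actual encoding or search tools.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI-style fragment. Element names follow common TEI usage,
# but the content and structure are invented for illustration only.
SAMPLE = """
<text>
  <div type="letter" n="1">
    <p>Letter mentioning <persName>Walter Reed</persName> in <placeName>Cuba</placeName>.</p>
  </div>
  <div type="clipping" n="2">
    <p>News clipping about <placeName>Cuba</placeName>.</p>
  </div>
</text>
"""


def find_divs_mentioning(root, tag, value):
    """Return the n attribute of each div that contains <tag> with the given text."""
    hits = []
    for div in root.iter("div"):
        if any(el.text == value for el in div.iter(tag)):
            hits.append(div.get("n"))
    return hits


root = ET.fromstring(SAMPLE)
# Search by person and by place separately, which plain text cannot distinguish.
items_naming_reed = find_divs_mentioning(root, "persName", "Walter Reed")
items_about_cuba = find_divs_mentioning(root, "placeName", "Cuba")
```

Because names and places are tagged as distinct elements, a query can ask for "documents where Walter Reed appears as a person" rather than matching the bare string anywhere on the page.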

The sheer volume of handwritten material in need of transcription called for offshore transcribers to keep the project within the library’s budget. The library chose Innodata Isogen, a proven leader in XML technology. Innodata Isogen dedicated a unit to transcribing the material cost-effectively.


The library first scanned the material in Tagged Image File Format (TIFF) at 600 dpi and then burned the files onto CDs as JPEGs. Approximately 2,500 pages were mailed to the Innodata Isogen team every two weeks, and transcriptions were returned electronically at a rate of 313 pages per week. All items, even letter envelopes, were scanned and cataloged. Innodata Isogen helped the library design a template, or matrix, for the pages and assigned a metadata header and a brief summary to each document. The matrix established 20 general keyword terms, such as names and Library of Congress headings. These terms were then collected into a master list to be reviewed and updated by the library as the work progressed.
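The template described above can be sketched in Python as a record with a metadata header, a brief summary and keyword terms, plus a helper that gathers terms into a master list. This is a simplified illustration, not the project's actual template: the function names, the sample values and the flat header layout are assumptions, and a valid TEI header is more deeply structured than shown here.

```python
import xml.etree.ElementTree as ET


def build_record(doc_id, title, summary, keywords, body_text):
    """Build a simplified, TEI-inspired record (not schema-valid TEI)."""
    tei = ET.Element("TEI")
    header = ET.SubElement(tei, "teiHeader")
    ET.SubElement(header, "title").text = title
    ET.SubElement(header, "idno").text = doc_id
    # "summary" is a hypothetical element name chosen for this sketch.
    ET.SubElement(header, "summary").text = summary
    kw = ET.SubElement(header, "keywords")
    for term in keywords:
        ET.SubElement(kw, "term").text = term
    text = ET.SubElement(tei, "text")
    ET.SubElement(text, "p").text = body_text
    return tei


def master_keyword_list(records):
    """Collect every keyword term across records into one sorted, de-duplicated list."""
    terms = set()
    for rec in records:
        for term in rec.iter("term"):
            terms.add(term.text)
    return sorted(terms)


# Invented sample records for illustration only.
records = [
    build_record("doc-0001", "Letter", "A letter about the experiments.",
                 ["Reed, Walter", "Yellow fever"], "Transcribed letter text."),
    build_record("doc-0002", "Envelope", "An addressed envelope.",
                 ["Reed, Walter", "Cuba"], "Transcribed envelope text."),
]
master_terms = master_keyword_list(records)
xml_output = ET.tostring(records[0], encoding="unicode")
```

Keeping the keywords inside each record's header while also deriving a master list from them means the master list can be regenerated whenever the library revises individual documents.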

Innodata Isogen's staff efficiently transcribed all of the handwritten materials, including items that were almost illegible. Each transcription was matched carefully against the original to differentiate preserved spellings from transcription errors. The material was divided into two categories:

  • The Collection - a complete database of images, documents and transcriptions
  • The Story - a narrative linked to the relevant original materials

While Innodata Isogen's content services team worked on the transcriptions, the library staff developed a website promoting “The Story.” A form and coding system was established to organize the pages and their accompanying transcriptions into “The Collection” site. The two sites were ultimately linked to allow seamless navigation between the story and the supporting artifacts.


Since its grand opening a few years ago, the site has proven to be a tremendous success. The University of Virginia set records for the scope of material organized using XML technology. The collection garnered the National Archivists Award from the Society of American Archivists for its innovative use of digital technology to advance the historical record. Modern scholars worldwide can now analyze the original documents to research the important fight against Yellow Fever.

The collection also established a framework for creating digital archives on other topics, such as HIV, West Nile fever and the Ebola virus. The successful completion of the archive puts the University of Virginia in a better position to seek additional grants to document complex research topics.


