Access to Data from Pacific Voyages Fosters Research and Burnishes Smithsonian's Image
The U.S. Exploring Expedition of the Pacific, led by Captain Charles Wilkes from 1838 to 1842, produced a veritable ocean of data. The leading scientists and artists of the day sailed on a mission to collect, preserve and document anything of value to natural historians throughout the Pacific Ocean. They filled volumes with notes and drawings and collected nearly 2,400 anthropological artifacts and 50,000 plant specimens.
Crisscrossing the Pacific, Wilkes's expedition established that Antarctica is a continent, mapped South America's coast and the Columbia River basin, charted several Pacific island groups and studied Hawaii's volcanoes. Its maps were accurate enough to help guide U.S. forces in the Pacific to victory during World War II.
Although the expedition has been largely forgotten, the volume of data it produced was staggering: five volumes of narrative descriptions, 15 volumes of published scientific and anthropological documents, and four additional volumes that had never been published. In all, the Smithsonian received 1,600 pieces in 1858. The Smithsonian wanted to make these 160-year-old records of flora, fauna, geography and meteorology available to modern researchers through its Galaxy of Knowledge portal.
Because much of the vast collection required labor-intensive transcription and document linking, the Smithsonian knew that it needed to partner with an offshore content services provider to create the digital archive. Moreover, the documents needed to be converted with a high level of accuracy to ensure the material’s usefulness to scholars.
From that perspective, partnering with Innodata Isogen, a leader in digitizing content, was a logical choice for the Smithsonian.
Each page of the printed volumes was scanned and processed with optical character recognition (OCR) software, and the Innodata Isogen team checked the resulting text files against the originals to ensure the accuracy of the conversion. The text files were then converted into accessible XML data files. Data elements within the pages were tagged and coded so that the information conformed to the document type definition (DTD) that the Smithsonian has established for its online content.
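The conversion step described above, in which corrected OCR text becomes tagged XML, can be illustrated with a minimal sketch. The Smithsonian's actual DTD is not reproduced in this article, so the element and attribute names below (page, p, volume, n, type) and the placeholder text are assumptions chosen only for illustration, not the schema used in the project.

```python
import xml.etree.ElementTree as ET


def page_to_xml(volume, page_number, doc_type, paragraphs):
    """Wrap the corrected OCR text of one page in a tagged XML element.

    The element and attribute names are hypothetical; a real project
    would use the names required by its own DTD.
    """
    page = ET.Element(
        "page",
        {"volume": str(volume), "n": str(page_number), "type": doc_type},
    )
    for text in paragraphs:
        para = ET.SubElement(page, "p")
        para.text = text
    return page


# Placeholder text standing in for proofread OCR output.
page = page_to_xml(
    volume=1,
    page_number=12,
    doc_type="narrative",
    paragraphs=["First proofread paragraph.", "Second proofread paragraph."],
)
xml_text = ET.tostring(page, encoding="unicode")
print(xml_text)
```

Python's standard library parser does not validate against a DTD, so checking the output against the real definition would require a validating parser; that step is outside this sketch.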
Photos of more than 2,000 artifacts and hundreds of pages of drawings and illustration plates were digitized, then tagged and coded using the Smithsonian’s DTD so that the entire collection could be searched by keyword. Throughout the process, Smithsonian scholars checked each page and illustration for accuracy. Drawing on the collection and other resources, they also developed descriptions of the sailing vessels and the 600-plus crew of sailors and scientists.
When the site launched in early 2004, visitors could read an overview of the expedition and then choose whether to explore the narrative texts, scientific texts, plates or supplemental material and resources. Narrative and scientific texts and plates can be viewed as JPG files or as printable PDF pages. In addition, the supplemental resources section contains photos of more than 2,000 artifacts, backed by a search engine that lets researchers search the entire collection by keyword. For example, a herpetologist can compare the salamanders of South America and Samoa as easily as a geologist can compare rock strata from the Columbia River with those of Antarctica.
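The article does not say how the portal's search engine works. As a rough illustration of keyword search over tagged records, the sketch below builds a simple inverted index and returns the records that contain every word in a query. The record identifiers and descriptions are invented placeholders, not items from the collection, and the function names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(records):
    """Map each word to the set of record IDs whose description contains it."""
    index = defaultdict(set)
    for record_id, description in records.items():
        for word in tokenize(description):
            index[word].add(record_id)
    return index


def search(index, query):
    """Return the IDs of records that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return matches


# Invented placeholder records, for illustration only.
records = {
    "A1": "Carved wooden club, Fiji",
    "A2": "Woven basket, Samoa",
    "A3": "Wooden bowl, Samoa",
}
index = build_index(records)
print(search(index, "wooden samoa"))
```

Requiring all query words to match keeps results narrow; a production search engine would typically add ranking, stemming and field-specific queries on top of an index like this.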
Completed in eight weeks, the project put crumbling, yellowed pages that were once off-limits to all but dedicated scholars just a mouse click away for any researcher with a computer. Each month, more than three million people visit the Smithsonian’s Galaxy of Knowledge portal, giving this oft-forgotten expedition the public spotlight it deserves.
Scientists can now compare 160-year-old descriptions with current data to identify changes in the flora, fauna, geology and meteorology throughout the Pacific. Digitizing the entire collection also ensures its preservation and establishes protocols that will help other Smithsonian archivists create similar virtual museums of their collections, which can then be cross-referenced to facilitate multidisciplinary research.
Moreover, the steady stream of online visitors exploring the expedition’s discoveries furthers the Smithsonian’s reputation as a leader in the effort to digitize historical records.