Sunday, March 20, 2011

Magnificent Metadata

Recently, my digitization life has been ruled by metadata. I suppose most of my work with the Bexar Archives has revolved around metadata from the start, but only in the past few weeks--in class and at work--have I begun to glimpse the absolute power of this data about data...and to fear it.

In his lecture at the iSchool a couple of weeks ago, Professor Howard Besser spoke about the crucial role of recording metadata throughout the life of a project--specifically in reference to his Preserving Digital Public Television project. During our class photo digitization work, we attempted to do just that. Populate a few fields in a Google Docs spreadsheet with some basic info about a handful of photographs--piece of cake. Wait, these photographs document the Texas oil industry? Um, what's that right there in the photo? A drill? A rig? A derrick? Come to think of it, what's a derrick, anyway? Frantic Google searches for oil industry imagery helped somewhat in my metadata quest, but it was definitely a challenge to describe something outside the scope of my knowledge--particularly visual materials, for which dictionary and authority searches are tricky to perform. When will image-recognition technology let us perform image-based Google searches?? I'm ready.
Okay, so I'm pretty sure those are cowboys in silhouette, but what on earth do you call that oil thingy in the background? (Image courtesy of http://www.austinchronicle.com/arts/2009-06-12/792744/)
And then there was the Bexar Archives Y2K moment. The Bexar Archives consist of thousands and thousands of documents, which are being digitized gradually over time. I'm taking part in only the second phase of the digitization effort, and we're steadily finding ways to streamline the project workflow. One problem: the identifiers in the schema from the first phase allow only two digits for page numbers, which limits documents to 99 pages...but there are lots of 100-plus-page docs in the collection. The identifiers also use four digits for the document number, capping the total number of digitized documents at 9999. Begin freak-out moment. Take deep breath. Develop solution: the savvy webmaster will update the database code to allow three-digit page numbers and six-digit document numbers, for a collection maxed at 999999 docs. Hooray! Enter additional solution: a handy free download called "File List," which is empowering me to add digits to thousands of existing identifiers with one click.
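For the curious, here's roughly what "adding digits" to an identifier means, as a minimal Python sketch. It assumes a hypothetical filename pattern (a four-digit document number, an underscore, a two-digit page number, then a file extension, like `0042_07.tif`); the real Bexar Archives identifiers may look different, and this is not how "File List" itself works. The `widen_identifier` helper is made up for illustration. Zero-padding keeps each number's value the same while widening the field, so old identifiers still sort in order next to new six-digit/three-digit ones.

```python
import re

# Hypothetical old-style identifier: 4-digit document number, underscore,
# 2-digit page number, file extension (e.g. "0042_07.tif").
# The actual Bexar Archives scheme may differ.
OLD_ID = re.compile(r"^(\d{4})_(\d{2})(\.\w+)$")


def widen_identifier(filename: str) -> str:
    """Re-pad a hypothetical old identifier to a 6-digit doc / 3-digit page form."""
    match = OLD_ID.match(filename)
    if match is None:
        raise ValueError(f"not an old-style identifier: {filename!r}")
    doc, page, ext = match.groups()
    # Converting to int and re-padding preserves the numeric value;
    # only the field width changes.
    return f"{int(doc):06d}_{int(page):03d}{ext}"


if __name__ == "__main__":
    for name in ["0042_07.tif", "9999_99.tif"]:
        print(name, "->", widen_identifier(name))
```

Run on a whole folder, a loop like this would rename every old file in one pass, which is basically the "one click" part.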

I suppose these encounters offer a few lessons: record your metadata as you go, read up on your subject to ensure accurate metadata, think ahead about what you'll need from your schema as the project grows, and, in general and most importantly...never doubt the magnificence of metadata.