Sunday, May 1, 2011

To Ingest and Beyond!

This past Wednesday, Sam helped us ingest our very first records into DSpace. It was a beautiful moment. You can view them here.

Of course, the process didn't run perfectly--that would be against the law of this entire project! I remember Sarah warned us about how picky DSpace can be regarding the Dublin Core fields used to categorize the metadata for ingested items, and sure enough, we hit a Dublin Core snag: DSpace didn't recognize our country field, "dc:Country," as one of its official metadata fields. Not to worry--Sam came to our rescue with a handy TextWrangler trick: a batch find-and-replace operation that allows you to delete a selection of specified text across multiple files. After deleting every mention of "dc:Country," we were finally able to process the batch. Because of other demands on the server space, we were only able to ingest six files into the Finances collection. Sam is currently hard at work on freeing up more server space, so we're hoping we'll soon be able to ingest a larger fraction of the 650+ files over which we've labored for so long.
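For anyone without TextWrangler handy, the same kind of batch clean-up can be sketched in a few lines of Python. This is only an illustration, not what Sam actually ran: the function name `strip_field`, the `*.xml` file pattern, and the assumption that each metadata field sits on its own line are all hypothetical.

```python
from pathlib import Path


def strip_field(folder, field="dc:Country", pattern="*.xml"):
    """Remove every line that mentions `field` from matching files in `folder`.

    Assumes (hypothetically) that each metadata field sits on its own line.
    Returns the number of files that were changed.
    """
    changed = 0
    for path in sorted(Path(folder).glob(pattern)):
        lines = path.read_text(encoding="utf-8").splitlines(keepends=True)
        kept = [line for line in lines if field not in line]
        if len(kept) != len(lines):
            # Only rewrite files that actually contained the field.
            path.write_text("".join(kept), encoding="utf-8")
            changed += 1
    return changed
```

It would be wise to run something like this on a copy of the metadata folder first, since the files are rewritten in place.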

Another thing we realized--perhaps we didn't use the right file naming convention. I remember Rachel wisely suggesting we use some combination of the title and date metadata fields as the file naming structure, but for some reason I thought it would be useful to use the identifier field instead (I think because that's the way things seem to work at the Briscoe). It's definitely a bit strange (and not very user-friendly) to see a list of files named "e_SAAUT_xxxx_xxx" rather than something obvious and understandable, like, say, "Meeting Minutes - 4.2.1995." Although it's too late to rename the individual files at this point (after all of that Perl script pre-processing and such), DSpace gives you the option to rename items, and I think we may want to do that--as labor-intensive as it might be.

Did I mention that this has been a major learning experience? With an emphasis on learning there.

A bit of good news: our copyright expert said we're probably not at risk of violating any FERPA regulations. I still want to check with the Briscoe before we make any or all of the items public.

Yep, this project has demanded persistence. But hey--archives are worth fighting for!

Poster for SAA-UT Archives Week, 1999

p.s. I'm sure batch ingesting is much more efficient than ingesting items one by one, but I'm sad that we still have to enter so many metadata fields by hand for each item we ingest--nearly all of the metadata fields that weren't uniform for every file (title, creator, description, etc.). I wonder if there's any way to ingest more metadata with the batch...gotta learn more about DSpace!
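One possible direction, sketched here under assumptions rather than tested against the iSchool's DSpace: as far as I understand it, DSpace's batch importer reads a "Simple Archive Format" in which each item folder carries its own `dublin_core.xml`, so per-item fields like title and creator could in principle be generated from a spreadsheet instead of typed by hand. The helper name `dublin_core_xml` and the sample values below are made up for illustration, and the exact fields DSpace accepts should be checked against the manual for the installed version.

```python
import xml.etree.ElementTree as ET


def dublin_core_xml(fields):
    """Build a dublin_core.xml string from (element, qualifier, value) tuples.

    A qualifier of None is written as "none", which is how unqualified
    fields are marked in the Simple Archive Format.
    """
    root = ET.Element("dublin_core")
    for element, qualifier, value in fields:
        node = ET.SubElement(
            root, "dcvalue", element=element, qualifier=qualifier or "none"
        )
        node.text = value
    return ET.tostring(root, encoding="unicode")


# Illustrative record only; the values are invented for this sketch.
record = [
    ("title", None, "Meeting Minutes - 4.2.1995"),
    ("creator", None, "SAA-UT"),
]
print(dublin_core_xml(record))
```

One such file per item, written into that item's folder, would let the varying fields travel with the batch.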

Saturday, April 23, 2011


Greetings from the borderlands of DSpace, where Rachel and I have been floating for the past week. Wish you were here!

After successfully digitizing dozens of documents, we turned eagerly to the "ingest" process--a.k.a. uploading digitized materials to their new forever home on the iSchool's digital repository. We gathered expert advice from Dr. Galloway about our organizational schema and consulted with students in her Problems in the Permanent Retention of Digital Records class about essential DSpace resources. We read through the DSpace manual and supporting documentation. We crossed our fingers and took deep breaths and practiced eating space food.

But we were still confused.

We decided we needed to see the ingest process in action before attempting it ourselves, so we talked with Sam, who suggested we talk with Sarah, who blessed us with several magical batch processing powers: the New Zealand Metadata Extraction tool and three marvelous Perl scripts (which she originally received during her Problems project at the HRC), passed down through the DSpace generations like some elusive passcode.
Can we give Sarah a gold star for staying late with us in the IT lab to walk us through our first ingest pre-processing encounter? Despite the fact that she has this little dissertation thing to work on...
Once Sarah left, though, our luck ran out: we couldn't seem to launch the metadata extractor on the lab computers (something about missing Javascript), so we did what any NAS-space-hogging SOD student would do: we uploaded EVERYTHING to the network and prayed we'd be able to open the Extractor at home.

Fortunately, our dreams came true. The extractor launched beautifully on my laptop, and a few hurdles, several questions, one ActivePerl download--and about 8 hours of troubleshooting--later, all of our files are processed and waiting patiently on the NAS for the next chapter in their journey: ingest. We're hoping Sam (bless him!) will be available to walk us through these steps in the coming few days.
In the meantime, we're going to build the DSpace hierarchy according to our schema and get started on our reflection paper.

Updates to come, but for now, I'm a bit exhausted, so I'm going to pop a TUM for my in(di)gestion and try to get some DSleep.

p.s. We originally thought our hierarchy would look like this:

Sub-community: YEAR
Sub-sub community: Archives Week
Collection: YEAR Archives Week
Sub-sub community: Meeting Minutes
Collection: YEAR Meeting Minutes
Sub-sub community: Events
Collection: YEAR Events
Sub-sub community: Finances
Collection: YEAR Finances
Sub-sub community: Marketing
Collection: YEAR Marketing
Sub-sub community: Correspondence
Collection: YEAR Correspondence
Sub-sub community: Administrative Records
Collection: YEAR Administrative Records (includes Annual Reports?)

Then we decided it would look like this:

Sub-sub community: Events
Collection: YEAR
Collection: YEAR
Collection: YEAR...etc.
Sub-sub community: Finances
Collection: YEAR
Collection: YEAR
Collection: YEAR...etc.

And now we've decided it will look like this:

Sub community: SAA-UT Records
Collection: Administrative Records (including annual reports)
Collection: Meeting Minutes
Collection: Archives Week
Collection: Events
Collection: Finances
Collection: Marketing
Collection: Website
(with the years conveyed through the metadata)

Questions to be resolved:
-which materials should have access restrictions, based on the Briscoe's and iSchool's policies, and the FERPA regulations?

Thursday, April 14, 2011


Rachel and I are approaching 200 digitized files, and I'm officially feeling the digioverload. After hours of OCRing strange fonts and Google Doc-ing Dublin Core, we must be at e_SAAUT_1000000000000000000000.tif by now, right?
SAA-UT digi in action

Although the process is indeed time-consuming, I'm proud of our progress. We've successfully digitized:
  • Records of the Chapter's official formation and recognition by national SAA organization
  • Meeting Minutes
  • Events records
  • Correspondence (some of it, anyway--SAA-UT-ers printed so many of their e-mails in 1994, and they were always busy corresponding with other chapters and invited speakers)
  • Annual Reports
  • Photos (including classic Dr. Gracy shots!)
  • Angelina Eberly poster
Along the way, Rachel and I are learning more than we expected about the past activities of our energetic chapter. It's a treat to recognize names from the leadership of so many Austin repositories in these records and to take pride in our Chapter's legacy of connection with the broader SAA community. We were talking about how fun it would be to invite some of the stars of these records to visit us and share their memories--and to record their oral histories for the archives, of course! I'm hoping that providing access to these records will rekindle that original spark of magic and archival enthusiasm John Slate and Sharla Richards must have felt when they received word from SAA that our Chapter had been officially recognized.

THE BOX, courtesy of the Briscoe
After drafting an organizational schema, Rach and I are hoping to transition to the next phase of our project this weekend: DSpace. Ahhhhh! We've consulted with Dr. Galloway several times, and we're seeking advice from students and fellow SAA-UT-ers who are currently in her "Problems" class, so I'm hoping we can figure this out. Here's how our schema is looking:

Original (dis)order?
Sub-community: YEAR
Sub-sub community: Annual Reports
Collection: YEAR Annual Reports
Sub-sub community: Archives Week   
Collection: YEAR Archives Week
Sub-sub community: Meeting Minutes
Collection: YEAR Meeting Minutes
Sub-sub community: Events
Collection: YEAR Events
Sub-sub community: Finances
Collection: YEAR Finances
Sub-sub community: Marketing
Collection: YEAR Marketing
Sub-sub community: Correspondence
Collection: YEAR Correspondence
Sub-sub community: Administrative Records
Collection: YEAR Administrative Records
Sub-sub community: Website
Collection: YEAR Website

The multifaceted and flexible search functionality of DSpace should enable users to choose multiple points of entry into the collection--whether they want to search by year, by document type, by person, etc.

After we've determined how to establish this organizational hierarchy within DSpace, we'll need to do a "batch ingest" of our digitized records and figure out how to upload the corresponding metadata. Deep breath.

High five, Rach! 10, 9, 8, 7, 6, 5, 4, 3, 2, 1...blast off to DSpace!

we're digi-nauts!

Sunday, April 3, 2011

Archiving Archivists' Archives

If you can think of a way I could slip another variant of the term "archives" in that title, let me know.

Ticket for Dr. Gracy's famous finger-lickin' potluck, 1994
If any graduate student organization should have total control over its recorded history, it would be the Society of American Archivists student chapter, right? Especially the student chapter for the number one archives program in the U.S.?

Well, as many professional archivists will attest, on-the-job archival expertise doesn't always translate to diligent care of one's personal records. Scattered among the Briscoe Center for American History's unprocessed collections, the disheveled storage room on the fifth floor of the iSchool, and former board members' hard drives, the SAA-UT records are in need of some love.

Rachel and Wendy save the day! For our final project, Rachel and I are creating a digital repository for the SAA-UT archives. Over eighteen years of archival activities, SAA-UT has accumulated boxes and drawers full of paper and photos--both hard copy and born-digital--so this is going to be an overwhelming task. Fortunately, once we've established a work flow, I will be able to pick up where we leave off in my role as SAA-UT Archivist. Here's how we envisioned the entire project in the beginning:

Digitization for Preservation: Digitization of hard-copy SAA-UT Materials from Dr. Gracy and other chapter sources:
-digitize materials in various formats from Dr. Gracy's SAA-UT records, which he donated to the Briscoe several years ago, and which they haven't yet processed: these materials occupy a large records box and include typewritten documents, graphics (posters for Archives Week, etc.), photos, possibly artifacts (such as designed t-shirts)
-digitization of any other relevant materials from past chapter members (materials in our storage cabinets and files that have been handed down from various boards through the years)
-OCR-ing where appropriate
-creation of necessary metadata

Digital Preservation: Work with born-digital records:

-create preservation-quality versions of born-digital records (meeting minutes, important emails, budget documents, digital photos/audio/video, SAA-UT Facebook page and Twitter feed, SAA-UT website), OCR-ing where appropriate
-creation of necessary metadata

Creation of Digital Repository on DSpace:

-store digitized and preserved born-digital materials on the iSchool's DSpace system
-work with SAA-UT Webmaster to establish records schedule to ensure that all new records are added to both the website and DSpace while active, and then retired to DSpace when no longer active (accessible through "archive" link on website)

Creation of Finding Aid for SAA-UT Digital Records:

-write a finding aid for the materials in our digital repository, which can be expanded as the chapter moves forward
-encode the finding aid using EAD
-make the finding aid searchable

For the purposes of this project, Rachel and I have decided to narrow our focus to digitizing records from SAA-UT's first three years (1993-1996) in the following categories:
  • Records of Chapter's official formation and recognition by national SAA organization
  • Original Chapter Constitution
  • Meeting Minutes
  • Events
  • Correspondence
  • Archives Week
  • Budgets

We also hope to digitize at least a few photos and some audio or video (and possibly some version of our website/social media presence) in order to experiment with a variety of file types.

We will be scanning and OCR-ing the paper documents, creating files in the following formats:
  • TIFF (archival master)
  • JPG
  • PDF
  • TXT

Because the Briscoe Center will serve as the official repository of the SAA-UT archives, we will describe each digitized record using a custom metadata schema, which we've constructed from the Briscoe's Dublin Core-based digital metadata guidelines.

Once we've digitized the materials and described them with metadata, we will work with Professor Galloway to ingest the materials into an SAA-UT digital repository on DSpace, where they will be preserved for the foreseeable future and available for download or access by future SAA-UT members. This digital repository will also serve as a home for our born-digital records. We probably won't have time to complete any sort of finding aid/guide to the SAA-UT digital collections, but I will tackle this in the summer and later this year.

If we can accomplish our goal of establishing a work flow for ingesting materials into the SAA-UT DSpace repository, we will have provided the activation energy for the continuing digital stewardship of SAA-UT's archives.

Check out Rachel's reflections on our master plan here:

Sunday, March 20, 2011

Magnificent Metadata

Recently, my digitization life has been ruled by metadata. I suppose most of my work with the Bexar Archives has revolved around metadata from the start, but only in the past few weeks--in class and at work--have I begun to glimpse the absolute power of this data about data...and to fear it.

In his lecture at the iSchool a couple of weeks ago, Professor Howard Besser spoke about the crucial role of recording metadata throughout the life of a project--specifically in reference to his Preserving Digital Public Television project. During our class photo digitization work, we attempted to do just that. Populate a few fields in a Google doc spreadsheet with some basic info about a handful of photographs--piece of cake. Wait, these photographs document the Texas Oil Industry? Um, what's that right there in the photo? A drill? A rig? A derrick? Come to think of it, what's a derrick, anyway? Frantic Google searches for oil industry imagery provided some assistance in my metadata quest, but it was definitely a challenge to try to describe something outside the scope of my knowledge--particularly visual materials, for which dictionary and authority searches are tricky to perform. When will image-recognition technology enable us to start performing image-based Google searches?? I'm ready.
Okay, so I'm pretty sure those are cowboys in silhouette, but what on earth do you call that oil thingy in the background? (Image courtesy of
And then there was the Bexar Archives Y2K moment. So, the Bexar Archives consist of thousands and thousands of documents, but they're being digitized slowly over time. I'm taking part in what is only the second phase of the digitization effort, and we're slowly realizing ways to streamline the project workflow. One problem: the identifiers in the schema from the first phase limit documents to 99 pages in length...but there are lots of 100-plus page docs in the collection. The identifiers also cap the total quantity of digitized documents at 9999. Begin freak-out-moment. Take deep breath. Develop solution: the savvy webmaster will update the database code to allow for documents with three-digit page numbers and a collection maxed at 999999 docs. Hooray! Enter additional solution: a handy free download called "File List," which is empowering me to add digits to thousands of existing identifiers with one click.
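The digit-adding step is easy to picture in code. Here is a minimal Python sketch, assuming a made-up identifier layout of prefix, four-digit document number, and two-digit page number (something like `bexar_0042_07`); the real Bexar identifiers may look quite different, and this is not how "File List" works internally.

```python
import re

# Hypothetical old layout: <prefix>_<4-digit document number>_<2-digit page number>
OLD_ID = re.compile(r"^(?P<prefix>.+)_(?P<doc>\d{4})_(?P<page>\d{2})$")


def widen_identifier(identifier, doc_width=6, page_width=3):
    """Re-pad an old identifier so it allows up to 999999 docs and 999 pages."""
    match = OLD_ID.match(identifier)
    if match is None:
        raise ValueError(f"unexpected identifier: {identifier!r}")
    doc = int(match["doc"])
    page = int(match["page"])
    # Zero-pad both numbers to the new, wider field sizes.
    return f"{match['prefix']}_{doc:0{doc_width}d}_{page:0{page_width}d}"


print(widen_identifier("bexar_0042_07"))  # bexar_000042_007
```

Because the numbers are only re-padded, never changed, old and new identifiers still sort in the same order.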

I suppose these encounters offer a few lessons: keep your metadata as you go, read up on your subject in order to ensure accurate metadata, think ahead about what you'll need from your schema as the project grows, and, in general and most importantly...never doubt the magnificence of metadata.

Wednesday, February 16, 2011


In between battles with ABBYY FineReader's spell checker and debates about copyright in the digitization sphere, I've been musing about another area of digicreativity that sparks my imagination: the digitization of 3-D objects. The whole challenge of representing a 3-D object in two dimensions--and using digital technologies to somehow make the viewer's experience of the digitized object even richer (or rich in different ways) than the viewer's experience of the original object--strikes me as a delicious challenge. But also an insanely expensive one, and maybe, depending on the object, an unnecessary one. But, if done right, could the digitization of 3-D objects enhance access in a genuinely satisfying way? In digitization scenarios, how can we mimic tactile sensation by enabling users to explore texture and dimension in unprecedented ways? So much of the museum experience is about the do not touch. Could we break this barrier through digitization?

One of Khrushchev's treasures in the Kennedy Digital Archives.

The John F. Kennedy Presidential Library & Museum's Digital Archives offers one model of 3-D object digitization. Click here to zoom in on the 19th-century drinking horn that Nikita Khrushchev gifted to Kennedy (hmm...what was the political backstory here? that seems to be missing from the metadata). The close-up visual tour is aesthetically pleasing, but does it take you beyond the experience you'd get when peering through a glass case at the Museum?

According to this blog post, the digital folks at the Smithsonian Institution are actively pursuing 3-D digitization plans for their collections. The idea of using digitization and 3-D imaging to enlarge an object visitors have trouble seeing with the naked eye seems promising and particularly useful, but I'm not sure about this "Infamous Blue Beetle," who has crawled to a new digital home on Facebook. The concurrent rise of the digitization of 3-D objects with the soaring popularity of born-digital 3-D modeling and 3-D printing may bode well for the affordability of such digitization efforts in the near future.

The Smithsonian's National Museum of Natural History manages to approach the concept of "touch" in their 3D Collection: by holding down the left-click button on the mouse, you can "touch" and rotate the objects to make them mirror the movement of your cursor--even this little 27,000-year-old Fired Clay Bison, unearthed in the Czech Republic. Being able to manipulate the object with your finger would take the "touch" experience even further: I wonder if these images are viewable on a touch-sensitive tablet computer. Has anybody made a Please Do Not Touch! app for iPad?

When I stumble upon further examples of digitization projects involving 3-D objects, I'll add them here.

Monday, February 7, 2011

Digital Repository Case Study: Google's Art Project

A neat example of innovation in digital museum exhibits--"United Nations-like," as the NYT reporter calls it, referring to the way you can "hop" from the Palace of Versailles to MoMA to the Uffizi Gallery, and back again, with a few clicks.

The Work of Art in the Age of Google

Interesting how Google was able to convince several major institutions to collaborate on this project. And also interesting: the copyright obstacles that prevent institutions from including recent works in the experiment.

Super-zoom, van Gogh's "The Starry Night"