digioverload: To Ingest and Beyond!

This past Wednesday, Sam helped us ingest our very first records into DSpace. It was a beautiful moment. You can view them here.

Of course, the process didn't run perfectly--that would be against the law of this entire project! I remember Sarah warning us about how picky DSpace can be regarding the Dublin Core fields used to categorize the metadata for ingested items, and sure enough, we hit a Dublin Core snag: DSpace didn't recognize our country field, "dc:Country," as one of its official metadata fields. Not to worry--Sam came to our rescue with a handy TextWrangler trick: a batch find-and-replace operation that lets you delete specified text across multiple files at once. After deleting every mention of "dc:Country," we were finally able to process the batch. Because of other demands on the server space, we were only able to ingest six files into the Finances collection. Sam is currently hard at work on freeing up more server space, so we're hoping we'll soon be able to ingest a larger fraction of the 650+ files over which we've labored for so long.
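For anyone without a text editor that does multi-file find-and-replace, the same cleanup can be roughly sketched in Python. This is only an illustration, not what Sam actually ran: it assumes each metadata field sits on its own line, and the function name `strip_field`, the `*.xml` pattern, and the folder name in the usage note are all made up for the example.

```python
from pathlib import Path


def strip_field(folder, needle="dc:Country", pattern="*.xml"):
    """Remove every line containing `needle` from matching files in `folder`.

    Returns the number of files that were changed.
    Assumption: each metadata field occupies exactly one line.
    """
    changed = 0
    for path in sorted(Path(folder).glob(pattern)):
        lines = path.read_text(encoding="utf-8").splitlines(keepends=True)
        # Keep only the lines that do not mention the unwanted field.
        kept = [line for line in lines if needle not in line]
        if len(kept) != len(lines):
            path.write_text("".join(kept), encoding="utf-8")
            changed += 1
    return changed
```

Calling something like `strip_field("finances_batch")` (a hypothetical folder) would rewrite the files in place, so it would be wise to run it on a copy of the batch first.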

Another thing we realized--perhaps we didn't use the right file naming convention. I remember Rachel wisely suggesting we use some combination of the title and date metadata fields as the file naming structure, but for some reason I thought it would be useful to use the identifier field instead (I think because that's the way things seem to work at the Briscoe). It's definitely a bit strange (and not very user-friendly) to see a list of files named "e_SAAUT_xxxx_xxx" rather than something obvious and understandable, like, say, "Meeting Minutes - 4.2.1995." Although it's too late to rename the individual files at this point (after all of that Perl script pre-processing and such), DSpace gives you the option to rename items, and I think we may want to do that--as labor-intensive as it might be.
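If we do end up renaming, a little script could at least draft the new names for us. Here is a minimal sketch, assuming we can export each item's identifier, title, and date into a list of records; the function name `readable_names` and the record layout are hypothetical, and the actual renaming inside DSpace would still be a separate step.

```python
def readable_names(records):
    """Map each identifier to a 'Title - Date' label.

    Assumption: every record is a dict with 'identifier' and 'title' keys
    and an optional 'date' key.
    """
    names = {}
    for rec in records:
        title = rec["title"].strip()
        date = rec.get("date", "").strip()
        # Fall back to the bare title when no date is recorded.
        names[rec["identifier"]] = f"{title} - {date}" if date else title
    return names


# Example using the placeholder identifier pattern from above.
example = readable_names(
    [{"identifier": "e_SAAUT_xxxx_xxx", "title": "Meeting Minutes", "date": "4.2.1995"}]
)
```

Keying the result on the identifier means the original file names stay untouched and the friendlier label can be looked up whenever it is needed.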

Did I mention that this has been a major learning experience? With an emphasis on learning there.

A bit of good news: our copyright expert said we're probably not at risk of violating any FERPA regulations. I still want to check with the Briscoe before we make any or all of the items public.

Yep, this project has demanded persistence. But hey--archives are worth fighting for!

Poster for SAA-UT Archives Week, 1999

p.s. I'm sure batch ingesting is much more efficient than ingesting items one by one, but I'm sad that we still have to enter so many metadata fields by hand for each item we ingest--nearly all of the metadata fields that weren't uniform for every file (title, creator, description, etc.). I wonder if there's any way to ingest more metadata with the batch...gotta learn more about DSpace!
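One possibility worth looking into: as far as I understand it, DSpace's Simple Archive Format gives each item its own folder containing a `dublin_core.xml` file made up of `dcvalue` entries, so per-item fields might be generated by script instead of typed in. The sketch below is an assumption-laden illustration, not our actual workflow; the function name `dublin_core_xml` and the sample values are invented, and whether this fits our batch setup still needs checking.

```python
import xml.etree.ElementTree as ET


def dublin_core_xml(fields):
    """Build a dublin_core.xml string from (element, qualifier, value) tuples.

    Assumption: the importer expects <dcvalue element=... qualifier=...>
    entries, with "none" standing in for a missing qualifier.
    """
    root = ET.Element("dublin_core")
    for element, qualifier, value in fields:
        node = ET.SubElement(
            root, "dcvalue", element=element, qualifier=qualifier or "none"
        )
        node.text = value
    return ET.tostring(root, encoding="unicode")


# Illustrative values only; the qualifier choice here is a guess.
sample = dublin_core_xml(
    [("title", None, "Meeting Minutes"), ("date", "issued", "4.2.1995")]
)
```

Each item's string would then be saved as `dublin_core.xml` in that item's folder before running the batch.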

Sunday, May 1, 2011
