What skills or knowledge from your coursework are you using in your internship? Have you noticed a difference between theory and practice? Why or why not?
Most of the skills from my digital humanities coursework that apply to my internship deal with metadata. How much metadata needs to be included? What is the best way to standardize that metadata? How will this metadata allow us to see patterns in the data?
For the “famous,” “celebrated,” and “celebrity” databases, metadata is very important. If all we cared about was the data itself, we would just copy the sentence in which the word is used. However, the database is meant to make information as clear as possible, so we record the newspaper in which the word appears, the date of publication, and where the paper was published. That information will allow us to observe any regional patterns. (New Orleans newspapers really like sarsaparilla, for instance!) Similarly, we record the object to which the word “famous,” “celebrated,” or “celebrity” applies. (It’s surprisingly unclear in a lot of cases.) We also document the object’s sex (if it has one), its foreign or domestic status, and its age. Finally, and most importantly: what is it/he/she famous for?
Perhaps all this information could be gleaned from the sentences themselves. In theory, that might be true (“The famous battle of Gettysburg took place three years ago today” is fairly obvious), but in practice, much is left implied (“The famous anniversary commenced with fireworks” makes a lot of assumptions about the previous sentences, and about the reader). While I do wish some of these newspaper writers had been more transparent in their meaning, it is fascinating to dig into who a famous “general” or “explorer” was based on the other clues in the article. It has also made me aware of how much knowledge today’s newspaper articles assume their readers already have; future graduate students studying them will have just as hard a time as I am having!
We have all these rules to help us categorize the sentences and objects, but sometimes they can’t cover every case. Once again, I should have anticipated this (rules can’t cover everything, after all), but I find myself surprised every time it happens. All it means is that we have to adapt our strategy and adopt a new rule. These rules don’t just apply to categorization; they also apply to how the data is written. I have my own style, but I’m realizing that my boss has his own style too (open brackets vs. closed brackets, for instance). The importance of standardization is easy to remember when setting the rules, but harder to keep in mind when you are on your 300th data entry and cannot decide whether the “Elastic Lock Stitch Sewing Machine” is the same as the “Elastic Locke Stitch Sewing Machine.” Unfortunately, standardizing is pretty hard when even the spelling in 19th-century newspapers wasn’t consistent!