Episode 3 - Standardization and Control
Hey everyone, thanks for tuning in. Today I want to talk about some standards. I am going to give an overview of what they are in terms of information, make a useful analogy for why they’re so important, and talk about a few specific ones that you may be familiar with. In future I think it will be useful to do a deep dive into certain standards and what they do and who and what they’re designed for, but today is more of an overview.
The standardization of information is the essence of metadata. You standardize things so that no one needs to think about it, it just works. My favorite example is the electrical socket in your home. It came with your home. It is a standard shape, size and voltage. You don’t have to think about what appliances you can plug into it, you can buy any lamp, any blender, any television and it fits. Standards were put in place years ago to make sure you didn’t have to worry about this every time you bought a new appliance. Imagine if you did! What a mess. You’d be switching out sockets to replace the one for your old lamp so it works with your new lamp, or piling up a mountain of converters. And, just to further my example, have you ever traveled overseas? BOOM – your plug is no longer standard.
Metadata standards are working behind the scenes to do the same thing for information. It is why you can do a Google search and get great results (er, that and lots of algorithms and other technical thingies, but I don’t have scope for that!). But really, metadata standards are doing for digital information what the standard outlet has done for home electronics: they make it so you don’t have to think about it, it just works. And there are several types of metadata standards as well.
First, there are standards for the structure of the metadata. These are known as schemas. Schemas dictate what fields will be captured for a given metadata set, the structure of the set. So a metadata field might be called an attribute, or an element, but it’s the name of the thing. When you fill out a form, the field is the label before the text box or the dropdown list. And fields have values. The value is what would be filled into the text box, or the item you would select in the dropdown list. I do tend to use the terms metadata fields and values much much more than “attributes” or “elements,” but don’t be confused when you see those. There are numerous publicly available industry and cross-industry schemas.
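To make fields and values concrete, here is a minimal sketch in Python. The schema, the field names and the sample record are all made up for illustration; they are not taken from any published standard.

```python
# A hypothetical schema: simply the set of field names a record is allowed to use.
SCHEMA_FIELDS = {"title", "creator", "date_created"}

# A metadata record pairs each field (the label on the form)
# with a value (what gets typed in or selected).
record = {
    "title": "Desk lamp manual",
    "creator": "Example Lighting Co.",
    "date_created": "2019-06-01",
}


def check_fields(record, schema_fields):
    """Return (missing, unexpected) field names, each as a sorted list."""
    present = set(record)
    missing = sorted(schema_fields - present)
    unexpected = sorted(present - schema_fields)
    return missing, unexpected


print(check_fields(record, SCHEMA_FIELDS))
print(check_fields({"title": "Blender guide", "colour": "red"}, SCHEMA_FIELDS))
```

The schema here only says which fields exist. It says nothing yet about what a valid value looks like, which is the job of the other kinds of standards.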
Then there are content standards. The content standard dictates how the data will be entered into the system. So think about when you are filling out a form online and almost every form asks for your name in two parts: First and Last. This is because a machine can’t really tell the difference otherwise, and if you have three individual words in your name, it won’t know if you have two first names or two last names unless you fill them into the correct form box. But the content standard covers so much more than just that, like whether a value must be a string, or a date, or a number, and how many characters can be included in the string, and so much more. One example of a content standard that you can look up because it is freely available online is Describing Archives: A Content Standard, to see how they standardized the description of archival material.
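The type and length rules just described can be sketched in a few lines of Python. The rule table and field names below are assumptions for the sake of the example, not rules from Describing Archives: A Content Standard or any other real standard.

```python
from datetime import date

# Hypothetical content rules: the expected type for each field and,
# for strings, a maximum number of characters.
CONTENT_RULES = {
    "first_name": {"type": str, "max_length": 50},
    "last_name": {"type": str, "max_length": 50},
    "birth_date": {"type": date},
    "household_size": {"type": int},
}


def content_errors(record, rules):
    """Return a list of human-readable problems found in the record."""
    errors = []
    for field, rule in rules.items():
        if field not in record:
            errors.append(f"{field}: missing")
            continue
        value = record[field]
        if not isinstance(value, rule["type"]):
            errors.append(f"{field}: expected {rule['type'].__name__}")
            continue
        max_length = rule.get("max_length")
        if max_length is not None and len(value) > max_length:
            errors.append(f"{field}: longer than {max_length} characters")
    return errors


good = {
    "first_name": "Mary Ann",  # two words, but unambiguously the first name
    "last_name": "Smith",
    "birth_date": date(1990, 5, 17),
    "household_size": 3,
}
bad = dict(good, first_name="x" * 51, birth_date="17/05/1990")

print(content_errors(good, CONTENT_RULES))
print(content_errors(bad, CONTENT_RULES))
```

Splitting the name into two fields is what lets "Mary Ann" land in the right place without the machine having to guess.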
There are value standards, which is another way of saying a controlled vocabulary. The controlled vocabulary is the set of terms that a person can select from in order to apply a tag to a piece of content. This practice is ubiquitous and critical in the quest for data quality. It is the reason why, when you’re filling out your address online, and you get to “State or Province,” you have a dropdown list instead of a text box to type in. This allows the company for whom you are filling out the form to make sure that you don’t make any errors in entering the state data (Assuming you select the correct one!). It will always be spelled correctly and in a consistent format – either full name or abbreviated.
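Here is a rough sketch of what that dropdown is doing behind the scenes. The vocabulary is deliberately tiny and the function name is my own invention; a real list would cover every state and province.

```python
# A short, illustrative controlled vocabulary: approved code -> full name.
STATE_VOCABULARY = {"CA": "California", "NY": "New York", "TX": "Texas"}


def normalize_state(entry, vocabulary=STATE_VOCABULARY):
    """Return the approved abbreviation for an entry, or None if it is not in the vocabulary."""
    cleaned = entry.strip()
    # Accept the abbreviation in any letter case.
    if cleaned.upper() in vocabulary:
        return cleaned.upper()
    # Accept the full name in any letter case.
    for code, name in vocabulary.items():
        if name.lower() == cleaned.lower():
            return code
    # Anything else (including misspellings) is rejected rather than stored.
    return None


print(normalize_state("tx"))
print(normalize_state(" New York "))
print(normalize_state("Texass"))
```

Whatever the person types or selects, only one consistent form of each term ever reaches the database, which is the whole point of a controlled vocabulary.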
And lastly are format standards. These guys are all around you, hiding behind the scenes and doing so much heavy lifting to help you not have to think. These are technical specifications for how to encode metadata so that the machine can read it, export it, share it with another system. These “encoding standards” or “data formats” create that “interoperability” that is the great promise of aligned metadata. XML is a common encoding standard that specifically helps transport data from machine to machine in a way that more software can understand.
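As a small illustration of encoding, the sketch below writes a record out as XML and reads it back using Python's standard library. The tag names and the sample record are assumptions for the example and do not follow any particular published XML schema.

```python
import xml.etree.ElementTree as ET


def record_to_xml(record, root_tag="record"):
    """Encode a flat dict as XML text. Field names must be valid XML tag names."""
    root = ET.Element(root_tag)
    for field, value in record.items():
        child = ET.SubElement(root, field)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


def xml_to_record(xml_text):
    """Decode XML text produced by record_to_xml back into a dict of strings."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


record = {"title": "Desk lamp manual", "language": "en"}
encoded = record_to_xml(record)
print(encoded)
print(xml_to_record(encoded))
```

Because both sides agree on the encoding, the receiving system can rebuild the same fields and values without knowing anything about the system that sent them.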
Where can you get your hands on some sweet, sweet standards? Boy, do I have some answers for you. But you may or may not like them, depending on your industry. The fact is that many of the refined existing standards were born out of libraries, naturally, as well as the arts and sciences – two fields with a vested interest in standard ways of describing and sharing information. So a ton of the standards that you’re going to find out there may not be super useful to you if you’re looking to create a metadata schema that is going to make your digital content findable internally, and externally, and then provide consumer insights that will then guide product development… But I don’t want to discount the value of the existing standards out there! Existing standards are your foundation, so I am going to review some of them.
There are some general schemas that pretty much anyone can use to describe anything. For instance the Dublin Core Metadata Element Set is an industry standard that came out of an exercise of trying to define the minimum required set of fields needed to provide useful context about a resource, that would be applicable to the widest range of resource types. Right? So it’s trying to provide the minimum viable number of fields to encompass the widest range of resource types. Y’all, this is one generic metadata schema. But it works. It is a widely used schema for web content. I like Dublin Core because it is so simple and really does include all the most critical elements. Now if it really did fit all purposes of course then everyone would just use it and that would be so easy, but some content does need a bit more nuanced description.
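To show how generic that is, here is a sketch that describes two very different resources with the same fifteen Dublin Core element names. The sample records are invented, and the helper function is just my illustration, not part of the standard.

```python
# The fifteen elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = (
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
)

# Two made-up resources of very different kinds, described with the same element set.
photo = {
    "title": "Harbor at dawn",
    "creator": "A. Photographer",
    "type": "StillImage",
    "format": "image/jpeg",
}
episode = {
    "title": "A podcast episode about standards",
    "type": "Sound",
    "format": "audio/mpeg",
    "language": "en",
}


def uses_only_dublin_core(record):
    """True if every field name in the record is a Dublin Core element."""
    return all(field in DUBLIN_CORE_ELEMENTS for field in record)


print(uses_only_dublin_core(photo), uses_only_dublin_core(episode))
print(uses_only_dublin_core({"title": "Mural", "cultural_context": "Example"}))
```

The last line hints at the trade-off: as soon as you need a field outside the fifteen, you are beyond what the generic schema covers.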
Dublin Core is one of a host of generic metadata schemas, or element sets, if you will, that are freely available to you and created by consortia of information management professionals. There is also Encoded Archival Description (EAD), Metadata Object Description Schema (MODS) and VRA Core which is for visual materials. And you’ll find a lot of similarity between these three largely general schemas.
And sometimes, for instance if you’re a cultural institution that needs to share content with museums and libraries and archives, then you might need to be able to put your metadata into all four of the above schema examples! Now what do you do? Enter The Crosswalk. So a metadata crosswalk matches field names across different schemas to their closest counterpart. So, in my above examples:
Dublin Core has the field Title.
MODS has the field Title Information.
EAD has Title.
and VRA Core has Title.
These four are fairly easy to map to each other: three of them are identical and one is only slightly different, arguably a more robust option. A slightly stickier example between these four might be that
VRA Core has this field called “Cultural Context” and the field is defined as the “Name of the culture, people or country with which the work has been associated.” None of the other schemas really capture this idea. Remember VRA Core is about visual resources. If you just have an image, you really need that cultural context. None of the other three schemas really have that.
MODS has Genre, but that’s defined as “a term that gives more specificity for the form, style, or content of an object.” That’s not quite it. It has the field “Origin Info” but that field “Contains subelements related to place of origin or publication, publisher/originator, and dates associated with the resource.” If you stretch the first part of the definition this might be where you would map “Cultural Context” to MODS.
Dublin Core really doesn’t cover this one. That concept of cultural context is wayyyyy too specific for Dublin Core. You might just have to add something into the description if it isn’t captured by the subject.
EAD is interesting. There is a ton of care taken in archival description to capture provenance, with the purpose of preserving archival context. Respect des Fonds, Provenance and Original Order are a whole rabbit hole of archival practice that I won’t go down today, but let me know if I should go down it some other time. But despite having a whole different scope from VRA Core’s “Cultural Context,” I think it is trying to accomplish something similar.
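The crosswalk walked through above can be sketched as a simple lookup table. The field names are written the way I said them in this episode rather than as the exact element names in each specification, and mapping “Cultural Context” to MODS “Origin Info” is the stretch I described, not an official mapping. Where I did not name a counterpart, the table holds None.

```python
# Concept -> field name in each schema (None means no close counterpart was identified).
CROSSWALK = {
    "title": {
        "Dublin Core": "Title",
        "MODS": "Title Information",
        "EAD": "Title",
        "VRA Core": "Title",
    },
    "cultural_context": {
        "Dublin Core": None,       # may need a note in the description instead
        "MODS": "Origin Info",     # a stretch, as discussed
        "EAD": None,               # no specific field named in this episode
        "VRA Core": "Cultural Context",
    },
}


def translate(record, source, target, crosswalk=CROSSWALK):
    """Rename a record's fields from one schema to another.

    Returns (translated_record, unmapped_fields).
    """
    # Build a lookup from source field name to target field name (which may be None).
    lookup = {
        names[source]: names[target]
        for names in crosswalk.values()
        if names[source] is not None
    }
    translated, unmapped = {}, []
    for field, value in record.items():
        target_field = lookup.get(field)
        if target_field is None:
            unmapped.append(field)  # needs a human decision
        else:
            translated[target_field] = value
    return translated, unmapped


# A made-up VRA Core style record.
vra_record = {"Title": "Untitled mural", "Cultural Context": "Mexican"}

print(translate(vra_record, "VRA Core", "MODS"))
print(translate(vra_record, "VRA Core", "Dublin Core"))
```

Returning the unmapped fields separately keeps the sticky cases visible, so someone can decide by hand where that information should go instead of silently losing it.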
I can easily argue that each of these deserves its own deep dive episode to give a lot more information about how they’re made, how they’re used, what you can do with them. I want to cover all of that and so much more! I’ll let you know where to send those suggestions in a moment, but before I do, I want to give reading recommendations and cite my sources.
I’m sticking with introductory reading today because we are still covering the basics.
My read for today is: Understanding Metadata: What Is Metadata, and What Is It For? by Jenn Riley, published and available for free online from NISO – the National Information Standards Organization. This document introduces you to a lot of NISO’s industry standards for information standardization, and their efforts are going a long way to create standard digital languages in and across industries.
My sources today included:
Firstly, I did use the University of Texas Libraries “Metadata Basics” web content again. It’s good stuff and laid out very clearly. Seriously, go check it out. The list of standards is extensive.
I got some good details from my reading recommendation for today, and again, this document is freely available online, I’ll post a link on the website under this episode.
I got some details from Describing Archives: A Content Standard, also known as DACS, which I would love to go into more detail about one of these days.