©2019 by metashop

Episode 9 - Ode to the Z39.19 Part 1

OK, hello and welcome. Thank you for joining the podcast all about metadata. Today is a very special day. I am going to talk about one of my favorite documents, giant monster documents, that has served me throughout my career in taxonomy and metadata development. If you’re ever in the weeds of developing a controlled vocabulary and you’re suddenly confounded by a homograph, an adverb in your vocabulary, how to appropriately use an abbreviation, capitalization, non-alphabetic characters, really, the list that this document covers goes on. This is my favorite reference document and a serious favorite to hand to someone who is new to controlled vocabularies, both for the value factor, and the shock factor. This is exciting.

The Guidelines for the Construction, Format and Management of Monolingual Controlled Vocabularies is a standard published by NISO, the National Information Standards Organization, and ANSI, the American National Standards Institute. First published in 1974, the standard is on its fourth edition with the ANSI/NISO Z39.19-2005 (R2010). Got that? This standard, as you can imagine, was originally conceived when thesaurus terms were generally applied when indexing collections of printed documents like journals, reports and newspapers. But the times have changed and the standard has changed with them, with the concept of “documents” to be indexed expanding to include other media such as maps, music and videos, and the concepts covered by the standard expanding to include topics of digital media including taxonomies, schemas, interoperability and formats for data exchange. This is going to be a two-parter because in writing this I only got about halfway through the document before I was out of time, so this first episode of two will focus on the construction of a vocabulary side of things, and the second episode will go over, well, the rest.

Importantly, you can find this 11-section, 184-page behemoth freely available online if you Google the title. I would not advocate that anyone sit down by the fireside to read the whole thing, although you certainly can if it’s a particularly cold winter’s night. But I do think that everyone should keep a copy of this document on hand as a reference resource for when those questions of best practice come up for how to construct a vocabulary properly. The Z39 covers it all.

Within the first few sections is a glossary of terms used within the document itself. Classic librarian habits, always starting by defining important details. Brilliant. I also love how it defines how vocabulary control is achieved.

  1. By defining the scope, or meaning of terms

  2. By using the equivalence relationship to link synonymous and nearly synonymous terms, and

  3. By distinguishing among homographs
     

Badda-bing. It feels so simple when you boil it down to those three “principal methods,” doesn’t it? Haha, it’s not easy. 

The standard also starts by declaring that it covers the selection, formulation, organization and display of terms that together make up a controlled vocabulary. It does not, however, suggest procedures for organizing or displaying subject headings for mathematical or chemical formulas, nor for establishing proper names, nor for creating authority files. So just an FYI on that.

But it explains what a term is (in the context of this document), what a content object is, what indexing is, and it details the guiding principles of vocabulary control, specifically:

  • Eliminating ambiguity

  • Controlling synonyms

  • Establishing relationships among terms where appropriate and

  • Testing and validating terms
     

And it uses the term Mercury several times to explain its meaning, because Mercury could variously be used to describe an automobile, a planet, a metal or an ancient god. So your controlled vocabulary helps to disambiguate use of the term across intentions. And it also explains synonyms although its example for this is just so much less approachable in my opinion, illustrating that Artificial Consciousness, Biocomputers, Electronic brains, Mechanical brains and Synthetic consciousness are all synonyms for conscious automata.
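If it helps to see those two ideas side by side, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not anything the standard prescribes: the names `PREFERRED_TERMS`, `USE_REFERENCES` and `resolve` are hypothetical, and the parenthetical qualifiers are one common way to tell homographs apart.

```python
# Hypothetical mini-vocabulary; the structure and names are illustrative only.

# Homographs are told apart with a parenthetical qualifier.
PREFERRED_TERMS = {
    "Mercury (automobile)",
    "Mercury (planet)",
    "Mercury (metal)",
    "Mercury (ancient god)",
    "Conscious automata",
}

# Entry terms (synonyms) each point to a single preferred term.
USE_REFERENCES = {
    "artificial consciousness": "Conscious automata",
    "biocomputers": "Conscious automata",
    "electronic brains": "Conscious automata",
    "mechanical brains": "Conscious automata",
    "synthetic consciousness": "Conscious automata",
}


def resolve(query: str) -> list[str]:
    """Return the preferred term(s) that a searcher's word could refer to."""
    key = query.strip().lower()
    # A synonym resolves to exactly one preferred term.
    if key in USE_REFERENCES:
        return [USE_REFERENCES[key]]
    # Otherwise compare against each preferred term with its qualifier removed.
    # A homograph returns several candidates for the searcher to choose from.
    return sorted(
        term for term in PREFERRED_TERMS
        if term.split(" (")[0].lower() == key
    )
```

Calling `resolve("Electronic brains")` leads to the one preferred term, while `resolve("Mercury")` hands back all four qualified terms, which is exactly the point where a person or an interface has to pick the intended meaning.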

It really does a great job of introducing you to a lot of concepts of information control and organization. It describes facets in vocabulary development, it details the differences and relationships between lists, synonym rings, taxonomies and thesauri, and describes metadata and metadata schemas.

From there it goes on to detail the principles of term selection and determining the correct form of a term to use, starting out with a wonderful outline of some of the (implicit?) common sense of what should drive how a vocabulary works, like the domain to which the vocabulary will be applied, the level of expertise of the audience it will face, how granular or general it needs to be, and things like that.

And it explains term forms, just detailing what it means when you use single-word or multi-word terms, and use of capitalization. It also has a whole section on whether to pluralize nouns. I highly recommend reading it. In short, you pluralize any not-proper noun (and singularize a proper noun), unless the domain for which you’re building the vocabulary typically uses the terms in the singular. For example, you might say “books” or “singers,” but if it is a list of art objects in a museum catalog, it might be “chair” and “oil painting.”

But it also describes grammatical forms of terms, such as how a term should always be a noun or noun phrase, giving examples of how you create verbal nouns. It also has entire sections around the use of adjectives, adverbs, and initial articles in vocabulary terms, laying out clear rules about whether they can be their own term or not. And of course it dedicates a whole section to the use of capital letters and non-alphabetic characters in terms. So sometimes you’ll use all caps, or even just partial caps, in a term if it is a trade name. Same with the use of spaces in a trade name: sometimes you won’t have any.

 

And then there are non-alphabetic characters, which I must warn are always dangerous in an information system. You never know when a comma or a slash is used as a delimiter by your index, so having it in a term will throw off your search. But aside from that risk, the document overviews the use of apostrophes (if your term is in the possessive case, e.g. Holidays > President’s Day), as well as diacritics, dashes, ampersands, and slashes, all possible in trade names or proper nouns.
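To make that delimiter warning concrete, here is a small Python sketch. The term list is my own hypothetical example, and the comma-joined field is just an assumption about how a system might store several terms together; your index may behave differently.

```python
import csv
import io

# Hypothetical terms: one contains a comma, one an apostrophe, one a slash.
terms = ["Surgery, oral", "President's Day", "Input/output devices"]

# Naive approach: pack the terms into one comma-delimited field, then split.
naive_field = ",".join(terms)
naive_terms = naive_field.split(",")
# The comma inside "Surgery, oral" is mistaken for a delimiter,
# so three terms come back as four fragments.

# Safer approach: let the csv module quote any value containing the delimiter.
buffer = io.StringIO()
csv.writer(buffer).writerow(terms)
safe_terms = next(csv.reader(io.StringIO(buffer.getvalue())))
# The quoted round trip returns the original three terms unchanged.
```

The same thing happens with a slash if your system treats it as a path or hierarchy separator, so it is worth checking which characters your own tools reserve before you let them into a term.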

 

So all that coverage of the construction of the form of the term is what really got me excited about this document to begin with over a decade ago. But that’s hardly the scope of the document. It then has a whole section on how to select the preferred form of the term for your audience. Going back to the sort of common sense notion of how you build the vocabulary (well, for your audience), it reviews some of the philosophical drivers for the form of the term, like common usage and literary, organizational, or user warrant, and how they affect the form of the term that you use, the spelling that you use, whether the acronym is the preferred term or the synonym, things like that. It even gives due course to slang and neologisms, which may supersede their predecessors in the right vocabulary.

OK, so all of those details about the forms of terms were one section, but then you get a whole new section dedicated entirely to compound terms. This is because dealing consistently with compound terms is one of the most difficult areas in the field of controlled vocabulary construction (according to this document), so this section aims to:

  • Aid in achieving consistency

  • Avoid over-complexity

  • Achieve a logical structure

  • And enhance the ease and precision of the search experience
     

So this section goes over some examples where really long, almost sentence-like terms may be warranted, but then also gives some examples where you might avoid making your terms overly long, and instead combine concepts in the search experience itself (sort of begging the task of having to educate your searchers). And so this section also runs through the list of factors that would affect your decision for how complex or verbose to make a single term in your vocabulary, with factors perchance including literary warrant, vocabulary size control, disambiguation (or avoiding false positives), and of course the field or level of expertise of the audience. And then of course the librarian-favorite conundrum to mull over, word order. Should your compound term be “oral surgery” or “surgery, oral”? That kind of thing.

So, that’s such good foundational information for how to develop your vocabulary, really from a grammatical form perspective. I would say from a semantic perspective, but that would confuse the next section of the document, which talks about relationships between terms in the vocabulary and semantic linking. So, it talks about the three types of relationships in controlled vocabularies, Equivalency, Hierarchy and Association relationships.

It’s really cool. They have a whole table that lays out what each type could encompass: equivalency includes synonyms and lexical variants; hierarchy includes parent/child relationships and whole/part relationships; and associative could be many, many things, like cause and effect (accident / injury), process and counter-agent (fire / flame retardant) or field and object (like neonatology / infant). And then it lays out how you might describe and develop these relationships in an index, but ideally you just have to set these up once in a term management tool and then just apply them going forward. It gives loads of examples for each possible type of relationship. Just pages and pages of examples of whole/part or derivational relationships.
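Here is a rough Python sketch of what “set these up once” could look like. It is a toy under my own assumptions, not how any particular term management tool works: the `Thesaurus` class and its methods are hypothetical, and the Chairs / Furniture pair is my own example. The BT, NT, RT, USE and UF codes are the conventional thesaurus tags for broader term, narrower term, related term, use, and used for.

```python
from collections import defaultdict

# Each relationship code and its reciprocal, so a link only has to be entered once.
RECIPROCAL = {"BT": "NT", "NT": "BT", "RT": "RT", "USE": "UF", "UF": "USE"}


class Thesaurus:
    """Toy store of term relationships (hypothetical, for illustration only)."""

    def __init__(self):
        # term -> relationship code -> set of linked terms
        self.links = defaultdict(lambda: defaultdict(set))

    def relate(self, term, code, other):
        """Record a relationship and its reciprocal in one step."""
        self.links[term][code].add(other)
        self.links[other][RECIPROCAL[code]].add(term)

    def related(self, term, code):
        """List the terms linked to `term` by the given code, alphabetically."""
        return sorted(self.links.get(term, {}).get(code, ()))


thesaurus = Thesaurus()
thesaurus.relate("Electronic brains", "USE", "Conscious automata")  # equivalence
thesaurus.relate("Chairs", "BT", "Furniture")                       # hierarchy
thesaurus.relate("Neonatology", "RT", "Infants")                    # association
```

Because every link is stored with its reciprocal, asking for the narrower terms of Furniture or the used-for terms of Conscious automata works without anyone entering the relationship a second time.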

So that’s the majority of the document that focuses on the construction of the controlled vocabulary. The rest overviews display, interoperability and testing of the vocabulary, but I will have to go into those in my next episode.

Conclusion

Well, if that isn’t intriguing, I don’t know what is. I particularly appreciate how this single resource goes through everything from how to build the terms themselves, how to construct the lists and group the concepts, and how to relate concepts to one another within the vocabulary, to presentation and display, and then testing, maintenance and management. This is truly a widely applicable resource that you will come back to again and again when building a vocabulary. The experts who put this bad boy together have done all of that work for you. And even though it is now over ten years old, the tenets within it very much hold true. Stay tuned for part two of this Ode to the Z39 in which I will overview the document’s opinions on display, interoperability and testing of the vocabulary.

Reading Recommendation

Well, this is the Ode to the Z39, so my recommendation today is to go find this document and scroll through it to see what it is all about. Read some select sections in full to get a sense of the granular level of detail that this document goes to about various topics concerning the construction of controlled vocabularies. I do believe that after just reviewing some areas in detail, you’ll understand how valuable this resource could be to you in future development work. It takes the guesswork out of a lot of details that maybe you’ve agonized over in the past.

Sources

Please email me at inevermetadata@gmail.com and let me know what interests or confounds you the most and I will absolutely add it to my queue of topics. You can also use this email to contact me for whatever reason – corrections, questions, etc.

Rate, review and subscribe. Share with your friends on social media, and always be tagging.