Episode 10 - Ode to the Z39.19 Part 2
So I’m back today still ode-ing to the Z39 Guidelines for the Construction, Format and Management of Monolingual Controlled Vocabularies because it is a very important, and very BIG document. When I worked at UNICEF I actually printed out this giant monster, with MUCH ambivalence toward myself for still being so attached to paper, so that I could review, color code, place mark and dog ear it to my heart’s content. That document remains the first thing that comes to mind when I meet someone who is wondering if they are interested in metadata. If this content excites you, then you are a librarian at heart.
OK so we’re picking up today at section 9 of the document which is all about displaying controlled vocabularies. And there is a very important point to be made here about the considerations around the vocabulary on the back end and how it may be written for a computer, or how it may look as an element of information architecture, and how it may look in navigation and search. A vocabulary term, as used as metadata in an information system, may take many forms for its various uses. And as the Z39 notes, “the way in which a controlled vocabulary is presented affects the user’s willingness and ability to make use of it.” This harkens back to some of the previously mentioned decision points that take into account the audience type, whether a general audience, or a specialist audience, or what have you.
But a nice thing that this section lays out is that the “audience” of the controlled vocabulary is not necessarily just the end-users. A lot of people have reason to care about the shape, size, display and format of the vocabulary. For one, there are the vocabulary’s maintenance personnel: the experts in indexing and vocabulary construction who are also often experts in the domain of knowledge that the vocabulary models. This group is going to need to see a lot of different information about the vocabulary that end-users won’t need to see, for instance history and scope notes.
The indexers and search experts are the ones with expertise in indexing, online information retrieval and/or use of controlled vocabularies. This audience requires sophisticated terminology displays with access to cross-references, definitions, levels of the hierarchy and notes for terms.
The end-user audience is, of course, also to be considered. They may be experts in the domain that the vocabulary models, assuming the domain requires any expertise. It may be a general subject. But they are almost certainly not experts in the jargon or the complexities of information retrieval. These folks will benefit from seeing the hierarchy of the vocabulary, but also from being able to see it broken down into – what we might familiarly understand as the advanced search – the various elements of the vocabulary’s component parts. Depending on the complexity of the vocabulary, they may benefit from seeing the relationships between terms, and will almost certainly make use of “see alsos” and synonyms.
So then it goes on to illustrate some examples of how you might present the vocabulary and its relations and hierarchy to your audience. It opens by explaining that USE references from non-preferred terms should be incorporated into the main listing of the vocabulary rather than being relegated to an auxiliary, and explains the relationship between USE and USE FOR, and the scope of USE in permuted displays, inverted forms and hierarchical relationships.
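If it helps to see that USE and USE FOR reciprocity in action, here is a minimal sketch in Python. Everything in it is my own assumption for illustration: the sample terms, the `build_listing` function name, and the abbreviation of USE FOR to UF. It only shows the idea of keeping non-preferred terms in the same alphabetical listing as the preferred ones.

```python
from collections import defaultdict

# Hypothetical sample data: non-preferred term -> preferred term.
use_refs = {
    "Automobiles": "Cars",
    "Motorcars": "Cars",
}
preferred_terms = ["Cars", "Bicycles"]


def build_listing(preferred, use_refs):
    """Return one alphabetical listing with USE and reciprocal UF entries."""
    # Derive the reciprocal UF (USE FOR) entries from the USE references.
    used_for = defaultdict(list)
    for non_preferred, target in use_refs.items():
        used_for[target].append(non_preferred)

    lines = []
    # Preferred and non-preferred terms share a single alphabetical sequence.
    for entry in sorted(set(preferred) | set(use_refs), key=str.lower):
        lines.append(entry)
        if entry in use_refs:
            lines.append(f"  USE {use_refs[entry]}")
        for variant in sorted(used_for.get(entry, []), key=str.lower):
            lines.append(f"  UF {variant}")
    return lines


listing = build_listing(preferred_terms, use_refs)
print("\n".join(listing))
```

Someone who looks up "Automobiles" lands on a pointer to "Cars" right there in the main list, and the "Cars" entry shows which variants it stands in for.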
After that the section goes on to explain the display of hierarchical and associative relationships. That is, how should the narrower term look in relation to the broader term? We’re all pretty familiar with how this looks, with the narrower terms being indented and below the broader term. And then it talks about typography, and how to use typographical differences to represent preferred/non-preferred status, or whether a word is a vocabulary term vs. a relationship indicator (for example using all caps on USE in a USE relationship).
And of course, how do you sort? One of my favorite memories from my internship at the Watson library at The Met was when, unsure how to organize a dash vs. a space in two otherwise very similar titles, I asked my advisor, who advised: Something comes before nothing. It seemed so profound that I repeated it to all my library school friends and to this day it strikes me as so philosophical. And so that is the same philosophy covered in this section about filing and sorting and it helps you understand how to sort word-by-word vs. letter-by-letter and what to do with numerals, commas, parentheses, and more.
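To make the word-by-word vs. letter-by-letter difference concrete, here is a small Python sketch. The sample terms and the two key function names are my own assumptions, and this deliberately ignores the standard's finer rules for numerals and punctuation. It only shows how the same four terms land in a different order under each approach.

```python
def word_by_word_key(term):
    # Compare whole words in turn, so "New" files ahead of "Newark".
    return term.lower().split()


def letter_by_letter_key(term):
    # Ignore spaces and punctuation and compare one unbroken run of characters.
    return "".join(ch for ch in term.lower() if ch.isalnum())


# Hypothetical sample terms.
terms = ["Newton", "New York", "Newark", "New Jersey"]

by_word = sorted(terms, key=word_by_word_key)
by_letter = sorted(terms, key=letter_by_letter_key)

print(by_word)    # ['New Jersey', 'New York', 'Newark', 'Newton']
print(by_letter)  # ['Newark', 'New Jersey', 'Newton', 'New York']
```

Word-by-word keeps all the "New ..." terms together, while letter-by-letter scatters them among the other "Newa..." and "Newt..." entries.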
Section 9.3 talks about the options for displays of vocabularies. Now of course there are considerations here as well about the audience, but a big consideration is space. So electronic presentations of vocabularies offer a lot more flexibility and sort of behind the curtain smarts in getting that vocabulary to help your end users. In fact, in electronic vocabularies, you often don’t even need to grapple with how to present a synonym because you just don’t unless you’re showing via a type-ahead that it directs to the preferred term.
But still there are options for how to display the vocabulary. There is alphabetical display. This is a classically successful display because if the end-user knows what topic they are looking for, they can just scroll to its place in the alphabet, rather than have to look through a potentially large list of top-level topic nodes.
A Permuted Display lists each term multiple times in an alphabetic sequence for each of the words in the term. So for a term phrase “Very High Frequency Radiation” which has four words in it, it would show up four times across the entire vocabulary, once under F (starting with frequency), once under H (starting with high), once under R (starting with radiation) and once under V (starting with very) so that no matter where in the alphabet someone starts from, they can find the whole term.
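Here is a rough Python sketch of that permuted listing, using the same example term. The function name `permuted_entries` is my own, and a real permuted index would likely have rules about skipping small words, which I am leaving out.

```python
def permuted_entries(terms):
    """List each term once under every word it contains, in alphabetical order."""
    entries = []
    for term in terms:
        for word in term.split():
            # The entry word decides where the full term files in the alphabet.
            entries.append((word.lower(), term))
    return sorted(entries)


entries = permuted_entries(["Very High Frequency Radiation"])
for word, term in entries:
    print(f"{word:<10} {term}")
# frequency  Very High Frequency Radiation
# high       Very High Frequency Radiation
# radiation  Very High Frequency Radiation
# very       Very High Frequency Radiation
```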
So then it goes on to talk about hierarchical displays, giving advice on how to display multi-level hierarchies, generic structure hierarchies, tree structures, top term structure displays, *take a breath* two-way hierarchical structure displays, broad category displays, faceted display, and graphic display. I would go into detail about all of these, but then this would end up being a three-part series!
The last piece of section 9 talks about display formats, with detail and advice about the special considerations of print formats for vocabulary display – how to minimize double-lookups, juxtaposing terms in USE references, etc. – and for screen formats, including user interface design, keyword searching, term detail display, pick lists (AKA the iconic drop down list), and then lastly the considerations for web formats for displaying controlled vocabularies. These include how to display a path hierarchy, web navigation techniques, browsing, and navigation via hyperlink.
Section 10 talks about interoperability and its importance. Frankly, in the digital age, it’s why we’re all here. It is why librarianship is having a resurgence in validity… We’ve been holding the secrets of information organization this whole time, and it is the key to unlocking that ever-enigmatic interoperability. So it gives a great breakdown of the factors that affect interoperability including the similarity of the content subject matter in different domains, the different controlled vocabularies used to index content from similar domains, the degree of specificity or granularity of the controlled vocabularies used to index different content domains or databases, the differences in how synonyms and near-synonyms are handled, the search methodologies expected by the database being searched, the literary, organizational and user warrants used in developing the vocabulary and finally, the intended purpose of the databases or systems.
It has a short section covering multi-lingual vocabularies, but the document is specifically titled “for Monolingual Controlled Vocabularies” so it prefaces this short section with a direction to the ISO Guidelines for the establishment and development of multilingual thesauri and then doesn’t say a whole lot here.
But then there is a section about searching, and how precision and recall are affected by searching across domains with multiple vocabularies in use. So the idea of crosswalking is not new in library science and this too is a fairly small section that just talks briefly about how to deal with the issues that arise from vocabularies built in vacuums. Then the following small sections talk about merging databases, merging vocabularies themselves, and then a section that actually dives a bit into some functional advice is 10.9, which talks about the Storage and Maintenance of Relationships among Terms in Multiple Controlled Vocabularies. Which sounds like a nightmare to me, but it must be required if you’re making use of multiple standards. One provided suggestion for clustering terms from various vocabularies is creating a semantic network. And another suggestion for associating terms from multiple vocabularies is a lexical database. Again, not a huge amount of information is given here, but it gives some good graphical examples to help wrap your head around it.
The Z39 closes with a section about Construction, Testing, Maintenance and Management Systems. I love it because it is clear and excellent advice on how to proceed, but in re-reviewing it at this point in my career, it is a bit of preaching to the choir for any experienced librarian. It has some details on the justification for constructing a controlled vocabulary, a warning to avoid duplicating existing vocabularies, a bit of common sense advice to decide how you are going to proceed before you start to do so, and then a great bit of detail on approaches to the construction: by committee (top down or bottom up), or an empirical approach (not empirical as in lording over every decision like you are an empire, but empirical as in based on evidence, though MAN would I have enjoyed it if they had offered that approach and named it as such), with the deductive and inductive methods laid out. Deductive means new terms are actively extracted from content objects but no vocabulary is built until a sufficient number of terms has been collected. Inductive means new terms are selected for potential inclusion as they are encountered in content objects, with vocabulary control applied at the outset and the vocabulary just growing from there.
It covers the use of machines to assist in extracting terms, a very common approach nowadays. A machine can help extract candidate terms from large swaths of content, tell you how frequently the term arises, tell you how often users are searching for the term. They’re pretty useful from that perspective.
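As a very stripped-down illustration of what that machine assistance can look like, here is a Python sketch that counts how often words turn up across a few documents and keeps the frequent ones as candidates. The sample documents, the tiny stop-word list and the `candidate_terms` function are all assumptions of mine, not anything the Z39 prescribes, and real term extraction tools do far more than this.

```python
import re
from collections import Counter

# Assumed, deliberately tiny stop-word list for the example.
STOP_WORDS = {"the", "a", "of", "and", "in", "for", "to", "is"}


def candidate_terms(documents, min_count=2):
    """Return (word, count) pairs seen at least min_count times, most frequent first."""
    counts = Counter()
    for text in documents:
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return [(word, n) for word, n in counts.most_common() if n >= min_count]


# Hypothetical content objects.
docs = [
    "Vaccination schedules for measles",
    "Measles vaccination coverage in schools",
    "School nutrition and vaccination",
]

candidates = candidate_terms(docs)
print(candidates)  # [('vaccination', 3), ('measles', 2)]
```

Notice that "school" and "schools" get counted separately and so neither makes the cut, which is exactly the kind of thing a human reviewer still has to catch. The same counting could be pointed at a log of user searches instead of documents.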
Section 11 also talks about term records. These are so important. You can’t just have a term and put it in a hierarchy and clap each other on the back and walk away. You need to define the term, the scope that it covers, you’ll need to trace the history of changes to the term, know its synonyms, relationships, etc. and more cetera. And then there’s the entire process of approving a term. Once the vocabulary is “complete” it is really just at that time born. It is not done. Because new terms will always be arising, so you’ll need a process and a set of minds to validate candidate terms before they are admitted into the vocabulary. They have to make many considerations around term specificity, and what to do with those awful, awful one-off concepts that they simply, cannot. Place. In. ANY area, with “miscellaneous” and “other” being blasphemy in the world of library science.
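For anyone who thinks in data structures, a term record might be sketched something like this in Python. The field names, the "candidate" and "approved" status values, and the sample term and dates are all my own assumptions for illustration. The standard describes what a record should capture, not how to code it.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Tuple


@dataclass
class TermRecord:
    term: str
    definition: str = ""
    scope_note: str = ""
    status: str = "candidate"  # assumed values: "candidate" or "approved"
    synonyms: List[str] = field(default_factory=list)
    broader: List[str] = field(default_factory=list)
    narrower: List[str] = field(default_factory=list)
    related: List[str] = field(default_factory=list)
    history: List[Tuple[date, str]] = field(default_factory=list)

    def log(self, note, when=None):
        # Keep a dated trail of every change to the term.
        self.history.append((when or date.today(), note))

    def approve(self, note="Approved by review group", when=None):
        # Admit the candidate into the vocabulary and record when it happened.
        self.status = "approved"
        self.log(note, when)


# Hypothetical record moving from candidate to approved.
record = TermRecord(
    term="Measles",
    scope_note="Use for the disease, not for vaccination programmes.",
    synonyms=["Rubeola"],
)
record.log("Proposed as candidate term", date(2020, 1, 6))
record.approve(when=date(2020, 2, 3))
print(record.status, len(record.history))
```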
Now if you don’t read any other section of this document (but seriously, why would you skip out on any of it), you should check out 11.2 on Testing and Evaluating the controlled vocabulary. I am personally a big fan of the idea that your end users are going to make or break the success of your project, because they are going to choose to adopt or not adopt your excellent new search tool based on how well it fits their needs (think you’ve cornered them and they HAVE to use it? They’ll find a workaround, I promise). So, test it with them! This section talks about testing methods and evaluation criteria for making the best possible controlled vocabulary that you can make!
So that is the end of my Ode to the Z39. If this is the first you’ve heard of this bad boy, I urge you to download it – for free – online and take a look. It offers so much that is so relevant to the practice of designing controlled vocabularies. They’ve thought about so many minute details so that you don’t have to stumble through figuring it out. This is the classic librarian’s standard of practice approach to figuring out the tiniest details of best practice. Have a look. Tell me your favorite aspects of the document. I’d love to hear the ways it has resonated through your career.
My reading recommendation this week I found when searching for books on controlled vocabularies and I thought this HAD to be the recommendation for this week. After the heavy, dry monster of a doc-u-ment that is the Z39, I loved the lightness of this non-MLIS book with a very artsy/MLIS title. It’s called Seven Controlled Vocabularies and Obituary 2004. The Joy of Cooking: [AIRPORT NOVEL MUSICAL POEM PAINTING FILM PHOTO HALLUCINATION LANDSCAPE]. And yeah, it is every bit the kind of head trip that the title suggests. It is not a book about controlled vocabularies; instead, it is an artist’s attempt to answer the question: “How do we read a book as an object in a network, in a post-book, post-reading, meta-data environment?” Tell me this isn’t the best way to decompress from the Z39? From all the excitement of the Z39, that is.
Just the Z.39 again. You can find a link to the document on my website on the page for this episode.
Please email me at firstname.lastname@example.org and let me know what interests or confounds you the most and I will absolutely add it to my queue of topics. You can also use this email to contact me for whatever reason – corrections, questions, etc.
Rate, review and subscribe. Share with your friends on social media, and always be tagging.