©2019 by metashop

Episode 8 - Guest Speaker Lisa Grimm

Mindy:

Hello friends! Welcome back and thank you for joining. I am very excited today to bring you my first guest! Her name is Lisa Grimm and if you are a major metadata nerd you may have recently seen one of her famous talks making the rounds on the social media. It’s called “The Seven Circles of Metadata Hell” and you can find it pretty easily with a Google search for that title and her name. But of course I will also have a link to it on the web page for this episode on inevermetadata.com.

So, Lisa has been directing DAM, taxonomy and content programs in the US and UK since the mid-1990s, for companies, museums and archives large and small, including Women.com, Nature Publishing Group, Drexel University College of Medicine, Elsevier, GSK and many more. She most recently led several of the taxonomy teams responsible for powering discovery across Amazon.com, including worldwide Kindle, Prime Video, Digital Music and US Books.

She can be found complaining about missing or terrible metadata on Twitter at @lisagrimm, and she occasionally writes about beer history for anyone who will pay her. She holds BA and MA degrees in archaeology and an MS-LIS, all of which have been extremely useful in her career, in both predictable and wildly unexpected ways.

Fun facts: She is a BJCP certified beer judge and occasional beer/beer history writer for Serious Eats, Philly Beer Scene and other publications.

So with that I want to welcome Lisa to the podcast and I am so excited to pick your brain about your experiences with metadata.

Lisa: 

Wonderful, thank you for having me. Very excited to speak metadata to those who love it.

Mindy:

Yes! Ok, so before I launch into my previously agreed-upon questions I have two new ones from your bio. The first is, what are some ways in which your archaeology education has proven useful in your metadata and taxonomy career?

Lisa:

That is a great question because it does seem very strange and confusing to people who are not knee-deep in archaeology - literally or figuratively. But I think that there are a couple of things that pop up - that sort of rise to the surface really quickly. Number one is, when you're working with archaeology, you're thinking about typologies. You're categorizing things into either different eras, different cultures. And some of those are a little bit squishy. You don't necessarily have things like controlled vocabularies. Sometimes you have things a little bit like them. But the way of sort of thinking about patterns in an organized way, that are non-obvious is one thing. The other thing is that it is so much about research and writing and really being able to draw conclusions - and to draw conclusions that you can support - based on limited data. So it's actually quite analogous to when you have to make decisions based on limited data. When you're trying to say "A program should go this way" or "This is how we're going to help someone try to find something."

 

There are a lot of skills there that are very transferrable. And it's one of those things when people say "Why would you have a degree in archaeology?" Well, one, it's cool, because obviously. But two: it really does give you a lot of skills that can be applied anywhere, but it really is a nice fit in this field because it really is just all about thinking about things in a structured way. So that's my pitch for archaeology. 

Mindy:

Yeah, that makes a lot of sense. My B.A. was in Classics - 

Lisa:

Like the cool kids.

Mindy:

Right? But with that old stuff, it makes a lot of sense having that interest in order to move forward. History! You need it!

Amazing. OK and also what does BJCP stand for?

Lisa:

BJCP stands for Beer Judge Certification Program. It starts with a three-hour written exam. That's your first level; it's a real thing, you write it down. There's a written portion, and a tasting portion. Depending on how well you do, there are different levels of certification. You get an enamel pin, because you get an enamel pin for everything now. But, it's great. You can judge home brew competitions. Although, and I think we'll get to this later, where you can say that ninety-nine percent of the time it's a metadata problem: ninety-nine percent of the time it's a sanitation problem, when you're judging home brew. So folks, keep your sanitation good. It's important.

Mindy:
Keep your sanitation good. If you take nothing else away from this episode...

Lisa:

Very important

Mindy:

So, as you may have heard on, what? My second episode? I am a big fan of standards. I think that they are rather too ignored by companies when building proprietary taxonomies for their internal systems. So I’d love to hear some stories you might have about any industry or proprietary standards you chose, or had to conform to for your sites?

Lisa:

Yeah. Great question. And same, I love standards. When you're working in an archives or a library setting, they understand already that you're going to use things like DACS, which I know you already talked about before in a previous episode, Dublin Core. In publishing, I've been lucky where, again, they understand that kind of thing. Everything comes prepackaged with it. But I've done a lot of work too with MeSH, the Medical Subject Headings, just because I have worked in scientific publishing and in pharma, so those are well-understood standards too.

But it's actually very interesting that when you take some of those standards and you look at a huge company like Amazon. They don't necessarily play by the same rules. Even though you'd like them to be influencing standards. Sometimes it happens. But the ones that do get used more frequently are ones like the BISAC codes. Again, coming from publishing. But it's another funny thing too. Because I handled both the Audible side of the fence, as well as the books and digital books side of the fence. They used those BISAC codes differently in each of those settings. Because why would it be the same? That would be crazy! 

So it's interesting that you have something that is an industry standard that you're then ingesting and using different ways to power discovery. And that can be good sometimes, you can use that as an augmentation to something. But other times it's just very confusing. But, by the same token, we were able to sometimes influence the BISAC Committee to get new codes added. One of my employees did an amazing job getting all sorts of things added too, like sub-genres for romance. So that was super cool that we were able to go to the BISAC Committee and say "We know that people are looking for this" and to take real data and say "Let's feed this back into the standard."

 

So it's great in that there are some standards in place. They're not super well understood in industry. I know, surprise to anyone who works in this field that there are standards that are then ignored. But by the same token you see a lot of difficulty with looking at films especially, where there are certainly standards out there. People are doing super cool stuff to try to actually have standardized language codes, things like that. But big companies like Amazon often ignore them and that makes it hard to find things. And so there's always that battle of going out and educating your stakeholders that these things are out there, so let's use them. But sometimes the content pipelines even strip things out when they come in with good standard metadata. So, it's always a battle to educate people that standards exist. But sometimes you're consuming things that are there, but using them differently.

I love standards. I would love to see everyone using standards in a standard way, but, we're not there yet.

Mindy:

Do you remember offhand what BISAC stands for?

Lisa:

So BISAC stands for Book Industry Standards and Communications. And yes, we had to look that up. Even when you work with these things day in and day out, you forget what the acronyms mean. It's the same as when you're a coder, and you have to go to Stack Overflow. The information is there. Don't be afraid to go looking for it.

Mindy:
Yeah, and that comes from the Book Industry Study Group. Once again, a group of industry experts who have toiled over what the standard should be. So, look into it. If you have books to apply metadata to, I bet those codes are going to be useful to you. 

Lisa:

They meet only in New York and only on New York hours and so everyone else has to phone in from the rest of the world and just deal with it. But, I respect that as a former New Yorker, and it's all good.

Mindy:

Did you have any tactics for how to keep folks who were entering data according to those standards compliant so that your data would be clean?

Lisa:

That is a great question because it can absolutely be the wild west. When you have things coming directly from publishers, that's great. Those all come in through standard formats. But again, when you're thinking about Amazon, so much comes in through spreadsheets. People are just entering things. There are some controlled vocabularies for some things, but not necessarily for others.

 

But, this is where it gets really tricky. Especially when you get into self-published things. Because, there are people entering things by accident, but you also have bad actors out there. You have people putting things in places that they shouldn't be very much on purpose. So if you suddenly see that something has become the number one seller in "Limnology", and this is a real thing that happened, you say "Why? These books have nothing to do with anything being next to a river or a lake." Because obviously you're all already very au fait with limnology. And it turns out that they're all someone's self-published science fiction epic. They have figured out that that is how you game the system.

 

So that is something that you have to really watch for and to come up with creative taxonomy ways to try to avoid that and it becomes very, very complicated. And again, a lot of it would be better if you could say "This is the controlled vocabulary field and people have to choose these things." And they are getting there with some of those systems. And you really have to build a system that is going to have either the controlled vocabulary or some other kind of checkpoint so that you don't have either the bad actors or people just getting it wrong. Because if you don't, then you get a huge problem. And it's really difficult to do at scale, so it's one of those things, again, where you want to go out and educate people and say, we have ways to solve this, but we need to get you on board too. And it's always a trade-off between how much freedom do you want to give people and how much do you want to say "these are the guard rails, let's go in and do this."

And it's tricky too, again, because - going back to standards, you don't want to force people to go looking for a BISAC code if they don't know what that means, especially because BISAC is great for figuring out where a book goes on the shelf. It's not great for discoverability. It's not how people look for things. So you have to think about how do we display these things differently. So you might have one code on the backend, but it looks like something completely different on the front-end for ease of use. So, thinking about all those things, or how do you build systems that let people get things in, that give you flexibility, but that don't cause a lot of escalations in the middle of the night.
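The idea of keeping one code on the backend while showing something friendlier on the front-end can be sketched as a simple lookup. This is only an illustration under stated assumptions: the codes, the labels, and the function names `display_label` and `browse_labels` are all invented here, and the codes are not real BISAC codes.

```python
# Hypothetical backend codes and display labels, invented for illustration.
# These are NOT real BISAC codes.
DISPLAY_LABELS = {
    "ROM-001": "Romance",
    "ROM-014": "Small-Town Romance",
    "SCI-007": "Lakes & Rivers",
}


def display_label(backend_code, fallback="Other"):
    """Return the customer-facing label for a stored backend code."""
    return DISPLAY_LABELS.get(backend_code, fallback)


def browse_labels(backend_codes):
    """Map stored codes to unique front-end labels, keeping first-seen order."""
    labels = []
    for code in backend_codes:
        label = display_label(code)
        if label not in labels:
            labels.append(label)
    return labels
```

The stored code stays stable for the people and systems that rely on the standard, while the label shown to shoppers can change without touching the underlying metadata.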

 

There's things that happen, and we'll chat about some of that later, but it's a tricky situation. You've got to really walk that line. 

Mindy:
Yeah, because there are some ways to handle it, like when there is mutual exclusivity between concepts, you can forbid them from tagging something as both child-friendly and adult-content. But you can't really stop them from saying that their sci-fi novel is about limnology, because they're not necessarily mutually exclusive.
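A mutual-exclusivity guardrail of the kind Mindy describes might look roughly like the sketch below. The tag names, the rule list, and the function names `find_conflicts` and `validate_tags` are assumptions made up for this example, not anything from a real system.

```python
# Hypothetical tag names and rule set, invented for illustration only.
MUTUALLY_EXCLUSIVE = [
    frozenset({"child-friendly", "adult-content"}),
]


def find_conflicts(tags):
    """Return every exclusive pair that is fully present in the given tags."""
    tag_set = set(tags)
    return [pair for pair in MUTUALLY_EXCLUSIVE if pair <= tag_set]


def validate_tags(tags):
    """Raise ValueError if the tags include a forbidden combination."""
    conflicts = find_conflicts(tags)
    if conflicts:
        names = "; ".join(" + ".join(sorted(pair)) for pair in conflicts)
        raise ValueError(f"Mutually exclusive tags used together: {names}")
    return list(tags)
```

A check like this catches the child-friendly plus adult-content case, but a sci-fi novel tagged with limnology passes straight through, because no rule says those two concepts cannot go together.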

Lisa:

Yeah, and you're only going to build guardrails for so many of those things because they're non-obvious.

Mindy:

How much of your metadata is ad hoc vs. very well planned out and to purpose, and then how much relies on upstream/downstream systems that also have their own data to deal with, that they require?

Lisa:

If you work in a library or even in publishing, they totally understand that this is part of a content pipeline. You're getting these things in, they're running through the system, but then, when you look at something like Amazon, your metadata is touched by so many different teams. And in fact, it's not even owned by the taxonomy team, it is really merchandising that is doing that. So, you're working with something where you're querying it, you're trying to get attributes from it to try to understand where something goes, but you don't have that control over the metadata in the first place. So you really have to figure out how are you building your queries so that things do end up in the right places based on metadata that you can actually get. It does go upstream and downstream in ways that you can't necessarily predict, and it's really, really difficult to try to figure out what's happening there.

 

So in terms of whether it's really purposeful or ad hoc, it's really a combination, which I guess, by default really makes it kind of ad hoc, but, there are certainly things that are very standard and are very planned out, but then, especially when it gets to the descriptive metadata, that's where it can become kind of a free-for-all. Even if you try to put some, I'm going to say it again, guardrails around it. There are still things that people can do, whether by accident or on purpose, that can basically blow everything up. So it's again one of those cases where you want to advocate for making it as controlled as you can, but understanding that there are going to be things that are going to need to be a little bit more ad hoc.

 

Mindy:

OK, so as we were plotting out this episode, my favorite of the topics you said you’d like to cover was: What is a metadata emergency? Can you give us an example? And what happens when there is a metadata emergency?

 

Lisa:

Absolutely. I know for anyone who may be working in a library or maybe you're working in a corporate archives and you're thinking "How could this possibly be an emergency situation?" Well, it can be! There are a couple of exciting things that can happen. First, I'll walk through the process and then I'll walk through what some of the typical emergencies are. 

Process-wise, there is an on-call rotation schedule, and I am just using Amazon as an example; there are some other e-commerce sites that have a pretty similar model. There is a taxonomist on call 24/7 somewhere in the world, really a follow-the-sun model. They will get paged depending on how serious the fault is. And then it is a scramble to do a quick triage, and then figure out: is this actually something that we can fix? Ninety-nine percent of the time, it is a metadata problem, which I am putting into giant air quotes. Which means that someone may or may not have the ability to actually make a change that is going to hide the thing that shouldn't be there - and, again, we'll talk about what those are in a minute - or to otherwise get it moved to where it should be. So, it takes someone who is very calm and very cool under pressure to come in and figure out very quickly what the problem is and to reassign the ticket to someone who can solve the problem, or to come up with a way to quickly build a new structure, get it out there into production so that whatever that bad thing is disappears. I think there's an assumption that if a bad thing appears on Amazon, there is a quick GUI interface to make it go away. Ha! No, that is not the case. It can take 12 hours, 24 hours, 48 hours to get something out if it is really bad. And how quickly that happens is going to depend on how bad the thing is, and there are categories of bad. So, to walk through what some of those are:

 

Number one, any time anything inappropriate, in air quotes, appears in a children's section, that is all hands on deck. Everyone freaks out. I'll talk through one of those more specifically in a minute. But the other thing that happens very, very commonly, and again this is typically people doing this on purpose: those bad actors out there, third-party sellers, like to put adult products everywhere on the site. Home and Garden is a popular one because we haven't really, necessarily filtered those things out through some of those queries I talked about. Sex toys in lawn furniture is very common, so, if you see that from time to time, we know. We know it's there. And again, there are some things out there that may have been there for months or years, but the site is so big that you don't necessarily know all of the bad things that are out there. There is no automated tool to find them and flag them. So, you're really waiting for members of the public to complain, which again, can really blow up if that is something that picks up steam on social media, and then that's going to push that on-call rotation. But again, it tends to be either something that is going to upset the children, we're always thinking of the children, or something again with an adult product.

 

And there is another thing: even where there is a controlled vocabulary, the way that controlled vocabulary is interpreted in different places can really be the cause of the problem. So even if you think that you've checked all of the boxes, you can still end up with a situation. To walk through one of those situations:

I got paged while doing a 10K run, which was super exciting. I was actually at the finish-line, so. And even though I was actually the manager on call, it was actually one of my employees who had to deal with the problem. I just had to listen in and be good moral support and say "Have you tried this?" "Have you tried this?" and it turned out what had happened was, in Germany, Sons of Anarchy was showing up in one of the children's carousels in Prime Video, so, not great for kids. And we couldn't figure out why this was happening because it should have been blocked by the ratings. A lot of things should have stopped this. And it turned out that the reason this was happening was because, again the controlled vocabulary should have stopped this, but it was being interpreted differently in Germany.

 

We had a tag that was "family-drama" in the U.S. and English-speaking countries, but in Germany it was sort of "Family-drama" or vice versa, I forget which one it is, but it was interpreted differently to mean, instead of "a drama for families", "a drama about families", and so they slapped that metadata tag on it, so that meant it was showing up in a kids and family queue. So again, everyone had done all the right things, but there was this different local understanding, and when you're working globally, you really have to try to keep all of these things wrangled.

That was really causing the problem. It meant that it was getting through all of the other checks and balances and showing up there. It took a long time to try to figure out how to get it out because, again, at the end of the day, it was a metadata problem, it wasn't actually a taxonomy problem. So, we had to work creatively together with stakeholders around the world. Very angry people on phone calls from 3 o'clock in the morning in Europe, early evening Seattle time. Trying to get everyone on board. Saying, "what's the problem?" and "what are we doing?" and giving status updates. There ended up being a whole internal white paper about this particular problem. So it really shows that these things can take on a life of their own and you really, really have to think about "What does this mean in every place?" And you have to keep reviewing those things to make sure that when new people come on board, they understand what these things mean.

 

So, it can be quite stressful for people on the day, but at the same time, some of them you look back and you think "This is just funny." So you have to have a sense of humor about it and be able to roll with it, but also to apply those lessons learned, and again make sure that there is ongoing education for all of your stakeholders. Anyone who comes in contact with this metadata has to understand how it is being used, at least in the realm where it affects them. And, if they don't, you can say "Well, you don't want to get paged at three in the morning." So, it's a real thing that happens. Usually they're pretty funny in retrospect, but at the time, they can be pretty hardcore.

Mindy:

Very Dramatic. There's this quote from a very early episode of The Simpsons, that I always come back to when working with a taxonomy, especially when that taxonomy has maybe some algorithms or far-reaching applications that I'm always thinking of because it's where Homer is time-traveling and he thinks of Grandpa Simpson's advice on his wedding day, which is "Don't touch anything, because the slightest change will alter the future in ways you can't imagine." And I always think of that when working with taxonomy because with those far-reaching applications, you just really don't know how nuanced it can become on the fringe edges of it. 

Lisa:

Absolutely, and that's the thing. The smallest change anywhere just blows things up, especially anytime you're working at a global scale. You don't know what a small change here will do to some other part of your pipeline down the road, or in another country. There can be just huge variation.

Mindy:

Can you talk about some of the issues of working at a global scale? Like when the governance involves so many different people from different teams with competing priorities. I, as a consultant, have spent a good deal of my time at the beginning of these kinds of projects. Ideally setting my clients up for success and then watching my little chickies fly without me. So I don’t often see how it goes after I am gone, but will say, with these global-scaled projects, at the beginning, they are a mess. I mean, they are a mess.

Lisa:

Oh, that's a great point, and definitely there is always going to be something that is messy, especially when you wrap in thinking about something like video. You go to Japan, they have all these different standards, they have all these other different ways of categorizing things. Many, many, many sub-genres that we just don't have in other places, and that's great, but how do you wrap that into your broader global model? Thinking about that is really tricky, but it works as long as you've got a governance model in place that is working and that you review periodically, where you can say "this person is the single source of truth for this, but here are the other people who can make those decisions." So that you don't end up with bottlenecks, you don't end up with taxonomy by committee, which no one likes, it takes forever, no decisions actually get made, and I think we've all been in that situation, where there were the best intentions, people set up a working group where either nothing happened or too many things happened. But either way, you need to figure out a way to harness all those smart people who have all of that local knowledge and to really let them drive what is happening there.

And it's tricky to find the right model, and you need to keep revisiting the right model, but I think that is really what has driven success at places I've worked, both huge places like Amazon, as well as in the pharmaceutical world. It's making sure that you have those people who have that in-depth local knowledge really feeding the systems, and that they're an active part of that, and that there is some sort of central control as well. Whether that is someone who is the global lead for this. But you need to make sure that those review sessions are happening, and that it is part of the daily operating model. It's not something that you do once a quarter or once a year, because then people move into new roles, people transition, all of that knowledge is lost.

And then, at the same time, you need enough documentation to make that happen, but you don't want this documentation paralysis where that's all that someone is doing, and then you feel like you have a ninety-page SOP to make a change; that's also not feasible. But I think it always comes back to governance and finding what the right model is for that particular organization, that particular situation. And if you don't have that in place, you're going to be stuck with either too much or too little.

 

It always needs work, it's never perfect. That's why you have all those smart people doing all smart people things. You need those people who understand what you're trying to do, and who have that in-depth knowledge. That's why librarians are the best. And you really need to be sure that they're out there doing that work. And if they're not there, then you have people who just have no idea what they're doing. So your librarian is not just there to be the subject expert, but they're really helping to educate other people about the importance of that stuff. They're the guiding hand as some of these decisions are being made.

Mindy:
I'm starting to see a lot of companies really starting to understand that.

Lisa:
Yeah, we're getting better, it's really made a change. Even just these last five years we've seen a real uptick. Whether it's Master Data Management, or whatever, people understand that someone needs to be in control of these things and there needs to be a governance model. It's slowly picking up. 

Especially in Pharma. Especially in Publishing that's much more understood. Banking as well. Anything where there's a regulated industry. They already understand that they need something. But I think the rest of the world is catching up. Everyone realizes this isn't just part of your fun creative process. You really need to actually manage and wrangle these things, or else you're either leaving money on the table, or you're opening yourself up to lawsuits and all these other things that sometimes you have to scare people with, but they're all very real things. It's really important stuff. 

Mindy:

That wraps up our Q&A session. This has been really great.

Reading recommendation

Mindy:

Did you have a reading recommendation for our listeners?

Lisa:
I did indeed! Whenever anyone asks me "What is Taxonomy? How does that work?" I always point them to The Accidental Taxonomist. At least as a starting point. Anyone who has already gone to library school is thinking "Oh I've already done all of that stuff" I get it, but especially if you have someone that you're trying to introduce to the field, this is great. And even, frankly, if you went to library school but maybe haven't done some of this stuff recently, that's a great way to get your head around it. 

But I also have a fun new release. I'm reading "Because Internet" by Gretchen McCulloch. It's fantastic. Really thinking about how we communicate online. What these different styles are. What the different ways we think about what emojis mean, and really how you kind of start to get a framework for what's out there.

And it's really funny too. Highly recommend it. Very readable. Very funny. So you've got one very practical book. And one that is very practical but it is hiding that practicality behind fun, really cool examples. And from it I got to discover that I am an old internet person. So, I'm an official category. Very excited to learn that. Yay old internet people. 

Mindy:

I'm so glad that you recommended that book because now I have to add it to my reading list, which I know I have been wanting to do. So thank you very much for those recs.

As always you’ll find a link to it on the reading recommendations page on inevermetadata.com.

I don't have any sources, well, other than, well, Lisa here and all the learning and experience that brought her to where she is today.