Episode 7 - Semantics Part 3
Welcome to the podcast. Between recording my last episode and this one I went on a little vacation out of the country and it feels like I have not been back in the studio in ages. But I hear I may have figured out the volume issues on my intro music so hey, there’s that.
As promised previously, this week we’re going to take a little dive deeper into semantic schemas, where to find them, how to use them, and the ways that you can see semantic data represented on the web. I am not going to talk about tools. There are many out there that help with the management of taxonomies and do some leg work to make it semantic, but there is a lot of nuance to it, and I don’t want to cross over the boundary of endorsements… At least not going to mess with that today.
RDF, Triples, SPARQL and SKOS
So I am going to start by talking about the family of languages and frameworks that support the use of ontologies on the web. And then I’ll talk about microdata and some of the ways that people are embedding it into web pages. And right off the bat, in talking about RDF, SKOS and SPARQL, I think you’ll start to see why the semantic web has had a slow time of it. I myself, despite being keenly interested in metadata and the semantic web, have always been intimidated by the massive primers that each one of these specifications has. I’m saving OWL for another time. While ontologies are a critical element of the semantic web, you can use RDF or Microdata to create semantic content; you don’t need to build your own ontologies to create semantic meaning for web content. OWL is just a whole other animal, wordplay intended.
I already told you a bit about RDF, the Resource Description Framework. It is a standard model for data interchange on the web, meaning it creates a standard way for various websites to describe bits of information and create meaning for them. RDF extends the linking structure of the Web to use URIs to name the relationship between things as well as the two ends of the link (this is usually referred to as a “triple”). So what this means is that a URI links, let’s say, the phrase “roast chicken” on a website (harkening back to our roast chicken recipe example from last episode) to what’s called “microdata” on another site that defines what roast chicken is. My use of the word “defines” here is pretty loosey goosey. It is placed within a schema there, and that schema provides a lot of context clues, and the computer uses those context clues to provide YOU with a visual representation of where roast chicken fits into the world that, well, is really helpful and useful.
So I mentioned the word “triples.” You may have heard of these before, or the word “triplestore.” A triple is a “subject-predicate-object” semantic sentence grouping. And it is stored in a database called a “triplestore” that stores each piece of data in its place, either as the subject, the predicate or the object. And so these links create semantic understanding for the computer about what each detail is. So for instance, “Roast Chicken” might have a triple of something to the effect of “Roast Chicken” – Has Type – “Finished Meal,” whereas “Raw Chicken” might be – Has Type – “Ingredient,” and “Cook for 55 minutes” might be – Has Type – “Instructions,” and so on. And so the computer will be able to “understand” and display this stuff to you more clearly. It would never know all of this without these semantic wrappers.
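To make the triple idea a little more concrete, here is a minimal sketch in Python. It is not how a real triplestore is built; it just assumes a plain list of subject-predicate-object tuples, and the predicate name hasType and the helper objects_of are made up for illustration.

```python
from collections import namedtuple

# One triple is one subject-predicate-object statement.
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

# A toy in-memory "triplestore": just a list of triples.
# The predicate name "hasType" is hypothetical, taken from the spoken example.
store = [
    Triple("Roast Chicken", "hasType", "Finished Meal"),
    Triple("Raw Chicken", "hasType", "Ingredient"),
    Triple("Cook for 55 minutes", "hasType", "Instructions"),
]


def objects_of(triples, subject, predicate):
    """Return every object linked to the subject by the predicate."""
    return [
        t.object
        for t in triples
        if t.subject == subject and t.predicate == predicate
    ]


print(objects_of(store, "Roast Chicken", "hasType"))  # ['Finished Meal']
```

A real triplestore would use URIs rather than bare strings for the subjects and predicates, and would index the data so lookups stay fast.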
Now to get to semantics with RDF, you use RDFa, or Resource Description Framework in Attributes, which is a W3C Recommendation that adds a set of attribute-level extensions to HTML, XHTML and various XML-based document types for embedding rich metadata in your documents. The RDF data-model mapping enables use of RDFa for embedding RDF subject-predicate-object expressions within HTML documents. It also enables the extraction of RDF model triples by compliant web browsers.
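As a rough illustration of what "attribute-level extensions" look like, here is a small Python sketch that pulls property/value pairs out of an RDFa-style snippet. The HTML is a made-up example using the schema.org vocabulary, and the RdfaSketch class is hypothetical: it handles only the property and content attributes, which is a tiny slice of what a compliant RDFa processor does.

```python
from html.parser import HTMLParser

# Hypothetical RDFa-annotated snippet (schema.org vocabulary).
HTML = """
<div vocab="https://schema.org/" typeof="Recipe">
  <span property="name">Roast Chicken</span>
  <span property="cookTime" content="PT55M">55 minutes</span>
</div>
"""


class RdfaSketch(HTMLParser):
    """Collects (property, value) pairs from property/content attributes."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._pending = None  # property still waiting for its text value

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "property" not in attrs:
            return
        if "content" in attrs:
            # An explicit content attribute wins over the visible text.
            self.pairs.append((attrs["property"], attrs["content"]))
        else:
            self._pending = attrs["property"]

    def handle_data(self, data):
        if self._pending and data.strip():
            self.pairs.append((self._pending, data.strip()))
            self._pending = None


parser = RdfaSketch()
parser.feed(HTML)
parser.close()
print(parser.pairs)  # [('name', 'Roast Chicken'), ('cookTime', 'PT55M')]
```

Each pair, together with the recipe it belongs to, is one subject-predicate-object triple: the recipe is the subject, the property is the predicate, and the value is the object.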
SPARQL is the query language for RDF, that is, a semantic query language for databases. It is able to retrieve and manipulate data stored in Resource Description Framework (RDF) format in triplestores. SPARQL stands for SPARQL Protocol and RDF Query Language… So yes, the acronym is recursive. The S in SPARQL stands for SPARQL. They just really didn’t want to miss out on the chance for a truly fabulous acronym, because otherwise it would have just been PARQL. It can be used to express queries across diverse data sources. It is a key technology for the semantic web, being so perfectly designed to work with RDF. SPARQL allows for a query to consist of triple patterns, conjunctions (the AND in a Boolean search), disjunctions (the OR in a Boolean search), and optional patterns.
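Here is a small sketch, in plain Python rather than a real SPARQL engine, of what matching a conjunction of triple patterns means. The data, the ex: names in the comment, and the functions match and query are all assumptions made for illustration; anything starting with a question mark is treated as a variable, the way it is in SPARQL.

```python
# Roughly the query being imitated (ex: is a hypothetical prefix):
#   SELECT ?dish ?time WHERE {
#     ?dish ex:hasType ex:FinishedMeal .
#     ?dish ex:cookTime ?time .
#   }

TRIPLES = [
    ("RoastChicken", "hasType", "FinishedMeal"),
    ("RoastChicken", "cookTime", "55 minutes"),
    ("RawChicken", "hasType", "Ingredient"),
]


def match(pattern, triple, binding):
    """Extend the binding so the pattern fits the triple, or return None."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the same value everywhere it appears.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def query(patterns, triples):
    """Conjunction: every pattern has to match under one set of bindings."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in triples:
                new = match(pattern, triple, binding)
                if new is not None:
                    extended.append(new)
        bindings = extended
    return bindings


result = query(
    [("?dish", "hasType", "FinishedMeal"), ("?dish", "cookTime", "?time")],
    TRIPLES,
)
print(result)  # [{'?dish': 'RoastChicken', '?time': '55 minutes'}]
```

Disjunctions and optional patterns are left out of this sketch; in SPARQL itself those are written with the UNION and OPTIONAL keywords.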
SKOS meanwhile, is the Simple Knowledge Organization System. SKOS is described by the W3C as an area of work developing specifications and standards to support the use of knowledge organization systems (KOS) such as thesauri, classification schemes, subject heading systems and taxonomies within the framework of the Semantic Web. It represents vocabularies and schemas for RDF. It presents sets of classes and properties with which to describe bits of data, providing a standard way to represent knowledge organization systems using the Resource Description Framework (RDF).
Using RDF also allows knowledge organization systems to be used in distributed, decentralised metadata applications. Decentralised metadata is becoming a typical scenario, where service providers want to add value to metadata harvested from multiple sources.
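To give a feel for how a taxonomy looks once it is expressed this way, here is a hedged sketch of a tiny SKOS-style hierarchy as triples. The property names skos:prefLabel and skos:broader come from the SKOS vocabulary; the ex: concepts and the helper functions are invented for this example.

```python
# SKOS-style triples; the ex: concepts are hypothetical.
TRIPLES = [
    ("ex:RoastChicken", "skos:prefLabel", "Roast chicken"),
    ("ex:RoastChicken", "skos:broader", "ex:PoultryDishes"),
    ("ex:PoultryDishes", "skos:prefLabel", "Poultry dishes"),
    ("ex:PoultryDishes", "skos:broader", "ex:MainCourses"),
    ("ex:MainCourses", "skos:prefLabel", "Main courses"),
]


def objects(subject, predicate):
    """All objects attached to a subject by a predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def broader_labels(concept):
    """Walk up skos:broader links and collect the preferred labels.

    For simplicity this follows only the first broader concept at each
    step and stops if it ever sees the same concept twice.
    """
    labels = []
    seen = {concept}
    current = concept
    while True:
        parents = objects(current, "skos:broader")
        if not parents or parents[0] in seen:
            break
        current = parents[0]
        seen.add(current)
        labels.extend(objects(current, "skos:prefLabel"))
    return labels


print(broader_labels("ex:RoastChicken"))  # ['Poultry dishes', 'Main courses']
```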
Microdata, Schema.org and JSON
Microdata is an alternative to using RDF. It works within HTML to provide the same kind of semantic context that RDF does. Microdata is a WHATWG HTML specification used to nest metadata within existing content on web pages. Search engines, web crawlers, and browsers can extract and process Microdata from a web page and use it to provide a richer browsing experience for users. This is like the search result that I told you about in the last episode. Search engines benefit greatly from direct access to this structured data because it allows them to understand the information on web pages and provide more relevant results to users. Microdata uses a supporting vocabulary to describe an item and name-value pairs to assign values to its properties. It is an attempt to provide a simpler way of annotating HTML elements with machine-readable tags than the other approaches of using RDFa and microformats.
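Here is a rough sketch of those name-value pairs in practice. The HTML is a made-up recipe snippet using the Microdata attributes itemscope, itemtype and itemprop with the schema.org vocabulary, and MicrodataSketch is a hypothetical, deliberately simplified extractor that handles a single flat item, not a full Microdata parser.

```python
from html.parser import HTMLParser

# Hypothetical Microdata-annotated snippet (schema.org vocabulary).
HTML = """
<div itemscope itemtype="https://schema.org/Recipe">
  <h1 itemprop="name">Roast Chicken</h1>
  <meta itemprop="cookTime" content="PT55M">
  <span itemprop="recipeYield">4 servings</span>
</div>
"""


class MicrodataSketch(HTMLParser):
    """Collects one flat item as a dict of name-value pairs."""

    def __init__(self):
        super().__init__()
        self.item = {}
        self._pending = None  # itemprop still waiting for its text value

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs:
            self.item["type"] = attrs.get("itemtype")
        prop = attrs.get("itemprop")
        if prop is None:
            return
        if tag == "meta":
            # meta elements carry their value in the content attribute.
            self.item[prop] = attrs.get("content")
        else:
            self._pending = prop

    def handle_data(self, data):
        if self._pending and data.strip():
            self.item[self._pending] = data.strip()
            self._pending = None


parser = MicrodataSketch()
parser.feed(HTML)
parser.close()
print(parser.item)
```

The vocabulary (here schema.org) says what names like cookTime mean; the HTML attributes only say which piece of the page holds which value.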
In 2013, because the W3C HTML Working Group failed to find someone to serve as an editor for the Microdata HTML specification, its development was terminated with a 'Note'. However, since that time, two new editors were selected, and five newer versions of the working draft have been published, the most recent being the W3C Working Draft of 26 April 2018.
I’ve talked about Schema.org before. Schema.org uses Microdata to connect information to web pages to create that fanciful search experience that we have all come to expect. Seriously though, how many remember when that sea of links was the innovative way to provide information to users. It blew us away and blew the competition out of the water, and yet here we are today, miles beyond that.
You use the schema.org vocabulary along with the Microdata, RDFa, or JSON-LD formats to add information to your Web content. Although schema.org focuses on Microdata, most examples on the schema.org site show examples in RDFa and JSON-LD too.
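For the JSON-LD flavor, a minimal sketch might look like the following. The recipe values are invented, and the Python is only there to build and print the markup; @context, @type, and the Recipe properties name, cookTime and recipeIngredient are from the schema.org vocabulary.

```python
import json

# Hypothetical recipe described with schema.org terms.
recipe = {
    "@context": "https://schema.org",
    "@type": "Recipe",
    "name": "Roast Chicken",
    "cookTime": "PT55M",  # ISO 8601 duration: 55 minutes
    "recipeIngredient": ["1 whole raw chicken", "salt", "pepper"],
}

json_ld = json.dumps(recipe, indent=2)

# JSON-LD usually sits in a script tag instead of being woven into the
# visible HTML the way Microdata and RDFa attributes are.
snippet = f'<script type="application/ld+json">\n{json_ld}\n</script>'
print(snippet)
```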
JSON is built on two structures:
A collection of name/value pairs. In various languages, this is realized as an object, record, struct, dictionary, hash table, keyed list, or associative array.
An ordered list of values. In most languages, this is realized as an array, vector, list, or sequence.
These are universal data structures. Virtually all modern programming languages support them in one form or another. It makes sense that a data format that is interchangeable with programming languages also be based on these structures.
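As a quick illustration of those two structures, here is how they come out in Python, where a JSON object becomes a dict and a JSON array becomes a list. The recipe text is a made-up example.

```python
import json

# A JSON object (name/value pairs) containing a JSON array (ordered list).
text = '{"name": "Roast Chicken", "steps": ["Season", "Roast", "Rest"]}'

data = json.loads(text)

print(type(data).__name__)           # dict
print(type(data["steps"]).__name__)  # list
print(data["steps"][1])              # Roast
```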
So that is a wrap for the semantics series. I’d love to hear your thoughts or what you do with semantic metadata.
For now, my reading recommendation for this week is titled “Semantics in Business Systems: The Savvy Manager’s Guide” and I chose this because it is for the business. One of the few negative reviews of this book online said that it was too basic, that it is only for the uninitiated. Frankly, I don’t care how advanced you are: if you are trying to explain to your business team why they should be looking at the use of semantic metadata, then you need to be able to speak in terms that the uninitiated can understand. So if you’re anything like me, and really bad about remembering what your audience knows or doesn’t know, then this will probably be a good read!
You can find my sources online. I used some Wikipedia articles to provide slightly more clear explanations of the concepts, but also extensively used W3C (or World Wide Web Consortium) primers to describe their specifications. I used some verbiage from JSON.org as well as Q&A from Stack Exchange.
Thank you all for listening and remember, to apply metadata is to believe in tomorrow.
On SPARQL: https://en.wikipedia.org/wiki/SPARQL
W3C (Or, the World Wide Web Consortium)
Look at this comparison: