Semantic Websites: Briefly Explained

Introduction

One of my biggest special interests is the Semantic Web, especially as it relates to the web revival. Since I would like to write quite a few more blog articles on that topic, and because I don't want to duplicate the basic explanation of what it means in each of those posts, I am writing this entry as a concise explanation that I can simply link to from elsewhere.

You don't need any knowledge about how the web works to understand this post – at least that's my intention. I am keeping it as beginner-friendly as possible.

Writing Websites

Well, you could write a whole book about Semantic HTML. Many people have! Since I already struggle with my habit of writing overly verbose blog posts though, I am going to try and avoid accidentally adding to that bookshelf.

So, when we write websites, we don't just write the text that appears on screen, right? We also need to add extra information to the text so that our web browsers know what exactly they are looking at. The format we use to add that information is called HTML (HyperText Markup Language).

For example, when publishing a blog article like this one, we don't want everything to just be a big wall of plain text! We need to tell the browser somehow that a certain part of the text is supposed to be the article's title in big letters on top, that another part is a nicely formatted display of our article's content, and that perhaps yet another part might be smaller, additional information at the bottom of the page, like the author or publication date. Perhaps we want the page to feature a bunch of links elsewhere too, so we can leave the article again.

The thing is, there are two possible ways we could go about communicating this to the browser.

Presentational HTML

For one, we could add information about what exactly we want each part of our text to look like. We can select what's supposed to be our headline and tell the browser the following: this text has a font size of 20pt, it's bold, and it aligns to the centre of the screen. Our article text is justified, and the authorship information has a size of 11pt and is coloured light gray.

Writing our website like that is called Presentational HTML, because it is mostly concerned with the page's visual presentation. It gives us a lot of fine control over what the end result looks like in a web browser, which is great for designers and brand managers.

Semantic HTML

However, we could also tell the browser what kind of content a certain part of the text is. We select the sentence that's supposed to be our headline, and tell the browser: this is a headline. We also tell the browser that our blog post is in fact a kind of article, where each paragraph begins and ends, and that our author and date information are in the article's footer section. We could add that our author information is part of our contact data, and that the publishing date is in fact a certain timestamp.

If we write our page like that, it's called Semantic HTML, because we are concerned with the meaning (i.e. semantics) of our webpage's elements, rather than their style.
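If you're curious what this difference looks like in practice, here is a small sketch in Python (chosen only as an example language). It contains the same made-up article twice, once marked up presentationally and once semantically, plus a tiny program that tries to find the headline. The article text, the `find_headlines` helper and the exact markup are my own illustrative assumptions, not the one correct way to write either style.

```python
from html.parser import HTMLParser

# Presentational markup: the headline is only "big, bold, centred text".
PRESENTATIONAL = """
<div style="font-size: 20pt; font-weight: bold; text-align: center">My Trip</div>
<div style="text-align: justify">I went to Berlin and it was lovely.</div>
<div style="font-size: 11pt; color: lightgray">Alex, 1 May 2024</div>
"""

# Semantic markup: the same content, but each part says what it *is*.
SEMANTIC = """
<article>
  <h1>My Trip</h1>
  <p>I went to Berlin and it was lovely.</p>
  <footer>
    <address>Alex</address>
    <time datetime="2024-05-01">1 May 2024</time>
  </footer>
</article>
"""


class HeadlineFinder(HTMLParser):
    """Collects the text of every <h1> element in a document."""

    def __init__(self):
        super().__init__()
        self.in_h1 = False
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.in_h1 = True
            self.headlines.append("")

    def handle_endtag(self, tag):
        if tag == "h1":
            self.in_h1 = False

    def handle_data(self, data):
        if self.in_h1:
            self.headlines[-1] += data


def find_headlines(html):
    finder = HeadlineFinder()
    finder.feed(html)
    finder.close()
    return finder.headlines


print(find_headlines(PRESENTATIONAL))  # [] (no way to tell which text is the headline)
print(find_headlines(SEMANTIC))        # ['My Trip']
```

Both versions could be styled to look identical in a browser, but only the second one tells the program what the big text actually is.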

The HTML specification supplies us with a lot of cool conventions to mark up different types of content on our pages semantically, several of which I am explaining in the accompanying tutorial post Spice Up Your HTML Right Now (With Semantics!).

Upsides and Downsides

Both semantic and presentational HTML have their respective upsides and downsides, although I am clearly partial toward semantic HTML.

Advantages of Semantic HTML

Semantic HTML is the way that the WWW is intended to be written according to the official HTML specifications. It allows machines to understand, not just display, your website.

If you add semantic information to your blog article, a program may automatically export different versions of your page adapted to different media without any modifications to your site: for example, a print version (with a bigger headline, a wider body, pull quotes from the text and no underlined links), a mobile version and a desktop version. That's because it knows which part of the article is what kind of element, and can make decisions based on that information. If a headline is merely defined by its font size and nothing else, the machine has no idea that it's a headline, and can't do anything but display it the way you told it to.
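Here is a deliberately tiny sketch of such a program, again in Python. The `PrintRenderer` class is hypothetical and only turns semantic markup into a plain-text "print version": it writes the `<h1>` headline in capitals, keeps paragraphs, drops the `<nav>` section and writes link addresses out in brackets instead of relying on underlines. It assumes well-formed markup without void elements like `<img>`; a real converter would have to cope with far messier HTML.

```python
from html.parser import HTMLParser

PAGE = """
<nav><a href="/">Home</a> <a href="/blog">Blog</a></nav>
<article>
  <h1>My Trip</h1>
  <p>I visited <a href="https://example.com/berlin">Berlin</a>.</p>
</article>
"""


class PrintRenderer(HTMLParser):
    """Hypothetical plain-text 'print version' renderer.

    Every layout decision is based only on what kind of element a piece
    of text sits in. Assumes well-formed markup without void elements.
    """

    def __init__(self):
        super().__init__()
        self.stack = []    # currently open elements, innermost last
        self.blocks = []   # output blocks (headline, paragraphs)
        self.href = None   # target of the link currently being read

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)
        if tag in ("h1", "p"):
            self.blocks.append("")
        elif tag == "a":
            self.href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        # Pop open elements until we reach the matching start tag.
        while self.stack and self.stack.pop() != tag:
            pass
        if tag == "h1":
            # Capitals stand in for a bigger font in plain text.
            self.blocks[-1] = self.blocks[-1].upper()
        elif tag == "a":
            # Paper can't be clicked, so write the address out instead of underlining.
            if self.href and "p" in self.stack and "nav" not in self.stack:
                self.blocks[-1] += f" ({self.href})"
            self.href = None

    def handle_data(self, data):
        if "nav" in self.stack:
            return  # navigation links are useless on paper
        if "h1" in self.stack or "p" in self.stack:
            self.blocks[-1] += data


def render_print_version(html):
    renderer = PrintRenderer()
    renderer.feed(html)
    renderer.close()
    return "\n\n".join(renderer.blocks)


print(render_print_version(PAGE))
```

For the sample page, this prints the headline as "MY TRIP", followed by the paragraph with the Berlin link written out in brackets; the navigation never appears. Nothing in the renderer looks at font sizes or colours, only at element types.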

Semantic markup also allows for fancy functionality, if your web browser implements it. A browser could, for example, add extra features to the semantic elements of your page, like bringing up your calendar when you click on a <time> element, or adding a button to the browser itself that lets you contact the site administrator without having to scour the whole site for an e-mail address or phone number.

When I post "I will be in Berlin tomorrow", a machine just sees the words "Berlin" and "tomorrow" and has no concept of where or when those really are. If I added location and time information to those parts, user software could theoretically do something cool with that information, like showing you tomorrow's weather in Berlin when you hover over it.
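Marking up the time part is already possible today with the `<time>` element and its machine-readable `datetime` attribute. The Python sketch below (with a made-up date) pulls out both the visible text and the actual date from such an element; what a browser would then do with it, like asking your calendar or a weather service, is left out. It only handles plain `YYYY-MM-DD` values, even though the HTML specification allows more formats, and since HTML has no dedicated element for places, the "Berlin" part is left unmarked here.

```python
from datetime import date
from html.parser import HTMLParser

POST = '<p>I will be in Berlin <time datetime="2024-05-02">tomorrow</time>.</p>'


class TimeCollector(HTMLParser):
    """Collects the visible text and the datetime attribute of <time> elements."""

    def __init__(self):
        super().__init__()
        self.in_time = False
        self.found = []  # list of [visible text, datetime attribute]

    def handle_starttag(self, tag, attrs):
        if tag == "time":
            self.in_time = True
            self.found.append(["", dict(attrs).get("datetime")])

    def handle_endtag(self, tag):
        if tag == "time":
            self.in_time = False

    def handle_data(self, data):
        if self.in_time:
            self.found[-1][0] += data


def machine_readable_dates(html):
    collector = TimeCollector()
    collector.feed(html)
    collector.close()
    # Only plain YYYY-MM-DD values are handled in this sketch.
    return [(text, date.fromisoformat(value))
            for text, value in collector.found if value]


print(machine_readable_dates(POST))
# [('tomorrow', datetime.date(2024, 5, 2))]
```

The reader still sees "tomorrow", while the software gets an unambiguous date it can actually calculate with.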

Information can also be linked across the whole web! If I'm browsing an art page and come across a certain book title that's marked up properly behind the scenes, my browser knows that it's a book and could immediately link me to the author's website, book reviews, purchasing options or summaries, without relying on anything but the HTML in front of it.
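One existing way to mark up such a book is HTML microdata with the schema.org `Book` vocabulary (the `itemscope`, `itemtype` and `itemprop` attributes). The Python sketch below assumes that markup, collects the book's `name` and `author`, and builds a search link from them. The example page, the `BookFinder` class and the `example.com` lookup address are hypothetical; a real implementation would follow the full microdata parsing rules (nested items, `itemref`, values taken from attributes like `content`), which this sketch ignores.

```python
from html.parser import HTMLParser
from urllib.parse import quote_plus

BOOK_TYPE = "https://schema.org/Book"

REVIEW = """
<p>This painting reminded me of
<span itemscope itemtype="https://schema.org/Book">
  <cite itemprop="name">The Story of Art</cite> by
  <span itemprop="author">E. H. Gombrich</span>
</span>.</p>
"""


class BookFinder(HTMLParser):
    """Collects itemprop text values inside schema.org Book items.

    Simplified: assumes well-formed markup without void elements or nested items.
    """

    def __init__(self):
        super().__init__()
        self.stack = []  # (itemprop or None, starts_book) for each open element
        self.books = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        starts_book = attrs.get("itemtype") == BOOK_TYPE
        if starts_book:
            self.books.append({})
        self.stack.append((attrs.get("itemprop"), starts_book))

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        inside_book = any(starts for _, starts in self.stack)
        props = [prop for prop, _ in self.stack if prop]
        if inside_book and props:
            book = self.books[-1]
            book[props[-1]] = book.get(props[-1], "") + data


def find_books(html):
    finder = BookFinder()
    finder.feed(html)
    finder.close()
    return finder.books


def lookup_link(book):
    # Hypothetical lookup service; a browser could query any catalogue here.
    query = quote_plus(f"{book['name']} {book['author']}")
    return f"https://example.com/books?q={query}"


for book in find_books(REVIEW):
    print(book, lookup_link(book))
```

Nothing here guesses that "The Story of Art" is a book title; the page says so explicitly, which is exactly what makes the linking reliable.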

Screen readers especially depend on semantic markup to provide a more accessible reading experience. How annoying would it be if a blind person were to read your blog posts using a screen reader, and every article started with the narrator reading out all your page navigation links and decorative images before getting to the point of the post? You don't have to imagine it, because unfortunately that is the reality on most websites.
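Screen reader users can often jump straight to the main content, but only if the page tells them where it is. As a rough sketch (Python again, with a made-up page), the hypothetical `MainContentReader` below reads only what is inside `<main>` and skips images whose `alt` attribute is empty, which is the HTML convention for decorative images. Real screen readers are far more sophisticated than this; the class only illustrates why the markup matters.

```python
from html.parser import HTMLParser

PAGE = """
<nav><a href="/">Home</a> <a href="/archive">Archive</a></nav>
<main>
  <img src="divider.png" alt="">
  <h1>My Trip</h1>
  <img src="berlin.jpg" alt="The Brandenburg Gate at night">
  <p>I finally made it to Berlin.</p>
</main>
"""


class MainContentReader(HTMLParser):
    """Reads only what is inside <main>, similar to 'skip to main content'.

    Images with an empty (or, in this sketch, missing) alt attribute are skipped.
    """

    def __init__(self):
        super().__init__()
        self.main_depth = 0  # greater than 0 while inside <main>
        self.spoken = []     # what the narrator would read out, in order

    def handle_starttag(self, tag, attrs):
        if tag == "main":
            self.main_depth += 1
        elif tag == "img" and self.main_depth:
            alt = dict(attrs).get("alt")
            if alt:  # alt="" marks a purely decorative image
                self.spoken.append(f"Image: {alt}")

    def handle_endtag(self, tag):
        if tag == "main":
            self.main_depth -= 1

    def handle_data(self, data):
        if self.main_depth and data.strip():
            self.spoken.append(data.strip())


def read_main_content(html):
    reader = MainContentReader()
    reader.feed(html)
    reader.close()
    return reader.spoken


for line in read_main_content(PAGE):
    print(line)
```

The navigation links and the divider image never get read out, and the narrator starts right at "My Trip".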

Disadvantages of Semantic HTML

The big downside of semantic HTML is that you do not always get fine control over what your website ends up looking and working like. In the days before stylesheets were widespread, web browsers were customisable: users themselves could choose what colour scheme they wanted the websites they visited to have, what fonts to use, what size and alignment headlines would have, how links were displayed and so on. This was generally upsetting to corporations with strong brand management, who felt they needed exact, pixel-perfect control over what their websites looked like so they could maintain their brand image and seem 'unique'.

Some people also simply don't want their websites to be machine-readable. Many commercial sites attempt to circumvent ad-blockers by making their markup as convoluted as possible, so that the web browser can't tell what is and what isn't an ad. With today's anti-'AI' movement, public sentiment on web scraping has done a complete 180: where we once celebrated the idea of freedom of information and of connecting all human knowledge via the internet to advance us all, many people are now attempting to 'protect' their websites from being read by machines because they're afraid their data will be misused.

Other than that, it's also simply more difficult to write than generic tutorial-level HTML. A lot of people writing websites, which includes kids and people who aren't really good with computers, don't even know that semantic HTML is a thing, and they don't see the upsides as worth having to study 'new' rules, especially if their current websites just work as they are.

And that's where the saddest part about all this comes into play: I have yet to come across a web browser that actually makes use of all the cool information we're adding to our websites.

The many applications of semantic HTML are unfortunately, with the exception of screen reader accessibility, still largely a theoretical ideal. The web of today is full of 'web applications' instead of the web documents it was intended for. Corporations with their aforementioned brand managers and obfuscation specialists, alongside people who write terrible HTML, dominate the whole web. Without a major movement towards the semantic web, putting in the effort to write a wholly new browser that adds functionality to all the semantic elements of a website would simply not be worth it for most people.

After all, the websites most of us use on a day-to-day basis feature no good semantic HTML whatsoever. People on the web generally don't mark up times, abbreviations, sections, addresses and so on, so a browser with features that only work on a few websites and not on the majority of others would be somewhat awkward to use. Some operating systems, especially mobile ones, have worked on providing such functionality, though not by using semantic HTML information, but merely by guessing which pieces of text 'look like' a phone number or a timestamp. It's frustrating to see that we already have the means to make that information accessible, but instead have to rely on our phones' guesswork.

Conclusion

Why bother writing semantic HTML? Well, I personally think it's just neat. Something about it tickles my brain in the right ways, and I still believe in the ideal that adding information about my content's meaning can lead to really cool technological possibilities like those detailed in the Advantages section.

Other than that, I might be naive, but I genuinely believe that if writing semantic HTML at least becomes a loose standard in the web revival subculture, a development project for a novelty browser focused on adding functionality to semantic HTML over compatibility with most websites could be feasible.

At least some people are (or will be) browsing the web with software that allows them to work with semantic information, and I would like to support them. After all, to me personally, there's no downside to writing 'proper' HTML as per its specifications, and if it helps facilitate cool technology and accessibility, I am excited to be a part of it.