Under the hood: The technology behind ebooks
To design and produce ebooks – or even, as an author, just to understand what you can and can’t do with them – you need a basic understanding of what makes them tick. Without this, it’s very easy to get carried away with visions of digital nirvana and grand ideas of what your ebook will do.
As we’ve seen in the introduction, ebooks – of the EPUB and Kindle kind – are based on the same technologies that are used to build websites. The two key technologies are HTML (HyperText Markup Language) and CSS (Cascading Style Sheets). Let’s take a look at how these work.
We’ll start with a short video that explains the basics of these two web technologies. We’ll then look at HTML and CSS in a little more detail, and finally at how they are used in ebooks.
VIDEO: Click to view this video showing the basics of HTML and CSS (3:18).
HTML (HyperText Markup Language)
HTML is called a ‘markup language’ because it uses tags enclosed in angle brackets (‘<’ and ‘>’) to mark the elements of a document, such as a paragraph or a heading. The tags usually come in pairs: an opening tag marks the start of the element and a closing tag marks its end. For example, <p> opens a paragraph and </p> closes it.
When the document is passed through a special piece of reading software called a web browser (or an ebook reader), the tags tell it how to display the text, for instance as a heading on a separate line in large type. The tags themselves are not displayed.
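If you’d like to see this from a program’s point of view, here is a minimal sketch using HTMLParser from Python’s standard library. The TagWalker class and the sample text are our own illustration, not part of any ebook tool. It records each opening and closing tag it meets and keeps only the text between them, which is roughly what a reader shows on screen; starting a new line after a heading or paragraph is a crude stand-in for real layout.

```python
from html.parser import HTMLParser

# A tiny document: each element is marked by an opening tag and a closing tag.
SAMPLE = "<h1>Chapter One</h1><p>It was a <em>dark</em> and stormy night.</p>"

# Block-level elements start on their own line; a crude stand-in for real layout.
BLOCK_ELEMENTS = {"h1", "p"}


class TagWalker(HTMLParser):
    """Hypothetical helper: logs each tag and keeps only the text a reader would show."""

    def __init__(self):
        super().__init__()
        self.events = []      # ("start", tag) or ("end", tag), in document order
        self.text_parts = []  # visible text; the tags themselves are never stored here

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))
        if tag in BLOCK_ELEMENTS:
            self.text_parts.append("\n")

    def handle_data(self, data):
        self.text_parts.append(data)


walker = TagWalker()
walker.feed(SAMPLE)
walker.close()

for kind, tag in walker.events:
    print(f"{kind:5} <{tag}>")
print("What the reader shows:")
print("".join(walker.text_parts))
```

Notice that the <em> tags change how ‘dark’ would be styled but never appear in the displayed text.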
The word ‘hypertext’ refers to a powerful feature that allows an HTML document to include a link to other places within the document, or to other documents or images elsewhere. The user can click on a link and jump to a new document which could be anywhere in the world. This linked content can also be displayed inside the document even though it resides somewhere else, so that what looks like a single web page can actually be made up of many separate pieces coming from many locations.
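Links are simply tags with an attribute that says where to go. As a rough sketch, again using Python’s built-in HTMLParser, the code below collects the targets of <a href> links, telling apart a jump within the same document (a target starting with #) from one to another document, and the sources of <img> images, which are displayed in place even though they are stored elsewhere. LinkFinder is a hypothetical helper and the example.com addresses are placeholders.

```python
from html.parser import HTMLParser

# Placeholder page: example.com addresses stand in for real locations.
PAGE = """<p>See <a href="#chapter2">Chapter 2</a> or the
<a href="https://example.com/glossary.html">online glossary</a>.</p>
<img src="https://example.com/images/cover.jpg" alt="Cover">"""


class LinkFinder(HTMLParser):
    """Hypothetical helper: collects link targets and embedded image sources."""

    def __init__(self):
        super().__init__()
        self.links = []     # where <a href> links jump to
        self.embedded = []  # content shown in place but stored elsewhere

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and "href" in attributes:
            self.links.append(attributes["href"])
        elif tag == "img" and "src" in attributes:
            self.embedded.append(attributes["src"])


finder = LinkFinder()
finder.feed(PAGE)
finder.close()

for target in finder.links:
    # A target starting with '#' points to a place inside the same document.
    where = "this document" if target.startswith("#") else "another document"
    print(f"link to {where}: {target}")
for source in finder.embedded:
    print(f"displayed here, fetched from: {source}")
```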
Let’s take a look at a simple example that shows you how HTML works. We’ll also give you a chance to try it for yourself.
Extending HTML: Structure and presentation
The original description of HTML specified fewer than 20 elements. This was enough to get the basic job done, but as the web has advanced, so have the complexity and power of the HTML language. There are now more than 100 elements.
The other big change since the original specification has been a growing trend to separate a document’s structure from its presentation.
What this means is that HTML tags are now mainly used to describe the structure of the document (‘this is a main heading’) but not the detail of how it should be presented (‘show main headings in red 16 point Georgia font with a 1pt rule’).
There’s a good reason for this. You can display the same content in different ways (for example, on different screen sizes, or as both a screen version and a printer-friendly version) just by changing the layout instructions.
You do this using another technology of the web called Cascading Style Sheets.
Cascading Style Sheets (CSS)
A Cascading Style Sheet is another text document like an HTML document. But unlike HTML, it doesn’t contain any of the content we want to display. Instead, it only contains instructions on how to style the various HTML elements of web pages.
As we’ve seen, HTML elements such as Paragraph, Heading 1, and Blockquote describe only what each part of a document is. CSS adds a rich formatting language to those elements, greatly extending what we can do with them.
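To make the division of labour concrete, here is the ‘red 16 point Georgia font with a 1pt rule’ heading style from earlier written as CSS (reading the ‘rule’ as a bottom border is our assumption), plus a blockquote style for comparison. The toy Python parser, parse_rules, is a hypothetical helper that splits each rule into a selector and its property: value declarations. It copes only with plain rules; real style sheets with comments or @media blocks need a proper CSS parser.

```python
import re

# Presentation only: no content, just instructions for styling HTML elements.
STYLESHEET = """
h1 {
    color: red;
    font-family: Georgia, serif;
    font-size: 16pt;
    border-bottom: 1pt solid;
}
blockquote { margin-left: 2em; font-style: italic; }
"""

# One rule = selector, then declarations inside curly braces.
RULE = re.compile(r"([^{}]+)\{([^{}]*)\}")


def parse_rules(css):
    """Toy parser (hypothetical helper): plain 'selector { property: value; }' rules only,
    no comments, @media blocks or other at-rules."""
    rules = {}
    for selector, body in RULE.findall(css):
        declarations = {}
        for declaration in body.split(";"):
            if ":" in declaration:
                prop, value = declaration.split(":", 1)
                declarations[prop.strip()] = value.strip()
        rules[selector.strip()] = declarations
    return rules


for selector, declarations in parse_rules(STYLESHEET).items():
    print(selector, "->", declarations)
```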
Let’s take a look at a simple example of how it works. You’ll also have a chance to try it out for yourself.
Extending CSS: Linking style sheets
Separating formatting (the CSS) from the content (the HTML) opens up many ways to do more with our content.
If an HTML page is linked to a different Cascading Style Sheet, it can display quite differently. This is a powerful feature of CSS: the appearance of a page can change completely by just changing a single line of code (the line that points to the style sheet).
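Here is a minimal sketch of that single-line change, assuming the page links its style sheet with a <link> tag in which rel="stylesheet" comes before href. The use_stylesheet function and the file names screen.css and large-print.css are made up for illustration, and a regular expression is a blunt tool that only suits simple, predictable markup like this.

```python
import re

# Hypothetical chapter page; only the <link> line mentions presentation.
CHAPTER = """<html>
<head>
  <link rel="stylesheet" type="text/css" href="screen.css">
</head>
<body><h1>Chapter One</h1><p>It was a dark and stormy night.</p></body>
</html>"""

# Matches the href of a stylesheet <link>; assumes rel="stylesheet" comes before href.
STYLESHEET_HREF = re.compile(r'(<link\b[^>]*\brel="stylesheet"[^>]*\bhref=")[^"]*(")')


def use_stylesheet(html, stylesheet):
    """Return a copy of the page pointing at a different CSS file (hypothetical helper)."""
    new_html, count = STYLESHEET_HREF.subn(
        lambda m: m.group(1) + stylesheet + m.group(2), html, count=1
    )
    if count == 0:
        raise ValueError("no stylesheet <link> with an href was found")
    return new_html


large_print = use_stylesheet(CHAPTER, "large-print.css")
print(large_print)
```

Swapping the file name back gives you the original page, character for character, because nothing but the style sheet reference changed.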
And it gets better than this: in some situations, a page can be set up to apply different styles automatically on different types of e-reader, for example by using CSS media queries.
How this relates to ebooks: A look inside an EPUB ebook
We can view an ebook reader as a sort of pared-down web browser that supports a subset of HTML elements and CSS properties, plus a few features that are unique to ebooks. If you lift the lid on an ebook, you’ll see dozens or even hundreds of separate files. The EPUB format organises these so that they can be distributed and read in a coherent way.
There are three main parts to an EPUB ebook.
- The content. Here we have the words and images that make up the ebook. They will appear as a collection of files: HTML files (typically one per chapter, written in XHTML, a stricter form of HTML), image files, and CSS files.
- Information about how to package the content files together. Without this information, you’d just have a random collection of files. So, in addition to the content, an ebook has special files that hold a list of all the files in the ebook (the manifest), the order in which they should be read (the spine), a table of contents, and descriptive information about the book called metadata.
- A container to put everything in. With lots of separate files, you need to combine them into a single file which can be distributed without losing any of the bits. EPUB ebooks actually use the familiar ZIP file format. Instead of adding ‘.zip’ to the end of the file name, they add ‘.epub’. The result is a whole bundle of files combined into just one.
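For the adventurous, here is a sketch of the three parts in miniature, assuming the EPUB 3 format and using Python’s standard zipfile module. The file names, title, identifier and date are placeholders, and build_minimal_epub is our own helper. content.opf carries the metadata, the manifest (every file) and the spine (reading order); nav.xhtml is the table of contents; META-INF/container.xml tells the reader where to find content.opf. The one ZIP detail EPUB insists on is that a file called mimetype, containing application/epub+zip, comes first and is stored uncompressed. Before distributing a real ebook, check it with a validator such as EPUBCheck.

```python
import io
import zipfile

# Tells the reading system where the package file lives.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""

# Packaging information: metadata, manifest (every file) and spine (reading order).
# The identifier, title and date below are placeholders.
CONTENT_OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="book-id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="book-id">urn:uuid:12345678-1234-5678-1234-567812345678</dc:identifier>
    <dc:title>A Very Small Book</dc:title>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">2024-01-01T00:00:00Z</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="chapter1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
    <item id="css" href="style.css" media-type="text/css"/>
  </manifest>
  <spine>
    <itemref idref="chapter1"/>
  </spine>
</package>"""

# The table of contents.
NAV_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head><title>Contents</title></head>
<body><nav epub:type="toc"><ol><li><a href="chapter1.xhtml">Chapter One</a></li></ol></nav></body>
</html>"""

# The content: one chapter plus its style sheet.
CHAPTER1_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Chapter One</title><link rel="stylesheet" type="text/css" href="style.css"/></head>
<body><h1>Chapter One</h1><p>It was a dark and stormy night.</p></body>
</html>"""

STYLE_CSS = "h1 { font-family: Georgia, serif; }\n"


def build_minimal_epub(target):
    """Zip all the parts into one file (target: a path or a writable binary file object)."""
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as epub:
        # The container: 'mimetype' goes first and is stored uncompressed.
        epub.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        for name, data in [
            ("META-INF/container.xml", CONTAINER_XML),
            ("OEBPS/content.opf", CONTENT_OPF),
            ("OEBPS/nav.xhtml", NAV_XHTML),
            ("OEBPS/chapter1.xhtml", CHAPTER1_XHTML),
            ("OEBPS/style.css", STYLE_CSS),
        ]:
            epub.writestr(name, data)


buffer = io.BytesIO()
build_minimal_epub(buffer)

# Lift the lid: list what is inside and how each file is stored.
with zipfile.ZipFile(buffer) as book:
    for info in book.infolist():
        method = "stored" if info.compress_type == zipfile.ZIP_STORED else "compressed"
        print(f"{info.filename:28} {method}")
```

Writing to an in-memory buffer keeps the sketch tidy; pass a file name such as "my-book.epub" instead to get a file you can try in an ebook reader.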
That’s probably more than you need to know if your goal is to create a simple ebook. But for the more adventurous among you, it will give you an idea of how you might go about some fine-tuning and enhancement. We’ll look at some of the tools you can use later in this chapter.
VIDEO: What’s inside an EPUB file (11:27)
Click this link to view a video that takes you through the basics of what’s inside an EPUB file.
Taking it further
If you’re just looking for an overview so that you understand the general principles and concepts, you might have learned enough for now.
If you’re planning to be more hands-on, for instance in the production of ebooks, you’ll need to learn a bit more. There are many good resources available online. Here are a few, all free.
W3Schools (http://www.w3schools.com) You’ve used their HTML and CSS ‘Try It’ programs above. This is a go-to place on the web for tutorials and a comprehensive reference for all things web.
HTML Dog (http://www.htmldog.com) Similar to W3Schools, but perhaps with a bit more personality. See which you prefer.
Don’t Fear the Internet (http://www.dontfeartheinternet.com) A series of seven slightly offbeat video tutorials. If you’re not ready to soak up all the details yet, this might be for you. Note that the list runs back to front: the first video in the series is at the bottom and the last is at the top.
Channel 9 (http://channel9.msdn.com/Series/HTML5-CSS3-Fundamentals-Development-for-Absolute-Beginners) Software behemoth Microsoft has assembled a great resource aimed at upskilling its developer community. This video series, HTML5 & CSS3 Fundamentals: Development for Absolute Beginners, is very thorough (and quite long), but by the end of it you’ll be pretty smart and ready to tackle any ebook code.
And for the technically inclined, here’s a video (35:59) that shows how to code an EPUB ebook.
Find out more about this topic on our Digital Publishing 101 useful resources site.