Under the hood: The technology behind ebooks


As we’ve seen, most ebooks today use the same technologies as the web. For digital publishers, a basic understanding of the two most important technologies — HTML and CSS — is essential. So let’s take a very quick look at what’s under the hood of an ebook.



To design and produce ebooks – or even, as an author, just to understand what you can and can’t do with them – you need a basic understanding of what makes them tick. Without this, it’s very easy to get carried away with visions of digital nirvana and grand ideas of what your ebook will do.

As we’ve seen in the introduction, ebooks – of the EPUB and Kindle kind – are based on the same technologies that are used to build websites. The two key technologies are HTML (HyperText Markup Language) and CSS (Cascading Style Sheets). Let’s take a look at how these work.

We’ll start with a short video that will explain the basics of these two web technologies. Following it, we will look at HTML and CSS in a little more detail. Finally, we’ll look at how these two web technologies are used in ebooks.

VIDEO: Click to view this video showing the basics of HTML and CSS (3:18).

(This link will take you to an external site.)

HTML (HyperText Markup Language)


HTML is called a ‘markup language’ because it uses tags enclosed in angle brackets (‘<’ and ‘>’) to mark the elements of a document such as a paragraph or a heading. The tags are usually in pairs with an opening tag denoting the start of the element and a closing tag denoting the end of the element.

When the document is passed through a special piece of reading software called a web browser (or an ebook reader), the tags tell it how to display the text, for instance as a heading on a separate line in large type. The tags themselves are not displayed.
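Here is a minimal sketch in Python that shows the difference between the tags and the text that actually gets displayed. Python is chosen purely for illustration; you don’t need it to make ebooks. The sketch uses the standard library’s `html.parser` module to record each opening and closing tag separately from the visible text. The class name `TagAndTextParser` and the sample snippet are made up for this example.

```python
from html.parser import HTMLParser


class TagAndTextParser(HTMLParser):
    """Records each opening/closing tag and, separately, the visible text."""

    def __init__(self):
        super().__init__()
        self.events = []   # e.g. ("start", "h1"), ("end", "h1")
        self.visible = []  # the text a browser would actually show

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        # Skip whitespace-only runs, such as the newline between the tags.
        if data.strip():
            self.visible.append(data.strip())


snippet = "<h1>Chapter One</h1>\n<p>It was a dark and stormy night.</p>"
parser = TagAndTextParser()
parser.feed(snippet)
parser.close()

print(parser.events)   # [('start', 'h1'), ('end', 'h1'), ('start', 'p'), ('end', 'p')]
print(parser.visible)  # ['Chapter One', 'It was a dark and stormy night.']
```

The tags only describe the text. They never appear in the list of visible text.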

The word ‘hypertext’ refers to a powerful feature that allows an HTML document to include a link to other places within the document, or to other documents or images elsewhere. The user can click on a link and jump to a new document which could be anywhere in the world. This linked content can also be displayed inside the document even though it resides somewhere else, so that what looks like a single web page can actually be made up of many separate pieces coming from many locations.
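Here is a similar Python sketch for hypertext. It collects two kinds of reference:

- the link targets a reader can jump to (the `href` attribute on `<a>` tags);
- the images pulled in from elsewhere and shown inside the page (the `src` attribute on `<img>` tags).

The `LinkCollector` class and the `example.com` addresses are hypothetical placeholders.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects link targets (<a href>) and embedded images (<img src>)."""

    def __init__(self):
        super().__init__()
        self.links = []     # places the reader can jump to
        self.embedded = []  # content fetched from elsewhere and shown inline

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and "href" in attributes:
            self.links.append(attributes["href"])
        elif tag == "img" and "src" in attributes:
            self.embedded.append(attributes["src"])


page = (
    '<p>See <a href="#notes">the notes</a> or '
    '<a href="https://example.com/map.html">this map</a>.</p>'
    '<img src="https://example.com/cover.jpg" alt="Cover">'
)
collector = LinkCollector()
collector.feed(page)
collector.close()

print(collector.links)     # ['#notes', 'https://example.com/map.html']
print(collector.embedded)  # ['https://example.com/cover.jpg']
```

The two kinds of link behave differently. `#notes` points to another place in the same document, while the full addresses point to other documents that could live anywhere.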

Let’s take a look at a simple example that shows you how HTML works. We’ll also give you a chance to try it for yourself.


Let’s start with a simple document with three lines of text.
[Figure: the three lines of sample text, before any tags are added]

Next, let’s add some HTML tags to mark up the elements on the page.

[Figure: our sample text with HTML tags applied]

Here’s how it all works.

  • Each tag in our example has an opening and a closing version. The closing tag is simply the opening tag with a ‘/’ added after the ‘<’ (so <p> is closed by </p>).
  • The <html> tag wraps the whole document and tells the web browser that what it contains is HTML.
  • The <body> tag tells the browser that what follows is the main content section of the page.
  • The <h1> and <h2> tags are heading tags, in this case a main chapter heading and a sub-heading.
  • The <p> tag marks a paragraph of text.
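The pairing rule in the first bullet is easy to check mechanically. This Python sketch keeps a stack of open tags and reports whether every one is closed, in the right order. It assumes, as in our example, that every tag is paired. Some HTML tags (such as `<img>`) have no closing tag and would need special handling. The sample text is invented, since the original example is shown only as an image.

```python
from html.parser import HTMLParser


class PairChecker(HTMLParser):
    """Checks that every opening tag is closed, and in the right order."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of tags still waiting for their closing tag
        self.errors = []

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # A closing tag must match the most recently opened tag.
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            self.errors.append(f"unexpected </{tag}>")


def tags_are_paired(document):
    checker = PairChecker()
    checker.feed(document)
    checker.close()
    return not checker.errors and not checker.open_tags


sample = """<html>
<body>
<h1>Chapter 1</h1>
<h2>The beginning</h2>
<p>Our story starts here.</p>
</body>
</html>"""

print(tags_are_paired(sample))                  # True
print(tags_are_paired("<p>Missing its close"))  # False
```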


Now, when this marked-up document is viewed in a web browser such as Internet Explorer or Firefox, here’s what it looks like.

[Figure: how it looks in a web browser]

You’ll see that the tags are invisible, but the browser has used them to work out the structure of the document. It then uses that information to display the chapter heading larger than the sub-heading, and the paragraph text in a smaller type size.

We’ve used a web browser to display this marked-up page, but we could also have viewed it in an ebook reader and got a similar result. You can think of an ebook reader as a specialized web browser. It interprets a subset of HTML tags (not all of them are included in the ebook format specification), plus some special tags that ebooks require but that aren’t part of the general web standard.
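This Python sketch illustrates the idea of a reader that only understands a subset of tags. It lists the tags in a document that fall outside a given set. The `SUPPORTED_TAGS` set is entirely hypothetical and is not taken from the EPUB or Kindle specifications. It only shows how a tool might flag tags outside some list.

```python
from html.parser import HTMLParser

# Hypothetical "supported" list for illustration only; NOT taken from any
# ebook specification.
SUPPORTED_TAGS = {"html", "head", "title", "body", "h1", "h2", "h3",
                  "p", "em", "strong", "a", "img"}


class TagInventory(HTMLParser):
    """Collects the distinct tag names used in a document."""

    def __init__(self):
        super().__init__()
        self.tags = set()

    def handle_starttag(self, tag, attrs):
        self.tags.add(tag)


def unsupported_tags(document, supported=SUPPORTED_TAGS):
    """Return, in alphabetical order, the tags not in `supported`."""
    inventory = TagInventory()
    inventory.feed(document)
    inventory.close()
    return sorted(inventory.tags - supported)


page = "<body><h1>Title</h1><marquee>Scrolling news</marquee><p>Text</p></body>"
print(unsupported_tags(page))  # ['marquee']
```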

TRY IT
  1. Open this link (opens in a new browser window or tab).
  2. On the left, you can enter text and HTML tags (enter them between the opening and closing body tags).
  3. Try entering some text with heading 2 or heading 3 tags (<h2> or <h3>).
  4. Click the Submit Code button and see the result on the right-hand side.
[Screenshot: the HTML ‘Try it’ editor]

Extending HTML: Structure and presentation

The original version of HTML specified tags for 20 elements. This was enough to get the basic job done but as the internet has advanced, so has the complexity and power of the HTML language. There are now more than 100 elements.

The other big change from the original specification was the increasing trend to separate the document’s structure from its presentation.

What this means is that HTML tags are now mainly used to describe the structure of the document (‘this is a main heading’) but not the detail of how it should be presented (‘show main headings in red 16 point Georgia font with a 1pt rule’).

There’s a good reason for this. You can display the same content in different ways — for example on different screen sizes or a screen version and a printer-friendly version — just by changing the layout instructions.

You do this using another technology of the web called Cascading Style Sheets.

Cascading Style Sheets (CSS)


A Cascading Style Sheet is a text document, just like an HTML document. But unlike HTML, it doesn’t contain any of the content we want to display. Instead, it only contains instructions on how to style the various HTML elements of web pages.

As we’ve seen, HTML tags mainly describe the structure of a document, through elements such as Paragraph, Heading 1, and Blockquote. CSS adds a rich formatting language to those elements, greatly extending what we can do with them.

Let’s take a look at a simple example of how it works. You’ll also have a chance to try it out for yourself.


We’ll take our earlier HTML example …

[Figure: our sample text with HTML tags applied]

… and create a Cascading Style Sheet for it. Our simple CSS file looks like this.

[Figure: our simple CSS file]

You’ll notice a few things about it.

  • Each line starts with the name of an HTML tag (called the selector). This is the element the line is going to style.
  • CSS uses ‘{’ and ‘}’ to enclose the style instructions it’s giving to the web browser. As with HTML’s ‘<’ and ‘>’, what appears inside these braces isn’t displayed.
  • Inside the braces, there’s the name of a property (e.g. ‘color’), followed by a colon and a value for it (‘black’).
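Here is a deliberately simplified Python sketch that makes the selector/property/value structure concrete. It reads rules of the form `selector { property: value; }` into a dictionary. It assumes plain rules only and does not handle comments, @-rules such as `@media`, or other real-world CSS features. The function name `parse_simple_css` and the sample style sheet are made up.

```python
import re


def parse_simple_css(css):
    """Turn simple 'selector { property: value; ... }' rules into a dict.

    A simplified sketch: no comments, @-rules or other advanced CSS.
    """
    rules = {}
    # Each match is (text before '{', text between '{' and '}').
    for selector, body in re.findall(r"([^{}]+)\{([^}]*)\}", css):
        declarations = {}
        for declaration in body.split(";"):
            if ":" in declaration:
                prop, value = declaration.split(":", 1)
                declarations[prop.strip()] = value.strip()
        rules[selector.strip()] = declarations
    return rules


stylesheet = """
h1 { color: black; font-size: 24px; }
h2 { color: gray; }
p  { font-family: Georgia, serif; }
"""
print(parse_simple_css(stylesheet))
# {'h1': {'color': 'black', 'font-size': '24px'}, 'h2': {'color': 'gray'}, 'p': {'font-family': 'Georgia, serif'}}
```

The split on `":"` is limited to the first colon, so values that contain a colon are kept whole.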

To work its magic, the style sheet is linked to the HTML file and the result is this:

[Figure: the sample page with the style sheet applied]
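The link itself is a single `<link>` element inside the page’s `<head>`. This Python sketch writes a small HTML page and a style sheet side by side into a temporary folder, so you can open the page in a browser and see the styles applied. The file names `chapter1.html` and `style.css`, the page text and the colours are assumptions for this example.

```python
import tempfile
from pathlib import Path

HTML = """<html>
<head>
  <link rel="stylesheet" href="style.css">
</head>
<body>
  <h1>Chapter 1</h1>
  <h2>The beginning</h2>
  <p>Our story starts here.</p>
</body>
</html>
"""

CSS = """h1 { color: navy; }
h2 { color: gray; }
p { color: black; }
"""

# Write both files into the same folder: the relative href "style.css"
# is resolved against the folder that holds the HTML file.
folder = Path(tempfile.mkdtemp())
(folder / "chapter1.html").write_text(HTML, encoding="utf-8")
(folder / "style.css").write_text(CSS, encoding="utf-8")
print(folder / "chapter1.html")  # open this file in a web browser
```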

TRY IT
  1. Open this link (opens in a new browser window or tab).
  2. On the left, you can enter or change the CSS.
  3. On the left, you can also enter text and HTML tags (enter them between the opening and closing body tags).
  4. Click the Submit Code button and see the result on the right-hand side.
[Screenshot: the CSS ‘Try it’ editor]

Extending CSS: Linking style sheets

Separating formatting (the CSS) from the content (the HTML) opens up many ways to do more with our content.

If an HTML page is linked to a different Cascading Style Sheet, it can display quite differently. This is a powerful feature of CSS: the appearance of a page can change completely by just changing a single line of code (the line that points to the style sheet).
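You can see how small that change is by generating the same page twice, linked to two different style sheets, and comparing the results. This Python sketch uses the standard `difflib` module. The page content and the style sheet names `plain.css` and `fancy.css` are hypothetical.

```python
import difflib


def page_with_stylesheet(href):
    """Return the same (hypothetical) page, linked to the given style sheet."""
    return f"""<html>
<head>
  <link rel="stylesheet" href="{href}">
</head>
<body>
  <h1>Chapter 1</h1>
  <p>Our story starts here.</p>
</body>
</html>"""


plain = page_with_stylesheet("plain.css").splitlines()
fancy = page_with_stylesheet("fancy.css").splitlines()

# Keep only removed/added lines, skipping the '---'/'+++' file headers.
changes = [
    line
    for line in difflib.unified_diff(plain, fancy, lineterm="")
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
print(changes)
# ['-  <link rel="stylesheet" href="plain.css">', '+  <link rel="stylesheet" href="fancy.css">']
```

Everything else in the page is identical. Only the line that points to the style sheet changes.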

And it gets better than this: in some situations, the HTML page can be set up to automatically change the style sheet for different types of e-readers.
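One standard HTML mechanism for this is the `media` attribute on `<link>`. Each style sheet is labelled with a media type or media query, and the browser applies the ones that match the device. This Python sketch generates such a `<head>`. The file names are hypothetical, and support for media queries varies between ebook readers, so treat this as an illustration rather than a recipe.

```python
from html import escape


def head_with_stylesheets(title, sheets):
    """Build an HTML <head> that links one style sheet per media condition.

    `sheets` maps a media type or media query to a style sheet URL.
    """
    lines = ["<head>", f"  <title>{escape(title)}</title>"]
    for media, href in sheets.items():
        lines.append(
            f'  <link rel="stylesheet" href="{escape(href)}" media="{escape(media)}">'
        )
    lines.append("</head>")
    return "\n".join(lines)


head = head_with_stylesheets(
    "Chapter 1",
    {
        "screen": "screen.css",                               # normal display
        "screen and (max-width: 600px)": "small-screen.css",  # extra rules for narrow screens
        "print": "print.css",                                 # printer-friendly version
    },
)
print(head)
```

On a narrow screen, both `screen.css` and `small-screen.css` apply. Rules in the later sheet override the earlier one where they conflict, which is the "cascading" in Cascading Style Sheets.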

How this relates to ebooks: A look inside an EPUB ebook


We can view an ebook reader as a sort of pared-down web browser that supports a subset of HTML and CSS, plus a few features that are unique to ebooks. If you lift the lid on an ebook, you’ll see dozens or even hundreds of separate files. The EPUB format organises these so that they can be distributed and read in a coherent way.
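Because an EPUB is a ZIP archive, you can lift the lid yourself. This Python sketch uses the standard `zipfile` module to list the files in an EPUB and count them by file extension. The function name `summarize_epub` and the file name `my-book.epub` are hypothetical.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_epub(path_or_file):
    """List the files inside an EPUB and count them by file extension.

    An EPUB is a ZIP archive, so zipfile can open it directly.
    """
    with zipfile.ZipFile(path_or_file) as book:
        names = book.namelist()
    # Files without an extension (such as 'mimetype') are counted as '(none)'.
    counts = Counter(PurePosixPath(name).suffix or "(none)" for name in names)
    return names, counts


# Usage (the file name is hypothetical):
# names, counts = summarize_epub("my-book.epub")
# for name in names:
#     print(name)
# print(counts)
```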

There are three main parts to an EPUB ebook.

  1. The content. Here we have the words and images that make up the ebook. They will appear as a collection of files: HTML files (typically one file per chapter), image files, and CSS files.
  2. Information about how to package the content files together. Without this information, you’d just have a random collection of files. So, in addition to the content, an ebook has special files such as a list of all the files in the ebook, a table of contents list showing their order, and descriptive information about the book called metadata.
  3. A container to put everything in. With lots of separate files, you need to combine them into a single file which can be distributed without losing any of the bits. EPUB ebooks actually use the familiar ZIP file format. Instead of adding ‘.zip’ to the end of the file name, they add ‘.epub’. The result is a whole bundle of files combined into just one.
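This Python sketch builds a tiny EPUB with the standard `zipfile` module, to show how the three parts fit together. Some details come from the EPUB container format itself:

- the file names `mimetype` and `META-INF/container.xml`;
- the rule that `mimetype` comes first and is stored uncompressed.

Other details are our own assumptions: the `OEBPS` folder name, the placeholder identifier and the book content. The sketch also leaves out things a valid EPUB 3 file needs, such as a navigation document and a last-modified date. Treat it as an illustration of the structure rather than a publishable ebook.

```python
import zipfile

# Part 2 (packaging): tells the reader where the package file lives.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

# Part 2 (packaging): metadata, the list of files (manifest) and reading order (spine).
CONTENT_OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="book-id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="book-id">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
    <dc:title>My First Ebook</dc:title>
    <dc:language>en</dc:language>
  </metadata>
  <manifest>
    <item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
    <item id="css" href="style.css" media-type="text/css"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
  </spine>
</package>
"""

# Part 1 (content): one chapter as XHTML, plus its style sheet.
CHAPTER_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Chapter 1</title><link rel="stylesheet" type="text/css" href="style.css"/></head>
<body><h1>Chapter 1</h1><p>Our story starts here.</p></body>
</html>
"""
STYLE_CSS = "h1 { color: navy; }\n"


def build_epub(path):
    """Write a minimal, illustrative (not fully valid) EPUB to `path`."""
    # Part 3 (container): a ZIP archive; most entries are compressed ...
    with zipfile.ZipFile(path, "w", compression=zipfile.ZIP_DEFLATED) as epub:
        # ... but 'mimetype' must be the first entry and stored uncompressed.
        epub.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        epub.writestr("META-INF/container.xml", CONTAINER_XML)
        epub.writestr("OEBPS/content.opf", CONTENT_OPF)
        epub.writestr("OEBPS/chapter1.xhtml", CHAPTER_XHTML)
        epub.writestr("OEBPS/style.css", STYLE_CSS)


build_epub("my-first-ebook.epub")  # hypothetical output file name
```

If you rename the result to end in `.zip`, any ordinary ZIP tool will open it and show the same folders and files.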

That’s probably more than you need to know if your goal is to create a simple ebook. But for the more adventurous among you, it will give you an idea of how you might go about some fine-tuning and enhancement. We’ll look at some of the tools you can use later in this chapter.

VIDEO: What’s inside an EPUB file (11:27)

Click this link to view a video that takes you through the basics of what’s inside an EPUB file.

(This link will take you to an external site.)

Source: Video2Brain.com

Taking it further


If you’re just looking for an overview so that you understand the general principles and concepts, you might have learned enough for now.

If you’re planning to be more hands-on, for instance in the production of ebooks, you’ll need to learn a bit more. There are many good resources available online. Here are a few, all free.

W3Schools (http://www.w3schools.com): You’ve used their HTML and CSS ‘Try It’ programs above. This is the go-to place on the web for tutorials and a definitive reference for all things web.

HTML Dog (http://www.htmldog.com): Similar to W3Schools, but with perhaps a bit more personality. See which you prefer.

Don’t Fear the Internet (http://www.dontfeartheinternet.com): A series of 7 slightly offbeat video tutorials. If you’re not ready to soak up all the details yet, this might be for you. Note that their sequence is back-to-front: the first one in the series is at the bottom of the list and the last is at the top.

Channel 9 (http://channel9.msdn.com/Series/HTML5-CSS3-Fundamentals-Development-for-Absolute-Beginners): Software behemoth Microsoft has assembled a great resource, aimed at upskilling its developer community. This video series, HTML5 & CSS3 Fundamentals: Development for Absolute Beginners, is very thorough (and quite long), but by the end of it you’ll be pretty smart and ready to tackle any ebook code.

And for the technically inclined, here’s a video (35:59) that shows how to code an EPUB ebook:
http://www.youtube.com/watch?v=0dz4RlwFlUc

Resources


Find out more about this topic on our Digital Publishing 101 useful resources site.