Ebook conversion from print editions

The surge in digitization of publisher backlists has spawned an industry, much of it based in India, that specializes in converting printed books into ebooks. In this section, you will learn what to supply to a conversion service, how these services convert an existing print edition, what outputs they produce, typical costs, and what to consider when working with outsourced conversion services.


We’ve seen how to prepare a manuscript for digital conversion. But in most cases today, publishers produce ebooks from print editions, or in conjunction with their print edition workflow.

The three most common sources for converting a print edition to an ebook are:

  • Print-ready PDF
  • Hard-copy book
  • Original files from the page layout program

The print-to-digital conversion process

Let’s take a look at the conversion process.

Ebook conversion workflow

1. Source documents

As we’ve seen, the process starts with scanned book pages, a print-ready PDF, or the original page layout files.

2. Extract

Next, the text, images and other information are extracted from the source file for further processing.

Conversion services invest much of their technical resources in making this step more accurate. For high-volume customers, they fine-tune their extraction tools to match each publisher’s content types and house styles.

3. Mark up

Before the extracted content can be transformed into an ebook, the raw text must be marked up: tagged in a markup language that records the structure (and sometimes the meaning) of the content.

Figure: An example of HTML markup

Markup languages allow computers to process documents. Markup is necessary because computer programs cannot recognize the elements of a page from their appearance.

The programs read the markup tags to identify which elements are chapter headings, captions, paragraphs, etc.
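To make this concrete, here is a minimal sketch, using Python’s standard-library `html.parser`, of a program that classifies elements purely by their tags rather than by how they look. The tag-to-role mapping (`ROLE_BY_TAG`), the class `StructureReader` and the function `read_structure` are hypothetical illustrations, not part of any conversion service’s tooling.

```python
from html.parser import HTMLParser

# Hypothetical mapping from tags to structural roles; a real conversion
# workflow would follow the publisher's own markup conventions.
ROLE_BY_TAG = {
    "h1": "chapter heading",
    "p": "paragraph",
    "figcaption": "caption",
}


class StructureReader(HTMLParser):
    """Records (role, text) pairs based on tags, not on visual appearance."""

    def __init__(self):
        super().__init__()
        self.current_role = None  # role of the element being read, if any
        self.elements = []        # list of (role, text) tuples

    def handle_starttag(self, tag, attrs):
        if tag in ROLE_BY_TAG:
            self.current_role = ROLE_BY_TAG[tag]
            self.elements.append((self.current_role, ""))

    def handle_endtag(self, tag):
        if tag in ROLE_BY_TAG:
            self.current_role = None

    def handle_data(self, data):
        # Text outside a recognized element (e.g. whitespace) is ignored;
        # text inside unmapped inline tags such as <em> is kept.
        if self.current_role is not None:
            role, text = self.elements[-1]
            self.elements[-1] = (role, text + data)


def read_structure(html):
    """Parse an HTML string and return its (role, text) pairs in order."""
    reader = StructureReader()
    reader.feed(html)
    reader.close()
    return reader.elements


SAMPLE = """
<h1>Chapter 1: The Voyage</h1>
<p>The ship left harbor at dawn.</p>
<figure><img src="ship.jpg" alt=""><figcaption>The ship at anchor</figcaption></figure>
"""

for role, text in read_structure(SAMPLE):
    print(f"{role}: {text}")
```

The program never looks at font size or position; the heading is identified only because it sits inside an `h1` tag.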

The two most commonly used markup languages in ebook production are:

  • HTML (HyperText Markup Language)
  • XML (eXtensible Markup Language)

XML or HTML? XML tends to be the markup language of choice for large-scale publishing and conversion projects, especially for complex works such as textbooks.

However, HTML has advocates who point to advantages including its relative simplicity and much wider use. A strict form of HTML can be extended to approach the richness of XML markup by using ‘add-ons’ to tags called class attributes.
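As an illustration, the sketch below parses a well-formed HTML fragment with Python’s `xml.etree.ElementTree` and groups paragraphs by their class attribute, so that otherwise identical `p` tags carry different meanings. The class names (`epigraph`, `author-note`) and the fallback label `body-text` are hypothetical house-style labels, not a standard vocabulary. For simplicity the fragment omits the XHTML namespace; a namespaced document would need tags written as `{http://www.w3.org/1999/xhtml}p`.

```python
import xml.etree.ElementTree as ET

# A well-formed ("strict") HTML fragment; class names are hypothetical.
FRAGMENT = """
<body>
  <p class="epigraph">A short quotation set before the chapter.</p>
  <p>The opening paragraph of the chapter.</p>
  <p class="author-note">This chapter was revised for the ebook edition.</p>
</body>
"""


def paragraphs_by_class(xhtml):
    """Group paragraph text by class; unclassed paragraphs go under 'body-text'."""
    root = ET.fromstring(xhtml.strip())
    groups = {}
    for p in root.iter("p"):
        label = p.get("class", "body-text")
        groups.setdefault(label, []).append(p.text or "")
    return groups


for label, texts in paragraphs_by_class(FRAGMENT).items():
    print(label, "->", texts)
```

Because the fragment is well-formed, an ordinary XML parser can read it, which is one reason strict HTML works well in conversion pipelines.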

4. Transform

Once a file is marked up, a program called a transformation engine can be applied to it to output it in new formats. If sound decisions are made at the markup stage, the same source document can be used to produce a range of outputs, both now and in the future as technology and formats change. Here’s a graphical representation of the transformation process:

Figure: XML transformation
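A transformation engine can be sketched in a few lines. Production workflows often use XSLT for XML transformations; to keep this example self-contained, it instead uses Python’s standard-library `xml.etree.ElementTree`. The source elements (`book`, `chapter`, `title`, `para`) form a hypothetical schema, and the two renderers simply show one marked-up source feeding two different output formats.

```python
import xml.etree.ElementTree as ET

# Hypothetical source markup; element names are illustrative only.
SOURCE = """<book>
  <chapter>
    <title>Getting Started</title>
    <para>Every ebook begins as structured text.</para>
    <para>Markup records what each piece of text is.</para>
  </chapter>
</book>"""


def to_html(book):
    """Render the source as simple HTML: titles become h1, paras become p."""
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    for chapter in book.iter("chapter"):
        ET.SubElement(body, "h1").text = chapter.findtext("title")
        for para in chapter.iter("para"):
            ET.SubElement(body, "p").text = para.text
    return ET.tostring(html, encoding="unicode")


def to_plain_text(book):
    """Render the same source as plain text, with titles in capitals."""
    lines = []
    for chapter in book.iter("chapter"):
        lines.append((chapter.findtext("title") or "").upper())
        lines.extend(para.text or "" for para in chapter.iter("para"))
    return "\n".join(lines)


book = ET.fromstring(SOURCE)
print(to_html(book))
print(to_plain_text(book))
```

Both renderers read the same `title` and `para` elements, so adding another output format means writing another renderer rather than re-marking the source.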

5. Output

Besides EPUB and Kindle formats, another common output is PDF, optimized for ebook use, print-on-demand or web viewing.

Many conversion services are geared to the needs of publishers who have large backlists or who regularly convert print editions to ebooks. These publishers gain cost and efficiency benefits from investing in standards, such as a standardized markup system, and from reusing components such as templates, style sheets and common graphic elements that support house styles.


For small volumes and one-off projects, you’ll probably pay US$0.50 to $1.00 or more per page, plus one-off charges for producing a single book. Scanning from hard copy and a high proportion of complex pages will push this cost up.

Prices for higher-volume users vary widely, with significant differences depending on the types of books being converted, but expect to pay half the low-volume rates or less.
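The per-page rates above translate into a quick budget check. This minimal sketch assumes a hypothetical 300-page book; the one-off production charge is left as a parameter (default 0) because its amount varies by provider and is not given here.

```python
def estimate_conversion_cost(pages, rate_per_page, one_off_fee=0.0):
    """Return pages x per-page rate plus any one-off charge, in US dollars."""
    if pages < 0 or rate_per_page < 0 or one_off_fee < 0:
        raise ValueError("pages, rate and fee must be non-negative")
    return pages * rate_per_page + one_off_fee


PAGES = 300  # hypothetical page count

# Low-volume rates quoted above: roughly US$0.50 to $1.00 per page.
low = estimate_conversion_cost(PAGES, 0.50)
high = estimate_conversion_cost(PAGES, 1.00)
print(f"Low volume: ${low:.2f} to ${high:.2f}, plus one-off charges")

# Higher-volume users can expect half the low-volume rates or less,
# so these figures are upper bounds.
print(f"Higher volume (upper bound): ${low / 2:.2f} to ${high / 2:.2f}")
```

For this example, the per-page charges come to $150 to $300 at low volume, and at most $75 to $150 at higher volume, before any one-off fees.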

Ebook conversion service providers

Here are a few examples of the many service providers who will convert print books to ebooks from print or electronic sources. Each of these companies serves international publishers.

Ebook conversion service providers for medium to large publishers

These providers suit higher-volume publishers: they are used to handling large volumes and to integrating their processes with the production workflows of their publisher clients.

Several of them, including Aptara, Datamatics, DCL and Innodata, have a long history in XML document conversion and serve markets outside publishing, such as the medical, legal, technical and scientific sectors.

Others, such as Infogrid Pacific and iPublishCentral, have built their organizations specifically around digitizing books. All of these companies serve the textbook, professional and STM (Scientific, Technical and Medical) sectors as well as trade publishers.

Ebook conversion service providers for small publishers

Several service providers will work with small publishers and self-publishers on a single title or on low-volume production. Some providers at this end of the market tie their conversion services to their own distribution service and work only from manuscripts.

The providers we’ve listed here convert from a range of print-related formats and supply publishers with the digitized ebook files for independent distribution.

This video is fairly information-dense, but it’s worth a look if you’re interested in the nitty-gritty of ebook conversion from an experienced practitioner. It’s a recording of a webinar presented by the conversion service Data Conversion Laboratory (DCL). Note that it was produced to sell the value of using a professional outsourcer rather than an automated online service, so there’s a bit of soft-sell, but the content is high-value.

You’ll need to set aside 20 minutes to watch this video (20 minutes more if you want to hear the audience questions); we’ve set the video to start about 12 minutes in. The slides from this presentation are also available to download.


Find out more about this topic on our Digital Publishing 101 useful resources site.

