Ebook conversion from print editions

The surge in digitization of publisher backlists has spawned an industry, much of it based in India, which specializes in converting printed books into ebooks. In this section, you will learn what to supply to a conversion service, how the service converts an existing print edition, what outputs it produces, typical costs, and what to consider when working with outsourced conversion services.


We’ve seen how to prepare a manuscript for digital conversion. But in most cases today, publishers produce ebooks from print editions, or in conjunction with their print edition workflow.

The three most common sources for converting print editions to ebooks are:

  • Print-ready PDF
  • Hard-copy book
  • Original files from the page layout program

The print-to-digital conversion process

Let’s take a look at the conversion process.

Ebook conversion workflow

1. Source documents

As we’ve seen, the process starts with scanned book pages, a print-ready PDF, or the original page layout files.

Find out more about how source files are produced

Print-ready PDF

A print-ready PDF is produced from the page-layout software, typically Adobe InDesign, in a form that the printer can use to set up the printing press.

There might also be a (similar) version set up to work with a digital print-on-demand (POD) service.

If you can obtain this file (a challenge for many older books), it’s the best format for an outsourced conversion service.

Conversion services have developed sophisticated programs and processes to extract the text and layout information to produce a clean, marked up source file for ebook production.

Hard-copy book

For hard-copy books, the process has a couple of extra steps. It starts with a page-by-page scan. This creates a large image file where the text of each scanned page is readable but it’s not in a format that can be edited.

To create editable text, a process called optical character recognition (OCR) converts the images of text into editable characters. Scanning can be non-destructive, where the original book is not taken apart, or destructive, which is cheaper.

OCR is an imperfect process, so a human proofreader corrects errors and applies tags to mark up the document structure. The result is a clean, marked-up source file that is used for conversion.
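To give a feel for the kind of error a proofreader looks for, here is a minimal Python sketch that flags words mixing letters and digits, a common symptom of OCR misreads (for example "c1ean" for "clean"). The heuristic and the function name flag_suspect_tokens are our own illustration, not a description of any conversion service's tools.

```python
import re

# Hypothetical heuristic: a token that mixes letters and digits (e.g. "c1ean")
# is often an OCR misread and worth a proofreader's attention.
MIXED = re.compile(r"^(?=.*[A-Za-z])(?=.*\d)[A-Za-z\d]+$")

def flag_suspect_tokens(text):
    """Return (line_number, token) pairs that look like OCR misreads."""
    suspects = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for token in re.findall(r"[A-Za-z\d]+", line):
            if MIXED.match(token):
                suspects.append((line_no, token))
    return suspects

sample = "The c1ean source file\nwas pr0duced in 1998."
print(flag_suspect_tokens(sample))
# prints [(1, 'c1ean'), (2, 'pr0duced')]
```

A check like this will also flag legitimate tokens such as "A4" or "3rd", so it can only point a human proofreader at likely trouble spots, not replace one.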

Original layout files

Some services will accept the original files produced by page layout programs, most commonly Adobe InDesign, QuarkXPress or Adobe FrameMaker. You’ll need to consult with the service provider as to whether they accept these files, and under what restrictions. There might be problems using them if they are from very old versions, or have missing fonts or images.

If the book has never been published in print form, or has been extensively revised for a digital-only edition, the source format is likely to be Microsoft Word, which we covered in the earlier section on preparation.

2. Extract

Next, the text, images and other information are extracted from the source file for further processing.

Conversion services invest a lot of their technical resources into making this part of the process more accurate. For high volume customers, they will fine-tune their extraction tools to match publishers’ content types and house styles.

3. Mark up

Before the extracted content can be transformed into an ebook, the raw text must be marked up using a special markup language to tag the structure (and sometimes the meaning) of the content.

[Figure: An example of HTML mark up]

Markup languages are used to allow computer processing of documents. Mark up is necessary because computer programs cannot recognize the elements of a page from their appearance.

The programs read the markup tags to identify which elements are chapter headings, captions, paragraphs, etc.
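As a rough illustration of how a program reads tags rather than appearance, the Python sketch below uses the standard library's HTMLParser to pick out a chapter heading, a paragraph and a caption from a short piece of HTML. The sample text and the class name StructureReader are invented for this example.

```python
from html.parser import HTMLParser

class StructureReader(HTMLParser):
    """Collect [tag, text] pairs for the structural elements we care about."""
    TRACKED = {"h1", "p", "figcaption"}

    def __init__(self):
        super().__init__()
        self.current = None   # tag we are currently inside, if any
        self.elements = []    # collected [tag, text] pairs

    def handle_starttag(self, tag, attrs):
        if tag in self.TRACKED:
            self.current = tag
            self.elements.append([tag, ""])

    def handle_data(self, data):
        if self.current:
            self.elements[-1][1] += data

    def handle_endtag(self, tag):
        if tag == self.current:
            self.current = None

page = "<h1>Chapter 1</h1><p>It was a dark night.</p><figcaption>The old house</figcaption>"
reader = StructureReader()
reader.feed(page)
reader.close()
for tag, text in reader.elements:
    print(tag, "->", text)
```

The program never looks at font size or position on the page. It knows "Chapter 1" is a heading only because the text sits between h1 tags.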

The two most commonly used markup languages in ebook production are:

  • HTML (HyperText Markup Language)
  • XML (eXtensible Markup Language)

HTML (HyperText Markup Language) is the language of the web and the underlying language of Kindle and EPUB ebooks.

The latest version of HTML has about 100 tags which describe various elements such as heading, title, article, aside, and paragraph. A tag name is enclosed between angle brackets to distinguish it from the rest of the text.

XML (eXtensible Markup Language) is a richer, more powerful (but also more complex) markup language.

XML uses tags like HTML’s to mark up documents. In fact, the two languages are closely related: they share a common parent (SGML), and XHTML, the strict form of HTML used in EPUB, is written in XML syntax.

But unlike HTML, which must describe every piece of content using its fixed set of 100 or so tags, XML is extensible. This means that you can add as many tags as you like, including tags that describe the meaning of the content, not just its structure. This is referred to as semantic markup.

So XML can describe content in great detail, using tags specific to a particular industry or application. For example, tags can record the subject matter or reading level of a passage.
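Here is a small sketch of what semantic markup makes possible, using Python's standard xml.etree.ElementTree module. The tag and attribute names (cookbook, recipe, reading-level and so on) are invented for this example and do not come from any published schema.

```python
import xml.etree.ElementTree as ET

# The tag and attribute names below are invented for this example.
doc = """
<cookbook reading-level="beginner">
  <recipe cuisine="italian">
    <name>Risotto</name>
    <ingredient quantity="300" unit="g">arborio rice</ingredient>
    <ingredient quantity="1" unit="l">stock</ingredient>
  </recipe>
</cookbook>
"""

root = ET.fromstring(doc)
print(root.get("reading-level"))
for recipe in root.iter("recipe"):
    # Because ingredients are tagged as ingredients, we can list them directly.
    ingredients = [item.text for item in recipe.iter("ingredient")]
    print(recipe.findtext("name"), recipe.get("cuisine"), ingredients)
```

Because the markup says what each piece of content is, a program can answer questions such as "which recipes use rice?" without guessing from the layout.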

XML Schemas. While it’s possible to create a completely customized system of XML mark up (essentially a custom vocabulary to describe your works), in practice it is more common to adopt existing markup systems (called schemas) which have widespread support within an industry. A commonly used schema for books is called DocBook, which provides an extensive markup ‘vocabulary’ specific to books.
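To show the benefit of a shared vocabulary, this sketch reads chapter titles from a tiny DocBook 5 document using Python's standard library. The book content is invented, and the example assumes the DocBook 5 namespace and its book, chapter, title and para elements. A real DocBook file would be far richer.

```python
import xml.etree.ElementTree as ET

# DocBook 5 elements live in this XML namespace.
DOCBOOK_NS = {"db": "http://docbook.org/ns/docbook"}

book_xml = """<book xmlns="http://docbook.org/ns/docbook" version="5.0">
  <title>A Short Book</title>
  <chapter><title>Beginnings</title><para>First paragraph.</para></chapter>
  <chapter><title>Endings</title><para>Last paragraph.</para></chapter>
</book>"""

book = ET.fromstring(book_xml)
chapter_titles = [
    chapter.findtext("db:title", namespaces=DOCBOOK_NS)
    for chapter in book.findall("db:chapter", DOCBOOK_NS)
]
print(chapter_titles)
# prints ['Beginnings', 'Endings']
```

Any tool that understands DocBook can find the chapters in the same way, which is the practical argument for adopting an existing schema over a private one.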

XML or HTML? XML tends to be the markup language of choice for large-scale publishing and conversion projects, especially complex works such as textbooks.

However, HTML has advocates who point to advantages that include its relative simplicity and its much wider usage. A strict form of HTML can be extended to provide similar richness to XML’s markup by the use of ‘add-ons’ to tags called class attributes.
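The sketch below shows the idea behind class attributes: the tag stays a plain paragraph, and the class value carries the extra meaning. It is a minimal Python example using the standard library. The class names (epigraph, pull-quote) and the fallback label "body" are illustrative choices, not a standard vocabulary.

```python
import xml.etree.ElementTree as ET

# Class names here are illustrative, not a standard vocabulary.
xhtml = """<div>
  <p class="epigraph">All happy families are alike.</p>
  <p>An ordinary paragraph.</p>
  <p class="pull-quote">A sentence worth repeating.</p>
</div>"""

fragment = ET.fromstring(xhtml)
by_class = {}
for p in fragment.iter("p"):
    # Paragraphs without a class are treated as ordinary body text.
    role = p.get("class", "body")
    by_class.setdefault(role, []).append(p.text)
print(by_class)
```

This works with an XML parser because strict HTML is well-formed XML. Looser HTML would need an HTML parser instead.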

4. Transform

Once a file is marked up, a program called a transformation engine can be applied to it to output it in new formats. If sound decisions are made at the markup stage, this input document can be used to produce a range of outputs, now and in the future as technology and formats change. Here’s a graphical representation of the transformation process:

[Figure: XML transformation]
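A transformation can be sketched in a few lines of Python. The example below maps the tags of a small custom XML document to HTML tags. The source tags (chapter, heading, para) and the mapping are hypothetical. Production workflows typically use dedicated transformation tools, so treat this only as an illustration of the principle.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from source tags to HTML tags.
TAG_MAP = {"chapter": "section", "heading": "h1", "para": "p"}

def to_html(source):
    """Recursively copy a source element into an HTML element tree."""
    target = ET.Element(TAG_MAP.get(source.tag, "div"))
    target.text = source.text
    target.tail = source.tail
    for child in source:
        target.append(to_html(child))
    return target

source = ET.fromstring(
    "<chapter><heading>One</heading><para>Once upon a time.</para></chapter>"
)
html = ET.tostring(to_html(source), encoding="unicode")
print(html)
# prints <section><h1>One</h1><p>Once upon a time.</p></section>
```

Changing only the mapping table would send the same marked-up source to a different output format, which is why the markup decisions matter more than any single output.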

5. Output

As well as EPUB and Kindle, another common output is PDF, optimized for ebook reading, print-on-demand or web viewing.

Many conversion services are geared up to the needs of publishers with large backlists, or who regularly convert print editions to ebooks. These publishers will gain cost and efficiency benefits from investing in standards, such as a standardized markup system, and reuse of components. Examples are templates, style sheets and common graphic elements to support house styles.


For small volumes and one-off projects, you’ll probably pay US$0.50 to $1.00 or more per page, plus one-off charges, to produce a single book. Scanning from hard copy or a high proportion of complex pages will increase this cost.

For higher-volume users, prices vary widely and differ significantly with the types of books being converted, but expect to pay half or less of the low-volume rates.
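As a worked example of these rates, the Python sketch below estimates the per-page cost for a hypothetical 300-page book. The page count is an assumption, and the one-off charges are left at zero because they vary by provider.

```python
def estimate_cost(pages, rate_per_page, one_off_charges=0.0):
    """Per-page conversion cost plus any fixed one-off charges, in US dollars."""
    return pages * rate_per_page + one_off_charges

pages = 300  # hypothetical book length
low = estimate_cost(pages, 0.50)
high = estimate_cost(pages, 1.00)
print(f"Low volume: ${low:.2f} to ${high:.2f}")
print(f"High volume (half the low-volume rate): ${low / 2:.2f} to ${high / 2:.2f}")
# prints:
# Low volume: $150.00 to $300.00
# High volume (half the low-volume rate): $75.00 to $150.00
```

Treat the result as a starting point for comparing quotes, since scanning, complex pages and one-off charges all add to the figure.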

Ebook conversion service providers

Here are a few examples of the many service providers who will convert print books to ebooks from print or electronic sources. Each of these companies serves international publishers.

Ebook conversion service providers for medium to large publishers

This group suits higher-volume publishers. These providers are used to handling large volumes and integrating their processes with the production workflows of their publisher clients.

Several of them, like Aptara, Datamatics, DCL and Innodata, have a long history in XML document conversion and serve markets outside of publishing, such as medical, legal, technical and scientific.

Others, such as Infogrid Pacific and iPublishCentral, have built their organizations specifically on digitizing books. All of these companies provide services to the textbook, professional and STM (Scientific, Technical and Medical) sectors as well as trade publishers.


Aptara. (http://www.aptaracorp.com/key-markets/digital-publishing/) US-headquartered with 20 years’ experience and 5000 employees.

CodeMantra. (http://www.codemantra.com/AssetConversionServices.htm) US-headquartered, provides production, distribution and digital asset management solutions.

Datamatics. (http://www.datamatics.com/services/publishing-services/ebook-content-transformation) India-headquartered with offices worldwide, one of India’s longest-established outsourcing companies.

DCL (Data Conversion Laboratory). (http://www.dclab.com/ebook_production_services.asp). US-based, experienced in electronic document mark up and conversion.

Infogrid Pacific. (http://www.infogridpacific.com/igp/). Singapore-headquartered, provides conversion services and hosted technology options for digital production and distribution, aimed at small, medium and large publishers worldwide.

Innodata. (http://www.innodata.com/industries/publishing). US-headquartered. Large digital content services provider, operating across a range of industries.

iPublishCentral. (http://www.ipublishcentral.com/convertion.php). US-headquartered. Services include both conversion and hosting service for publisher ebook portals.

Ebook conversion service providers for small publishers

Some service providers will work with small publishers and self-publishers on a single title or low-volume production. Several of the providers at this end of the market tie their conversion services to their own distribution service, and work only from manuscripts.

The group we’ve listed here will convert from a range of print-related formats and will provide publishers with the digitized ebook files for independent distribution.


Here’s a selection of service providers who will help publishers from around the world who need to convert a single copy or small volume from a PDF or hard copy. Some also offer distribution or custom design services.

  • BookBaby. (http://www.bookbaby.com). Will produce an ebook for US$149 from a range of formats. Lulu and others offer a similar service. Note that some service providers will not accept the PDF files sent to printers, or will add a surcharge, because these files can be quite complex.
  • DCL Epub on Demand. (http://www.dclab.com/epub_on_demand.asp). This is a one-off service offering from DCL, opening its high-end ebook conversion process to small publishers and self-publishers. Each job is quoted individually through an online quotation service.
  • eBookPartnership. (http://www.ebookpartnership.com). UK-based service for self-publishers and small publishers. Works from a range of electronic and hardcopy formats, provides custom design and distribution services. Distribution partner for UK retailer Waterstones.
  • Blue Leaf Book Scanning. (http://www.blueleaf-book-scanning.com). This online service will scan your books and OCR them to produce a text-readable file that you can then convert to an ebook. They will scan using either destructive (you won’t get your book back) or non-destructive methods (your book will be returned undamaged). The service will typically cost in the region of US$50-100. You can order an EPUB or Kindle file at the same time that it’s scanned, but this isn’t recommended. The OCR process won’t produce 100% error-free results, so you should proofread the scanned document before you get it converted into an ebook.
  • Stembuck Book Scans. (http://www.stembuckbookscans.com) Offers non-destructive and destructive scanning for printed editions, and a good range of source file formats for digital upload. Well-priced with convenient and comprehensive online ordering.


This video is fairly information-dense but it’s worth a look if you’re interested in understanding some of the nitty-gritty of ebook conversion from an experienced practitioner. It’s a recording of a webinar presented by conversion service Data Conversion Laboratory. Note that it was produced to sell the value of using a professional outsourcer rather than an automated online service so there’s a bit of soft-sell but it’s high-value content.

You’ll need to put aside 20 minutes to watch this video (20 minutes more if you want to listen to the questions from the audience) and we’ve set the video to start about 12 minutes in. You can download the slides from this presentation here.


Find out more about this topic on our Digital Publishing 101 useful resources site.

