Scott Marlowe

Author • Engineer • Technologist

eBook File Formats

There are a lot of eBook file formats: TXT, HTML, AZW, DOC/DOCX, OPF, TR2/3, ARG, DTB, FB2, XML, CHM, PDF, PS, DJVU, LIT, PDB, DNL… ok, I think I'll stop now. It's obvious that while one file format might be a nice ideal, it is anything but reality. Not everyone uses the same software, and there's no such thing as a universal e-book file format that all market players have adopted (EPUB stands out, but not every eBook retailer uses it).

In their simplest form, e-books are just text files. But text files are too simple. They don't contain the characteristics necessary for an e-book to rival a printed book in appearance. Also, TXT files do not support DRM.

E-book readers, both software and hardware, are a topic unto themselves. For this post, I therefore want to focus just on the file formats that these readers support, and only on the formats I feel are the most relevant. It's not very realistic, IMO, for someone to read an e-book of any length in TXT format or even HTML. Other formats, such as PKG (which was a file format for reading e-books on an Apple Newton), are outdated enough to not garner further attention.

So, here are the formats and a bit of information about each.

AZW

AZW is the file format used by the Amazon Kindle e-reader. It is proprietary to Amazon and is DRM protected. The best way to both convert a file to this format and publish on Amazon's Kindle store is to use their Digital Text Platform site.

Their recommendation for having a successful conversion:

The preferred format for uploading content is as a single HTML file. To include images, provide a ZIP file that includes the images as well as the HTML file that refers to them (check the formatting guides to find out how to link to images from HTML). The HTML and image files all have to be in the same folder inside the zip file.

I've gone through this process to publish my novel, The Hall of the Wood, on the Kindle store; it is a pretty painless process.
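The packaging step in Amazon's recommendation above can be sketched in a few lines of Python. This is only a rough illustration, not Amazon's tooling: the function name `build_upload_zip` and the file names are made up for the example, and it assumes the HTML refers to its images by bare file name.

```python
import zipfile
from pathlib import Path


def build_upload_zip(html_file, image_files, zip_path):
    """Bundle one HTML file and its images into a single flat ZIP archive."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for source in [html_file, *image_files]:
            source = Path(source)
            # arcname drops the directory part, so every file ends up
            # in the same (top-level) folder inside the ZIP.
            archive.write(source, arcname=source.name)
    return zip_path
```

Calling `build_upload_zip("book.html", ["images/cover.jpg"], "upload.zip")` would produce an archive holding `book.html` and `cover.jpg` side by side. Two images with the same file name in different source folders would collide, so a real script should check for that first.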

Note: A lot of people/reviewers think Kindle only supports AZW. This isn't true. Kindle also natively supports TXT, PDF, Audible audio (AA, AAX), MP3, and unprotected MOBI, and (through conversion) HTML and DOC.

PDB

PDB is a DRM-protected format advocated by Palm Digital Media. It stands for Palm Database and was originally meant to be read on Palm handheld devices. From looking around, many retailers seem to support this format, and you don't necessarily need a Palm handheld to read it, since reader software is available for PCs and Macs. The format is also supported on many other handheld devices.

PDF

PDF stands for Portable Document Format. It was established by Adobe in hopes of creating a universal file format to promote the ready exchange of data, specifically document files. DRM-free PDFs can be read by the free Adobe Reader. PDFs protected by DRM can be read by Adobe Digital Editions, which allows or denies access to a downloaded PDF depending on the conditions under which the file was obtained.

If an e-book was purchased outright, you should be good to go, though you will have to read the PDF using Digital Editions and will be further restricted from saving or printing the e-book. On the other hand, if you checked an e-book out from an online library and that e-book contains DRM, chances are the e-book will "expire" when the loan period is up, after which you will no longer be able to view it.

PDF documents can be created by any number of freely available software converters. My preferred method of conversion is to use the Microsoft Save as PDF or XPS add-in for Microsoft Word 2007. Of course, there's always Adobe Acrobat Professional, too.

ODF

OpenDocument Format is an XML-based file format used to represent spreadsheets, presentations, word processing documents, and more. ODF has emerged as an industry standard: the specification is maintained by OASIS, a consortium of over 600 organizations (including Microsoft and Adobe), and has also been published as ISO/IEC 26300. It is worth noting, though, that while applications such as Microsoft Office support ODF, that suite still defaults to its own file formats. ODF is, however, the default file format for OpenOffice, a popular open source alternative to Microsoft Office.

ODT, or OpenDocument Text, is the word processing specific version of the ODF file format standard. Similarly, there are presentation (ODP), spreadsheet (ODS), and other formats.

RTF

The Rich Text Format was developed by Microsoft in the 1980s. Not surprisingly, it is built around 8-bit character sets, and while it can address larger character sets, it does so through escape sequences, which relegates the format to mostly a legacy role. Still, the format is widespread; converting to RTF is supported by most word processors and many other applications.
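To make that concrete, here is a rough Python sketch of how text outside plain ASCII gets written into an RTF document body. It assumes the `\uN?` Unicode escape from later versions of the RTF specification, where N is a signed 16-bit decimal number and `?` is the fallback character for readers that don't understand the escape. The function name `rtf_escape` is made up for the example, and line breaks and other control characters are not handled.

```python
def rtf_escape(text):
    """Escape a string for use inside an RTF document body (sketch only)."""
    pieces = []
    for ch in text:
        if ch in "\\{}":
            # Backslash and braces are RTF syntax, so they need a backslash.
            pieces.append("\\" + ch)
        elif ord(ch) < 128:
            # Plain ASCII passes through unchanged.
            pieces.append(ch)
        else:
            # Everything else becomes one \uN? escape per UTF-16 code unit.
            units = ch.encode("utf-16-le")
            for i in range(0, len(units), 2):
                unit = int.from_bytes(units[i:i + 2], "little")
                if unit > 32767:
                    unit -= 65536  # RTF expects a signed 16-bit value
                pieces.append("\\u%d?" % unit)
    return "".join(pieces)
```

For example, `rtf_escape("café")` gives `caf\u233?`. A single accented letter costs six characters in the file, which goes some way toward explaining why the format feels dated.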

DOC/DOCX

DOC is the long-standing default file format of Microsoft Word. With Word 2007, Microsoft introduced the DOCX format, which is billed as an open, XML-based format but, unfortunately, has not been as widely adopted as Microsoft might have hoped. One of the nice things about the DOCX format is that it results in much leaner files. However, it is not backward compatible with previous versions of Word.

EPUB

EPUB is an e-book specific format engineered by the International Digital Publishing Forum (IDPF) and intended to replace the Open eBook (OEB) standard. EPUB includes optional support for DRM. The standard is supported by the Barnes & Noble nook, Sony Reader, and Apple's iPhone as well as other devices.

As for converting a document to the EPUB format, it looks like there are several options: BookGlutton hosts an HTML-to-EPUB file converter, Google Code hosts a software library called epub-tools that looks suitable for batch-style conversion of files, and LexCycle has something called Stanza, which looks to be a desktop application. I'll have to give each of them a whirl to see which is the best option.

PRC/MOBI

The PRC/MOBI file format is based on the Open eBook (OEB) standard (which, I discovered, was superseded by the EPUB standard; see above) and is considered one of the most widespread e-book file formats for mobile devices. The biggest proponent of this format is Mobipocket.

Mobipocket offers both reader and publisher software, both free. Mobipocket Reader will run on PCs as well as a number of handheld devices. There are two ways to use Mobipocket Creator to author e-books: use the application to create the e-book and then add content and design from there or, the more practical approach, import Word, text, or PDF documents.

The PRC/MOBI format does, of course, support DRM.

BBeB (LRX/LRF)

BBeB, or Broadband eBook, is Sony's proprietary file format for e-books, as if we needed yet another one. It comes in two varieties: LRX for encrypted (DRM) e-books and LRF for unencrypted e-books.

Sony has their own e-book store where one can download e-books in these formats. The newest version of the Sony Reader is widely expected to give Amazon's Kindle a run for its money. In order to read books in the BBeB format, you will need a Sony Reader, much as the AZW format is married to the Kindle.

However, Sony opened the Reader up so that it also supports the EPUB format. This is a good thing, and leaves the Kindle as virtually the only device that locks its users into a proprietary format.

I haven't yet found a viable method by which to publish e-books in this format.

Two options have come to light for converting from a more standard format to BBeB:

1.) As ZenEngineer points out in the comments below, there is a freeware program called Calibre that will perform the conversion.

2.) Also, there is the bbebinder open-source project hosted on Google Code which converts HTML and TXT files to the BBeB format.

LIT

This is a Microsoft-specific file format, and I can't help but wonder whether its time is at an end. LIT files are readable only on Microsoft Reader, and while there are versions of the software for PCs and handhelds, the major players in those areas (Amazon, Sony, Apple) have their own proprietary formats.

Creation of LIT files seems a bit problematic as well. There is a Read in Microsoft Reader add-in for Microsoft Word 2000 and higher, but "higher" here does not include Word 2007. That kind of tells me the format is being abandoned by Microsoft.
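As a quick recap, the e-book formats covered above can be boiled down to a small lookup table. The Python sketch below is just one way to write it: the DRM column only repeats what this post says about each format ("not stated" where the post is silent), and the names `EBOOK_FORMATS` and `describe` are made up for the example.

```python
from pathlib import Path

# extension -> (format, DRM support as described in this post)
EBOOK_FORMATS = {
    ".txt": ("Plain text", "no"),
    ".azw": ("AZW (Amazon Kindle)", "yes"),
    ".pdb": ("PDB (Palm Database)", "yes"),
    ".pdf": ("PDF", "optional"),
    ".epub": ("EPUB", "optional"),
    ".prc": ("PRC/MOBI (Mobipocket)", "optional"),
    ".mobi": ("PRC/MOBI (Mobipocket)", "optional"),
    ".lrx": ("BBeB, encrypted (Sony)", "yes"),
    ".lrf": ("BBeB, unencrypted (Sony)", "no"),
    ".lit": ("LIT (Microsoft Reader)", "not stated"),
}


def describe(filename):
    """Return (format, DRM) for a file name, or None if the extension is unknown."""
    extension = Path(filename).suffix.lower()
    return EBOOK_FORMATS.get(extension)
```

For instance, `describe("Novel.EPUB")` returns `("EPUB", "optional")`. Keep in mind that a file extension is only a hint; it says nothing about whether a particular file actually carries DRM.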
