Viewing CHM files, and converting CHM to HTML or PDF files in (Ubuntu) Linux

Posted on May 25, 2008 in Linux, Ubuntu

As most people know, CHM (Microsoft Compiled HTML Help) is a proprietary format that is not supported by default in Linux. Thankfully, there are several options for viewing CHM files, and even for converting them to another format such as HTML or PDF.

Viewing CHM Files

The simplest way to deal with CHM files is to install a CHM viewer such as gnochm or xchm. Under Ubuntu/Debian, either can be installed from a terminal:

sudo apt-get install gnochm

or

sudo apt-get install xchm

or via the Synaptic Package Manager, by searching for “gnochm” or “xchm”. Once a viewer is installed, CHM files should open in it automatically.
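A viewer can also be started straight from the terminal with the file name as an argument. The snippet below is only a minimal sketch of that idea: first_available is a hypothetical helper (it is not part of either package) that prints the first of the named commands found on the PATH, and it assumes that both viewers accept a CHM file name as their first argument.

```shell
# Hypothetical helper: print the first command from the argument list
# that is installed, or return 1 if none of them can be found.
first_available() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}
```

For example, viewer=$(first_available xchm gnochm) && "$viewer" my_chm_book.chm would open the book in xchm if it is installed, and fall back to gnochm otherwise.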

Converting CHM Files

My preferred method for dealing with CHM files is to convert them to a more universal format, such as HTML or even PDF. There are a couple of ways to do this, and several different tools for the job.

Conversion Method 1: CHM -> HTML (-> PDF)

Firstly, it is possible to simply decompile the CHM file into its component HTML files, which can be opened in any web browser. These HTML files may then optionally be converted into a PDF document. To do this, two main packages (with their dependencies) need to be installed: libchm-bin, which provides the chmlib tools, and htmldoc:

sudo apt-get install libchm-bin htmldoc

The first part of the process uses chmlib to decompile the CHM file and save the resulting files to a specified directory, for example:

extract_chmLib my_chm_book.chm htmloutputdir

This will pull apart the CHM file and store all the new HTML files within the “htmloutputdir” directory (within a sub-directory called “final”). If desired, htmldoc can then be used to convert the HTML files into a single PDF document. Running

htmldoc &

from the terminal opens the htmldoc GUI, from which the HTML files can be selected for input, the output formatted, and the PDF document generated. The htmldoc website has extensive documentation covering this process, but for converting to PDF I find the next method much easier!
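htmldoc can also be run without the GUI. As a rough sketch, assuming the extracted pages can simply be joined in the order they are passed: html_to_pdf is a hypothetical wrapper name of my own, while --webpage, --size and -f are htmldoc's options for unstructured input, page size and output file.

```shell
# Hypothetical wrapper: convert one or more HTML files into a single PDF.
# Usage: html_to_pdf OUTPUT.pdf FILE.html [FILE.html ...]
html_to_pdf() {
    # Need an output name plus at least one input file.
    if [ "$#" -lt 2 ]; then
        echo "usage: html_to_pdf OUTPUT.pdf FILE.html [FILE.html ...]" >&2
        return 2
    fi
    out=$1
    shift
    # --webpage treats the input as plain pages rather than a structured book.
    htmldoc --webpage --size a4 -f "$out" "$@"
}
```

Something like html_to_pdf my_chm_book.pdf htmloutputdir/*.html would then build the PDF; adjust the path to wherever the HTML files actually ended up. Note that the shell expands the wildcard alphabetically, which may not match the chapter order of the original book.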

Conversion Method 2: CHM -> PDF

A tool written in Python, called chm2pdf, essentially cuts out the intermediary steps above for the user. It sits as a layer on top of chmlib and htmldoc, automating most of the conversion work, and so requires chmlib, htmldoc and additionally pychm (Python bindings for chmlib) in order to run. chm2pdf can be downloaded from its website and installed manually or, in Ubuntu, installed from the repositories. To install it and the other required packages, run in a terminal:

sudo apt-get install libchm-bin htmldoc python-chm chm2pdf

chm2pdf is a command line tool. In most cases the default options work for me (outputting an A4 PDF document with ToC, images, links etc.), so converting is as simple as running:

chm2pdf --webpage my_chm_book.chm

from the directory where the CHM file is located. This converts the file to HTML (the intermediate files are stored in the /tmp directory) and generates a PDF document with the same name, with the ToC, index, links, images etc. all intact. Running either:

man chm2pdf

or

chm2pdf --help | less

will display the manual and help pages, both of which contain a wealth of information and command options for tweaking and perfecting the CHM to PDF conversion process.
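With several books to convert, the same command can be wrapped in a loop. This is only a sketch: pdf_name_for and convert_all_chm are hypothetical helper names, and the skip check assumes, as described above, that chm2pdf writes a PDF with the same base name as the CHM file into the current directory.

```shell
# Hypothetical helper: the PDF name expected for a given CHM file,
# e.g. "my_chm_book.chm" -> "my_chm_book.pdf".
pdf_name_for() {
    printf '%s.pdf\n' "$(basename "$1" .chm)"
}

# Hypothetical helper: convert each CHM file given as an argument,
# skipping any whose PDF already exists in the current directory.
convert_all_chm() {
    for chm in "$@"; do
        # Ignore arguments that are not existing files (e.g. an unmatched *.chm).
        [ -f "$chm" ] || continue
        pdf=$(pdf_name_for "$chm")
        if [ -e "$pdf" ]; then
            echo "skipping $chm: $pdf already exists"
            continue
        fi
        chm2pdf --webpage "$chm" || echo "conversion failed: $chm" >&2
    done
}
```

Running convert_all_chm *.chm from the directory holding the CHM files would then work through them one at a time.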
