Convert LaTeX Documents to HTML

 Sandro Tosi, 26 September 2007


This article shows some Linux command-line tools available on Debian (they may also be packaged by other distributions, or be available from their websites) for converting a LaTeX document to HTML.

This comparison came about because I wanted to make my thesis available in HTML format (you can find it here). It is a rather complex LaTeX document, and I wanted the HTML version to look as similar as possible to the PS/PDF output.

It focuses on these aspects: user-friendliness; conversion quality (how closely the output adheres to what the LaTeX commands produce); whether the tool can write its output to a directory other than the current one (so that the LaTeX source directory is not filled with HTML and image files); and the quality of the conversion of my thesis.

The tools under examination are: latex2html, tex4ht, hevea, tth and hyperlatex. Some conventions: <doc>.tex is the main document file (it may refer to other LaTeX/BibTeX files), and <doc> is the same file name without the extension. For these tools to work correctly, you need to compile the document first, running latex and bibtex as many times as needed. OK, let's start.
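The preliminary compilation can be sketched as a small shell function. This is only an illustration: the function name compile_doc is made up for this example, it assumes the document uses BibTeX, and the usual latex, bibtex, latex, latex sequence may need more or fewer passes for a given document.

```shell
# Sketch: compile <doc>.tex so that the auxiliary files the converters
# rely on (.aux, .bbl, ...) are in place before any conversion.
# compile_doc is a hypothetical helper name, not part of any tool.
compile_doc() {
    doc=$1                   # document name without the .tex extension
    latex "$doc.tex" &&      # first pass: writes <doc>.aux
    bibtex "$doc" &&         # builds the bibliography from <doc>.aux
    latex "$doc.tex" &&      # second pass: pulls in the bibliography
    latex "$doc.tex"         # third pass: settles cross-references
}

# Usage (from the directory that contains thesis.tex):
#   compile_doc thesis
```

The chain stops at the first command that fails, so a broken document is not passed on to a converter.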

latex2html

Description: It is a really good program, widely used around the web, and I have used it a few times myself. I can only recommend it if your document is rather simple.

How to call it: invoking it is as simple as running

$ latex2html <doc>.tex

Output: the tool generates an HTML file for each chapter/section/subsection of the document; every image is converted, and every mathematical expression is converted to an image (inserted in the HTML pages).

Destination dir: the command lets you set the destination directory with the -dir <dest_dir> switch on the command line.
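As a sketch of how the -dir switch might be used in practice, the function below creates the destination directory and then runs the conversion into it. The function name convert_latex2html is invented for this example, and creating the directory beforehand with mkdir -p is a precaution of this sketch, not something the article verified as required.

```shell
# Sketch: run latex2html so that all generated files land in <dest_dir>.
# convert_latex2html is a hypothetical helper name.
convert_latex2html() {
    doc=$1                        # document name without the .tex extension
    dest=$2                       # directory that receives the HTML and images
    mkdir -p "$dest" || return 1  # make sure the destination exists
    latex2html -dir "$dest" "$doc.tex"
}

# Usage:
#   convert_latex2html thesis /tmp/thesis-html
```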

Thesis conversion: the result is ugly. Many images are not converted (and images are used to render every mathematical expression, and this is a math thesis...), and the images of the results are not shown (maybe not even converted at all), since they are embedded in a custom table.

Conclusion: use it, it's good, really; but use it on articles with simple formatting, a lot of text and a few images, and you'll be satisfied. For more complex works, with heavy manual formatting, nested tables and custom commands, you have to look further.

tex4ht

Description: it is a suite of tools rather than a single one. The package contains many programs that aim to convert LaTeX documents into other formats. Since our intent is to convert to HTML, we're going to look at htlatex, which converts a LaTeX document into HTML.

How to call it: to call the tool, just run

$ htlatex <doc>.tex

Output: the output is a single HTML file, named <doc>.html, that contains the main document; other HTML files are generated for footnotes; images are generated for mathematical expressions and for real images.

Destination dir: you can specify a different output directory by passing this exact parameter string: "" "" -d<dest_dir>. The first two parameters, set to empty strings, are needed because htlatex runs three sub-programs, and only the last one (t4ht, for the record) creates the output files, so that is the one that must receive the destination directory (this may change in the future).
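The parameter string above is easy to get wrong, so here is a sketch that wraps it in a function. The name convert_htlatex is made up for this example. Two details are assumptions of the sketch rather than claims of the article: that the directory passed to -d should end with a slash, and that it should already exist.

```shell
# Sketch: run htlatex and ask its last stage (t4ht) to write the output
# files into <dest_dir>. convert_htlatex is a hypothetical helper name.
convert_htlatex() {
    doc=$1                        # document name without the .tex extension
    dest=$2                       # directory that receives the output
    # Assumption of this sketch: give t4ht a path ending with a slash.
    case $dest in
        */) ;;
        *)  dest=$dest/ ;;
    esac
    mkdir -p "$dest" || return 1  # assumption: the directory must exist
    # The two empty strings are placeholders for the earlier stages;
    # only the last argument carries the destination for t4ht.
    htlatex "$doc.tex" "" "" "-d$dest"
}

# Usage:
#   convert_htlatex thesis /tmp/thesis-html
```

Quoting each empty string separately matters: without the quotes the shell would drop them and -d would reach the wrong stage.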

Thesis conversion: apart from a few <ENTER> keys I had to press during the conversion, I have to admit the results are quite impressive. They are very similar to the printed version: all images are converted, and the text formatting is almost the same.

Conclusion: If you have a really complex document and you'd like to have it as one big HTML file, tex4ht is the way to go. Be aware that a single file may sometimes be too big to handle (>250 KB for the thesis, not counting the images).

hevea

Description: a great conversion tool that holds up well even on difficult documents; it is fast and very accurate.

How to call it: it's a two-step process: first run

$ hevea <doc>.tex

then hevea itself will ask you to run the following command to generate the images (I suggest using the PNG format, you know it's better.. :) ):

$ imagen -png <doc>

Output: it generates a single <doc>.html file (and separate HTML files for footnotes), but this file can be split using the

$ hacha -tocbis <doc>.html

command; the output is then quite similar to that of latex2html: an index.html file with a short table of contents (generated by the -tocbis option) and an HTML file for each chapter. Images are generated for some math code (for example sums and square roots) and for real images. The rendering of mathematics is not optimal (but it is understandable): hevea tries to convert everything to text (I think for web performance reasons), but this way matrices and some other expressions are rendered a "little bit strange"; the -mathml option (which is experimental) won't solve it.

Destination dir: there is no parameter to specify a different output directory; that's a big limitation.
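One possible workaround, sketched below, is to copy the sources into a separate directory and run the whole hevea chain there, so the original source directory stays clean. This is not a hevea feature, just a plain shell trick. The function name convert_hevea is invented for this example, and the test on <doc>.image.tex assumes that hevea leaves a file with that name when images have to be generated.

```shell
# Sketch: run hevea, imagen and hacha on a copy of the sources.
# convert_hevea is a hypothetical helper name.
# <dest_dir> must be outside <src_dir>, otherwise the copy would recurse.
convert_hevea() {
    src=$1                        # directory holding <doc>.tex and its files
    doc=$2                        # document name without the .tex extension
    dest=$3                       # working directory that receives the output
    mkdir -p "$dest" || return 1
    cp -R "$src"/. "$dest"/ || return 1
    (
        cd "$dest" || exit 1
        hevea "$doc.tex" || exit 1
        # Assumption: <doc>.image.tex exists only when images are needed.
        if [ -f "$doc.image.tex" ]; then
            imagen -png "$doc" || exit 1
        fi
        hacha -tocbis "$doc.html"  # split the single file into one per chapter
    )
}

# Usage:
#   convert_hevea ~/thesis thesis /tmp/thesis-html
```

The subshell keeps the change of directory local, so the calling shell stays where it was.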

Thesis conversion: It is really, really good. It looks like an exact web version of the printed one.

Conclusion: Even though you can't specify an alternative output directory, and the math is rendered somewhat oddly, it is a really good tool. I suggest giving it a try: I'm sure you're going to use its output for your website (as I did for my thesis).

tth

Description: tth, too, is a suite of components, used to convert LaTeX (and its PS/PDF output) to HTML or GIF format.

How to call it: simply execute

$ tth -e1 <doc>.tex

where the -e1 option is needed to convert images to PNG format and to have them included in the document (otherwise they are not generated).

Output: it generates a single <doc>.html file; some images are converted, others are completely absent; the quality of the math rendering is really poor.

Destination dir: there is no option to write the files to a different directory.

Thesis conversion: Unusable. Many (if not all) mathematical expressions are rendered as text, completely messing up the equations, matrices, etc. Almost no image is converted, so they are not shown in the HTML document.

Conclusion: I've tried it only on my thesis, and the results are poor. It could be a good tool, but I imagine only for simple papers.

hyperlatex

Description: I've left this for last since it is not a conversion program, but a way to write LaTeX documents using a dialect, hyperlatex. Written this way, the document can be converted easily to HTML, while the printed results stay at the same high level guaranteed by LaTeX.

Conclusion: Rather than converting a document, here you have to start writing in hyperlatex from the beginning. So you can give it a try only if you don't already have a document to convert, but are about to start a brand new one and already know you'll want it in both printed and HTML format.