2  TeX commands

A TeX document is a text file. Most of the text represents the content of the document, but a few characters are used specially to embed markup commands within the text. When the program TeX is called on a TeX document, it uses the markup commands in the document to create an appropriately typeset version of the document in a DVI file, which can then be printed. The TeX program, which recognizes a list of primitive commands, is invariably called with a format, which is essentially a preloaded set of definitions of some additional commands. The two most popular formats are plain TeX [29] and LaTeX [33, 36].

TeX2page understands many of the commands of plain TeX, LaTeX, Eplain [4], OPmac [38], and manmac [29, app. E] (see footnote 1). It uses this understanding to convert a TeX document to its HTML version, in much the same way that TeX converts the same document into its DVI version. In addition to the usual smörgåsbord of sectioning, itemization and enumeration macros, TeX2page also recognizes some commonly used commands that are not part of the formats but are loaded from LaTeX packages (e.g., color.sty, epsfig.sty, graphicx.sty, luamplib.sty, path.sty, verbatim.sty) or other external macro files (e.g., btxmac.tex, eplain.tex, opmac.tex, epsf.tex).

While TeX2page recognizes many commonly used commands, there are plenty of commands in both the format proper and the vast arena of macro files and packages that TeX2page does not know. When it meets such a command in math mode (assuming math is translated as text and not as an image (p. 6)), TeX2page simply prints the command’s name without the leading escape character. This is sometimes a good choice, as in the following text (where \sin and \cos are not explicitly recognized by TeX2page):

$$\sin^2 x + \cos^2 x = 1$$
This becomes

sin² x + cos² x = 1

which is clear enough.

Outside math mode, TeX2page simply ignores commands it doesn’t understand. This is also usually a good thing, as commands like \leavevmode, \/, and \- are best translated into HTML as nothing at all.

TeX2page ignores calls to include external LaTeX packages. These are files with extension .sty, loaded with \usepackage or \input. If the commands offered by these packages are not already recognized by TeX2page, they will be ignored too, and often this is not a problem. E.g., you can use a package for generating double columns: while this is a great paper-saver for your printed copy, it is generally not important for the HTML version, so nothing is lost if TeX2page ignores it.

Modern TeXs, e.g., XǝTeX [52], allow the use of Unicode [48, 20, 31, 49] fonts, thereby permitting a vast cornucopia of characters such as ॐ, ā, ∮ and ⎈ to be entered verbatim (i.e., without the aid of TeX commands) in the source document. TeX2page follows modern practice in typesetting Unicode characters as themselves (provided of course they haven’t been deliberately \catcode’d to mean something special).

In particular, TeX2page, like TeX, recognizes the \char command, followed by a TeX number n, as a directive to output the character n in the current font, but TeX2page assumes a Unicode font is meant. If your TeX does not use a Unicode font, you may need to use conditionals to ensure that the same glyphs are typeset in both the HTML and the DVI.
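
As a rough illustration of such a conditional, the sketch below defines a hypothetical macro \amacron (not part of TeX2page or of any format) that uses \char with the Unicode code point of ā in the HTML and falls back on the plain TeX accent \= in the DVI. It borrows the \shipout test described in the next subsection. Whether TeX2page would translate \=a correctly on its own has not been checked here, so treat this as a pattern rather than a required workaround.

```tex
% \amacron is a hypothetical name chosen for this sketch.
\ifx\shipout\UnDeFiNeD % HTML only: TeX2page does not know \shipout
  \def\amacron{\char"0101 }% U+0101, LATIN SMALL LETTER A WITH MACRON
\else % DVI only: build the glyph from an accent in a non-Unicode font
  \def\amacron{\=a}
\fi

The P\amacron{}li word k\amacron{}ya means body.
```

The space after the hexadecimal number ends the TeX number, so that text following the macro is not read as more digits.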

Defining TeX-only and HTML-only text regions

TeX2page will attempt gamely to process any TeX definitions that you use in your document or in external macro files without the extension .sty, but it may be a good idea to have them explicitly ignored if these macros are print-specific, or if having TeX2page try to parse them will cause errors. A way to have TeX2page ignore such fragments in your document is to use TeX conditionals, exploiting the fact that TeX2page does not know certain TeX commands such as \shipout (see footnote 2):

\ifx\shipout\UnDeFiNeD
  ...  for HTML only ...
\else
  ...  for DVI only ...
\fi
For example, let’s say you want to have your document load the macro file manmac.tex, but in a way that TeX2page will ignore it. Use:

\ifx\shipout\UnDeFiNeD
\else
  \input manmac
\fi
However, there will also be many commands that you do not want ignored in the HTML. In such cases, while you may not be able to use the print-specific definition, you should nevertheless furnish a definition that TeX2page can handle. For instance, although it may be acceptable to ignore the print-specific macros of manmac, the \bull macro defined in manmac should be translated by TeX2page. The following is a possible definition for TeX2page:

\def\bull{{\bf *}}
Of course, we want this definition to be seen only by TeX2page, as we don’t want to override the original, more sophisticated \vrule-based definition as seen by TeX. We therefore make our definition HTML-only using a conditional:

\ifx\shipout\UnDeFiNeD % HTML only
  \def\bull{{\bf *}}
\fi

Note that the HTML-only text continues to use TeX syntax. To specify some of this text as raw HTML, enclose it in \rawhtml ... \endrawhtml. With \rawhtml, we can spruce up the HTML-only definition of \bull:

\ifx\shipout\UnDeFiNeD % HTML only
  \def\bull{{\bf
  \rawhtml<span style="color: hsl(0,100%,30%)">&spades;</span>\endrawhtml
  }}
\fi

One can put HTML-only definitions in separate files that are loaded just like regular TeX​ ​macro files. Indeed, one such external file, texi2p.tex, is used by TeX2page to process Texinfo documents. Texinfo [14] is another TeX format, and files using this format \input texinfo as the first thing they do. TeX2page takes that as a cue to load texi2p.tex, which provides TeX2page-suitable definitions for the Texinfo commands. texi2p.tex is included in the TeX2page distribution.
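
A minimal sketch of that arrangement, assuming a hypothetical file name html-defs.tex (any name without the .sty extension should do): the HTML-only definitions live in their own file, and the main document loads that file only when TeX2page is doing the processing.

```tex
% --- html-defs.tex (hypothetical file name) ---
% Definitions meant for TeX2page only.
\def\bull{{\bf *}}

% --- in the main document ---
\ifx\shipout\UnDeFiNeD % HTML only
  \input html-defs
\fi
```

Keeping the conditional in the main document means the separate file needs no guard of its own.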

Paper and screen

One could use HTML-only and DVI-only regions to cordon off any text at all, not just macro definitions. E.g.,

The paper book is
\ifx\shipout\UnDeFiNeD % HTML only
an antiquated
\else % DVI only
a time-tested
\fi
technology.
Use of these directives may seem to miss the point of TeX2page. \ifx\shipout\UnDeFiNeD violates the principle of avoiding writing two texts, one for HTML, the other for DVI. \rawhtml violates the principle of avoiding writing raw HTML at all. \rawhtml in particular is dangerous because it voids the guarantee that the output pages will be valid HTML. Nevertheless, these directives are often useful, especially when the text will profit from exploiting the presentational differences between HTML and DVI.

The .t2p file

Before processing a TeX document, TeX2page will automatically load a file with the same basename as the TeX main file but with extension .t2p, if this file exists. This is a good place to put HTML-specific definitions for the document without making changes in the document itself.

.t2p files are especially valuable when HTMLizing legacy or third-party documents without compromising their authenticity, integrity, and timestamp.
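
For instance, assuming a main file named report.tex (a hypothetical name), a companion report.t2p could carry the simplified \bull definition shown earlier. Because only TeX2page reads the .t2p file, this sketch leaves out the \shipout conditional.

```tex
% report.t2p: loaded by TeX2page before report.tex, never seen by TeX.
% (report is a hypothetical basename.)
\def\bull{{\bf *}}
```

If the main file later redefines \bull itself, the later definition would presumably win. The technique described next addresses that case.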

Note that the definitions in the .t2p file are processed before the main file. But it often makes sense to activate these definitions sometime later. E.g., activating the .t2p definitions after the preamble in a LaTeX​ ​document allows you to redefine the preamble macros in a manner that is appropriate for HTML. Here is a technique for accomplishing this:

\let\PRIMdocument\document

\def\document{
  ...  HTML-specific definitions ...
  \PRIMdocument}
This code, which goes in the .t2p file, redefines the \document command to include a hook that loads some HTML-specific definitions. Since the \document command is called right after the preamble, the definitions introduced by the hook will shadow the preamble macros, as intended.
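
To make the pattern concrete, here is a sketch in which the preamble is assumed to define a print-specific macro \ornament (a hypothetical name, not taken from any package). The hook replaces it with a simpler definition once the preamble has been read.

```tex
% In the .t2p file. \ornament stands for a hypothetical macro that the
% preamble defines in a print-specific way.
\let\PRIMdocument\document

\def\document{%
  \def\ornament{{\bf *}}% shadows the preamble's definition
  \PRIMdocument}
```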

Sample .t2p files may be found in the TeX2page distribution.


1 TeX2page processes both plain TeX and LaTeX​ ​commands, without the need for a format file parameter. It can even process documents written in a mix of plain TeX​ ​and LaTeX. This is not an uncommon scenario, with LaTeX​ ​users frequently using plain TeX commands, and plain TeX​ ​users frequently implementing their own version of sectioning and other commands using the LaTeX​ ​names. In the few cases where the same command name (e.g., \footnote) is used in both formats but with differing behavior, TeX2page will choose the correct behavior based on which format it thinks the overall document is in. The plain TeX​ ​and LaTeX​ ​document structures are sufficiently different (as human readers can readily testify by reading just a few opening lines) to allow this disambiguation.

2 Here, \UnDeFiNeD is chosen because it is a control sequence that is (presumably) undefined. If perchance you defined it, replace it with something else, e.g., \forHTMLonly, \ForTheWeb. If using a modern PDF-producing TeX, you can use \ifdefined, and thus avoid having to come up with arbitrary undefined control sequences. Note that with \ifdefined\shipout it’s the else-branch that’s HTML-only.
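
As an illustration, the paper-and-screen example from the main text could be written with \ifdefined as sketched below, assuming a TeX engine that provides \ifdefined. The branches swap places relative to the \ifx version.

```tex
The paper book is
\ifdefined\shipout % DVI or PDF only: \shipout is known to TeX
a time-tested
\else % HTML only: TeX2page does not know \shipout
an antiquated
\fi
technology.
```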