A TeX document is a text file. Most of the text represents the content of the document, but a few characters are used specially to embed markup commands within the text. When the program TeX is called on a TeX document, it uses the markup commands in the document to create an appropriately typeset version of the document in a DVI file, which can then be printed. The TeX program, which recognizes a list of primitive commands, is invariably called with a format, which is essentially a preloaded set of definitions of some additional commands. The two most popular formats are plain TeX [29] and LaTeX [33, 36].
TeX2page understands many of the commands of plain TeX, LaTeX, Eplain [4], OPmac [38], and manmac [29, app. E].¹ It uses this understanding to convert a TeX document to its HTML version, much the same way that TeX converts the same document into its DVI version.
In addition to the usual smörgåsbord of sectioning, itemization and enumeration macros, TeX2page also recognizes some commonly used commands that are not part of the formats but are loaded from LaTeX packages (e.g., color.sty, epsfig.sty, graphicx.sty, luamplib.sty, path.sty, verbatim.sty) or other external macro files (e.g., btxmac.tex, eplain.tex, opmac.tex, epsf.tex).
While TeX2page recognizes many commonly used commands, there are plenty of commands in both the format proper and the vast arena of macro files and packages that TeX2page does not know. In math mode (assuming math is translated as text and not image (p. 6)), TeX2page simply types the name of an unrecognized command without the leading escape character. This is sometimes a good choice, as in the following text (where \sin and \cos are not explicitly recognized by TeX2page):
$$\sin^2 x + \cos^2 x = 1$$
This becomes
sin² x + cos² x = 1
which is clear enough.
In non-math mode, TeX2page simply ignores commands it doesn’t understand. This is also usually a good thing, as commands like \leavevmode, \/, and \- are best translated into HTML as nothing at all.
TeX2page ignores calls to include external LaTeX packages. These are files with extension .sty and are loaded with \usepackage or \input. If the commands offered by these packages are not already recognized by TeX2page, they will be ignored too, and often this is not a problem. E.g., you can use a package for generating double columns: while this is a great paper-saver for your printed copy, it is generally not important for the HTML version and so is no loss if ignored by TeX2page.
Modern TeXs, e.g., XǝTeX [52], allow the use of Unicode [48, 20, 31, 49] fonts, thereby permitting a vast cornucopia of characters such as ॐ, ā, ∮ and ⎈ to be entered verbatim (i.e., without the aid of TeX commands) in the source document. TeX2page follows modern practice in typesetting Unicode characters as themselves (provided of course they haven’t been deliberately \catcode’d to mean something special).
In particular, TeX2page, like TeX, recognizes the \char command, followed by a TeX number n, as a directive to output the character n in the current font, but TeX2page assumes a Unicode font is meant. You may need to use conditionals to ensure that the same glyphs are typeset in both the HTML and the DVI, if your TeX does not use a Unicode font.
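As a minimal sketch of such a conditional, the fragment below gives one macro two definitions, one per output. The macro name \om and the DVI-side fallback text are assumptions made up for illustration; the \ifx\shipout\UnDeFiNeD test is the one this manual uses to tell TeX2page apart from TeX.

```tex
% Sketch only: \om is a hypothetical macro name, not part of TeX2page or any format.
\ifx\shipout\UnDeFiNeD % HTML only
  % TeX2page reads the number as a Unicode code point; U+0950 is the om sign.
  \def\om{\char"0950 }
\else % DVI only
  % Assumed fallback for a TeX whose current font has no such character.
  \def\om{Om}
\fi

The syllable \om\ opens the chant.
```

A TeX that does use a Unicode font would not need the fallback branch; the conditional matters only when the two outputs would otherwise disagree.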
TeX2page will attempt gamely to process any TeX definitions that you use in your document or in external macro files without the extension .sty, but it may be a good idea to have them explicitly ignored if these macros are print-specific, or if having TeX2page try to parse them would cause errors. A way to have TeX2page ignore such fragments in your document is to use TeX conditionals, exploiting the fact that TeX2page does not know certain TeX commands such as \shipout:²
\ifx\shipout\UnDeFiNeD
  ... for HTML only ...
\else
  ... for DVI only ...
\fi
For example, let’s say you want to have your document load the macro file manmac.tex, but in a way that TeX2page will ignore it. Use:
\ifx\shipout\UnDeFiNeD
\else
  \input manmac
\fi
However, there will also be many commands that you do not want ignored in the HTML. In such cases, while you may not be able to use the print-specific definition, you should nevertheless furnish a definition that TeX2page can handle. For instance, although it may be acceptable to ignore the print-specific macros of manmac, the \bull macro defined in manmac should be translated by TeX2page. The following is a possible definition for TeX2page:
\def\bull{{\bf *}}
Of course, we want this definition to be seen only by TeX2page, as we don’t want to override the original, more sophisticated \vrule-based definition as seen by TeX. We therefore make our definition HTML-only using a conditional:
\ifx\shipout\UnDeFiNeD % HTML only
  \def\bull{{\bf *}}
\fi
Note that the HTML-only text continues to use TeX syntax. To specify some of this text as raw HTML, enclose it in \rawhtml ... \endrawhtml. With \rawhtml, we can spruce up the HTML-only definition of \bull:
\ifx\shipout\UnDeFiNeD % HTML only
  \def\bull{{\bf \rawhtml<span style="color: hsl(0,100%,30%)">♠</span>\endrawhtml }}
\fi
One can put HTML-only definitions in separate files that are loaded just like regular TeX macro files. Indeed, one such external file, texi2p.tex, is used by TeX2page to process Texinfo documents. Texinfo [14] is another TeX format, and files using this format \input texinfo as the first thing they do. TeX2page takes that as a cue to load texi2p.tex, which provides TeX2page-suitable definitions for the Texinfo commands. texi2p.tex is included in the TeX2page distribution.
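A separate file of HTML-only definitions might look like the sketch below. The file name htmldefs.tex is hypothetical, and the \bull definition is simply the one used earlier. Keeping the conditional inside the file is a design choice assumed here: it lets both TeX and TeX2page \input the file, with TeX skipping its contents.

```tex
% htmldefs.tex -- hypothetical file name, chosen for illustration only.
\ifx\shipout\UnDeFiNeD % HTML only: TeX skips everything up to \fi
  \def\bull{{\bf *}}
\fi
```

The main document would then load it like any other macro file:

```tex
\input htmldefs
```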
One could use HTML-only and DVI-only regions to cordon off any text at all, not just macro definitions. E.g.,
The paper book is
\ifx\shipout\UnDeFiNeD % HTML only
  an antiquated
\else % DVI only
  a time-tested
\fi
technology.
Use of these directives may seem to miss the point of TeX2page. \ifx\shipout\UnDeFiNeD violates the principle of avoiding writing two texts, one for HTML, the other for DVI. \rawhtml violates the principle of avoiding writing raw HTML at all. \rawhtml in particular is dangerous because it voids the guarantee that the output pages will be valid HTML. Nevertheless, these directives are often useful, especially when the text will profit from exploiting the presentational differences between HTML and DVI.
Before processing a TeX document, TeX2page will automatically load a file with the same basename as the TeX main file but with extension .t2p, if this file exists. This is a good place to put HTML-specific definitions for the document without making changes in the document itself. .t2p files are especially valuable when HTMLizing legacy or third-party documents without compromising their authenticity, integrity, and timestamp.
Note that the definitions in the .t2p file are processed before the main file. But it often makes sense to activate these definitions sometime later. E.g., activating the .t2p definitions after the preamble in a LaTeX document allows you to redefine the preamble macros in a manner that is appropriate for HTML. Here is a technique for accomplishing this:
\let\PRIMdocument\document
\def\document{
  ... HTML-specific definitions ...
  \PRIMdocument}
This code, which goes in the .t2p file, redefines the \document command to include a hook that loads some HTML-specific definitions. Since the \document command is called right after the preamble, the definitions introduced by the hook will shadow the preamble macros, as intended.
Sample .t2p files may be found in the TeX2page distribution.
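To make the hook concrete, here is a sketch of what a small .t2p file could contain. The file name report.t2p (for a main file report.tex) is hypothetical, as is the assumption that the preamble defines a print-specific \bull that needs an HTML-friendly replacement.

```tex
% report.t2p -- hypothetical companion to report.tex, read only by TeX2page.
% Save the original \document, then wrap it with a hook.
\let\PRIMdocument\document
% The hook runs when the preamble is over, so the \bull defined here
% shadows whatever the preamble defined under that name.
\def\document{
  \def\bull{{\bf *}}
  \PRIMdocument}
```

Because the file is never read by TeX itself, no \ifx\shipout\UnDeFiNeD guard is needed around these definitions.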
1 TeX2page processes both plain TeX and LaTeX commands, without the need for a format file parameter. It can even process documents written in a mix of plain TeX and LaTeX. This is not an uncommon scenario, with LaTeX users frequently using plain TeX commands, and plain TeX users frequently implementing their own version of sectioning and other commands using the LaTeX names. In the few cases where the same command name (e.g., \footnote) is used in both formats but with differing behavior, TeX2page will choose the correct behavior based on which format it thinks the overall document is in. The plain TeX and LaTeX document structures are sufficiently different (as human readers can readily testify by reading just a few opening lines) to allow this disambiguation.
2 Here, \UnDeFiNeD is chosen because it is a control sequence that is (presumably) undefined. If perchance you defined it, replace it with something else, e.g., \forHTMLonly, \ForTheWeb. If using a modern PDF-producing TeX, you can use \ifdefined, and thus avoid having to come up with arbitrary undefined control sequences. Note that with \ifdefined\shipout it’s the else-branch that’s HTML-only.
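The \ifdefined variant mentioned in footnote 2 can be sketched as follows, reusing the manmac and \bull examples from the main text. It assumes a TeX engine that provides \ifdefined; note how the branches swap places relative to the \ifx form.

```tex
\ifdefined\shipout % DVI/PDF only: a real TeX knows \shipout
  \input manmac
\else % HTML only: TeX2page does not know \shipout
  \def\bull{{\bf *}}
\fi
```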