Pandoc is a Haskell library for converting from one markup format to another, and a command-line tool that uses this library. It can read markdown and (subsets of) Textile, reStructuredText, HTML, LaTeX, MediaWiki markup, TWiki markup, Haddock markup, OPML, Emacs Org-mode, DocBook, txt2tags, EPUB and Word docx; and it can write plain text, markdown, reStructuredText, XHTML, HTML 5, LaTeX (including beamer slide shows), ConTeXt, RTF, OPML, DocBook, OpenDocument, ODT, Word docx, GNU Texinfo, MediaWiki markup, DokuWiki markup, Haddock markup, EPUB (v2 or v3), FictionBook2, Textile, groff man pages, Emacs Org-Mode, AsciiDoc, InDesign ICML, and Slidy, Slideous, DZSlides, reveal.js or S5 HTML slide shows. It can also produce PDF output on systems where LaTeX is installed.
Pandoc’s enhanced version of markdown includes syntax for footnotes,
tables, flexible ordered lists, definition lists, fenced code
blocks, superscript, subscript, strikeout, title blocks, automatic
tables of contents, embedded LaTeX math, citations, and markdown
inside HTML block elements. (These enhancements, described below
under Pandoc’s markdown, can be disabled using the markdown_strict
input or output format.)
In contrast to most existing tools for converting markdown to HTML, which use regex substitutions, Pandoc has a modular design: it consists of a set of readers, which parse text in a given format and produce a native representation of the document, and a set of writers, which convert this native representation into a target format. Thus, adding an input or output format requires only adding a reader or writer.
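The reader/writer split described above can be illustrated with a minimal sketch. This is not Pandoc's actual code (Pandoc is written in Haskell); the names read_plain, write_html, write_plain, READERS, WRITERS, and convert, as well as the toy list-of-blocks representation, are assumptions made purely for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical native representation: a list of (kind, text) blocks.
Block = Tuple[str, str]
Document = List[Block]


def read_plain(text: str) -> Document:
    """Toy reader: each chunk separated by a blank line becomes a paragraph."""
    chunks = (chunk.strip() for chunk in text.split("\n\n"))
    return [("para", chunk) for chunk in chunks if chunk]


def write_html(doc: Document) -> str:
    """Toy writer: one <p> element per paragraph (no HTML escaping)."""
    return "\n".join(f"<p>{text}</p>" for _kind, text in doc)


def write_plain(doc: Document) -> str:
    """Toy writer: paragraphs separated by a blank line."""
    return "\n\n".join(text for _kind, text in doc)


# Adding a format means adding one entry to one of these tables.
READERS: Dict[str, Callable[[str], Document]] = {"plain": read_plain}
WRITERS: Dict[str, Callable[[Document], str]] = {
    "html": write_html,
    "plain": write_plain,
}


def convert(text: str, source: str, target: str) -> str:
    """Parse with the chosen reader, then render with the chosen writer."""
    return WRITERS[target](READERS[source](text))
```

Because every writer consumes the same intermediate representation, a new output format in this sketch needs only one new function and one new WRITERS entry; no reader has to change.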
If no input-file is specified, input is read from stdin. Otherwise,
the input-files are concatenated (with a blank line between each) and
used as input. Output goes to stdout by default (though output to
stdout is disabled for the odt, docx, epub, and epub3 output
formats). For output to a file, use the -o option:
pandoc -o output.html input.txt
By default, pandoc produces a document fragment, not a standalone
document with a proper header and footer. To produce a standalone
document, use the -s or --standalone flag:
pandoc -s -o output.html input.txt
For more information on how standalone documents are produced, see Templates, below.
Instead of a file, an absolute URI may be given. In this case pandoc will fetch the content using HTTP:
pandoc -f html -t markdown http://www.fsf.org
If multiple input files are given, pandoc will concatenate them all
(with blank lines between them) before parsing. This feature is
disabled for binary input formats such as EPUB and docx.
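The concatenation rule can be sketched as follows. This is only an illustration under stated assumptions: the function name concatenate_inputs, the BINARY_FORMATS set (limited to the two formats named above), and the exact handling of trailing newlines are hypothetical, not Pandoc's implementation.

```python
from typing import List

# Assumption: only the binary formats named in the text are listed here.
BINARY_FORMATS = {"epub", "docx"}


def concatenate_inputs(texts: List[str], input_format: str = "markdown") -> str:
    """Join the input texts with a blank line between each, before parsing."""
    if input_format in BINARY_FORMATS and len(texts) > 1:
        raise ValueError(f"cannot concatenate {input_format} inputs")
    # Strip trailing newlines so that exactly one blank line separates inputs.
    return "\n\n".join(text.rstrip("\n") for text in texts) + "\n"
```

The blank line matters for a format like markdown, where the last paragraph of one file could otherwise run into the first paragraph of the next.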
The format of the input and output can be specified explicitly
using command-line options. The input format can be specified
using the -r/--read or -f/--from options, the output format using
the -w/--write or -t/--to options. Thus, to convert hello.txt from
markdown to LaTeX, you could type:
pandoc -f markdown -t latex hello.txt
To convert hello.html from html to markdown:
pandoc -f html -t markdown hello.html
Supported output formats are listed below under the -t/--to option.
Supported input formats are listed below under the -f/--from option.
Note that the rst, textile, latex, and html readers are not complete;
there are some constructs that they do not parse.
If the input or output format is not specified explicitly, pandoc
will attempt to guess it from the extensions of the input and output
filenames. Thus, for example,
pandoc -o hello.tex hello.txt
will convert hello.txt from markdown to LaTeX.
If no output file is specified (so that output goes to
stdout), or if the output file’s extension is
unknown, the output format will default to HTML. If no input file
is specified (so that input comes from
stdin), or if the input files’ extensions are
unknown, the input format will be assumed to be markdown unless
explicitly specified.
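The guessing and fallback rules above can be sketched roughly as follows. The extension tables are assumptions that cover only a few extensions for illustration; they are not Pandoc's real mapping, and the function names are hypothetical.

```python
import os
from typing import Optional

# Illustrative tables only; Pandoc's real extension mapping is larger.
INPUT_BY_EXT = {".html": "html", ".tex": "latex"}
OUTPUT_BY_EXT = {".html": "html", ".tex": "latex", ".pdf": "pdf"}


def _extension(filename: Optional[str]) -> str:
    """Return the lowercased extension, or '' when there is no filename."""
    return os.path.splitext(filename)[1].lower() if filename else ""


def guess_input_format(filename: Optional[str] = None) -> str:
    """Unknown extension or stdin (None): assume markdown."""
    return INPUT_BY_EXT.get(_extension(filename), "markdown")


def guess_output_format(filename: Optional[str] = None) -> str:
    """Unknown extension or stdout (None): default to HTML."""
    return OUTPUT_BY_EXT.get(_extension(filename), "html")
```

With these tables, guess_output_format("hello.tex") gives "latex" and guess_input_format("hello.txt") falls back to "markdown", matching the hello.tex example above.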
Pandoc uses the UTF-8 character encoding for both input and
output. If your local character encoding is not UTF-8, you should
pipe input and output through iconv:
iconv -t utf-8 input.txt | pandoc | iconv -f utf-8
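For readers scripting this step, the same transcode, convert, transcode-back pipeline might look like the sketch below. The function names are hypothetical, and run_pandoc assumes a pandoc executable is on the PATH; it calls pandoc with no options, as the command above does.

```python
import subprocess
from typing import Callable


def run_pandoc(utf8_input: bytes) -> bytes:
    """Assumption: a `pandoc` executable is on PATH; reads stdin, writes stdout."""
    result = subprocess.run(
        ["pandoc"], input=utf8_input, stdout=subprocess.PIPE, check=True
    )
    return result.stdout


def convert_with_local_encoding(
    raw: bytes,
    local_encoding: str,
    convert: Callable[[bytes], bytes] = run_pandoc,
) -> bytes:
    """Mirror `iconv -t utf-8 | pandoc | iconv -f utf-8` for one input."""
    utf8_in = raw.decode(local_encoding).encode("utf-8")  # local -> UTF-8
    utf8_out = convert(utf8_in)                           # UTF-8 in, UTF-8 out
    return utf8_out.decode("utf-8").encode(local_encoding)  # UTF-8 -> local
```

Passing the converter as a parameter keeps the transcoding logic testable without pandoc installed.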
Note that in some output formats (such as HTML, LaTeX, ConTeXt,
RTF, OPML, DocBook, and Texinfo), information about the character
encoding is included in the document header, which will only be
included if you use the -s/--standalone option.
Earlier versions of pandoc came with a program, markdown2pdf, that
used pandoc and pdflatex to produce a PDF. This is no longer needed,
since pandoc can now produce pdf output itself. To produce a PDF,
simply specify an output file with a .pdf extension. Pandoc will
create a LaTeX file and use pdflatex (or another engine, see
--latex-engine) to convert it to PDF:
pandoc test.txt -o test.pdf
Production of a PDF requires that a LaTeX engine be installed (see
--latex-engine, below), and assumes that the following LaTeX packages
are available: amssymb, amsmath, ifxetex, ifluatex, listings (if the
--listings option is used), fancyvrb, longtable, booktabs, url,
graphicx, hyperref, ulem, babel (if the lang variable is set),
fontspec (if xelatex or lualatex is used as the LaTeX engine),
xltxtra and xunicode (if xelatex is used).
A user who wants a drop-in replacement for Markdown.pl may create a
symbolic link to the pandoc executable called hsmarkdown. When
invoked under the name hsmarkdown, pandoc will behave as if invoked
with -f markdown_strict --email-obfuscation=references, and all
command-line options will be treated as regular arguments. However,
this approach does not work under Cygwin, due to problems with its
simulation of symbolic links.
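The name-based behaviour can be pictured as a dispatch on the program name. The sketch below is an assumption about the general technique, not Pandoc's source; plan_invocation and its return shape are hypothetical.

```python
import os
from typing import List, Tuple

# Options implied when the program is invoked as hsmarkdown (from the text).
HSMARKDOWN_OPTIONS = ["-f", "markdown_strict", "--email-obfuscation=references"]


def plan_invocation(argv: List[str]) -> Tuple[List[str], List[str], bool]:
    """Return (implied_options, remaining_args, parse_options).

    parse_options is False when every remaining argument must be treated
    as a regular argument rather than as a command-line option.
    """
    program = os.path.basename(argv[0])
    rest = list(argv[1:])
    if program == "hsmarkdown":
        return list(HSMARKDOWN_OPTIONS), rest, False
    return [], rest, True
```

For example, plan_invocation(["/usr/local/bin/hsmarkdown", "-s", "in.txt"]) leaves "-s" among the regular arguments instead of interpreting it as the standalone flag.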