From: doc-sig-admin@python.org on behalf of David Goodger [dgoodger@bigfoot.com]
Sent: 25 November 2000 04:15
To: doc-sig@python.org
Subject: [Doc-SIG] reStructuredText: Revised Structured Text Specification

=========================================================
 reStructuredText: Revised Structured Text Specification
=========================================================

David Goodger (mailto:dgoodger@bigfoot.com)

2000-11-24

This revised specification is an attempt to refine, standardize, and
extend the original Structured Text from Digital Creations' Zope
(http://www.zope.org).

Structured Text is plain text (i.e., text without tags, control
characters, or other embedded formatting information) that uses
simple, intuitive, and language-independent constructs to indicate the
structure of a document. These constructs are equally easy to read in
raw and processed forms. This document is itself an example of
Structured Text (raw, if you are reading the text file, or processed,
if you are reading an HTML page, for example).

Simple symbology is used to indicate special constructs, such as
headings, bullet lists, and emphasis. The symbology used is as minimal
and unobtrusive as possible. Less often-used constructs and extensions
to the basic structured text format may have more elaborate markup.

A Structured Text document is made up of body elements, and optionally
structured into sections. Sections contain body elements and/or
subsections. Body elements consist of:

- paragraphs, which contain text and optional inline markup;
- lists (enumerated, bullet, descriptive, and option), which contain
  list items, whose items in turn contain body elements;
- code blocks, which contain preformatted text only (spaces and
  linebreaks are preserved);
- block quotes, which contain body elements; and
- tables, whose cells contain body elements.

Blank lines are used to separate paragraphs and other elements.
Blank lines may be omitted when the markup makes element separation
unambiguous.

Tabs will be replaced by spaces; tab stops are at every 8th column.

Indentation is used to indicate, and is only significant in
indicating:

- nesting within list items, such as nested lists, or multiple
  paragraphs within a list item,
- block quotes, and
- the extent of code blocks.

Paragraphs may contain inline markup. Inline markup may not be nested.

- inline code
- strong
- emphasis
- hyperlinks:

  - standalone (absolute URLs)
  - indirect (absolute and relative URLs)
  - internal (cross-links within a document)
  - footnotes

Below is a block diagram overview of the hierarchy of element types in
Structured Text. Elements 'may contain' other elements below them.
Element types in parentheses indicate recursive relationships:
sections may contain (sub)sections, tables contain further body
elements, etc. Footnotes, comments, directives, and hyperlink targets
(all starting with '.. ' in column 1) are independent of the hierarchy
and may appear at any point.

::

                                                             +-------------+
    +----------------------------------------+               | comments,   |
    |                           +-------+    |               | directives, |
    | sections (begins with one | title |)   |   +-----------| hyperlink   |
    |                           +-------+    |   | footnotes | targets     |
    |------------+---------------------------+---+-----------+-------------|
    | (sections) | body elements:                            | text        |
    +------------| code   |       | block  |        | para-  | block       |
                 | blocks | lists | quotes | tables | graphs |-------------+
                 +--------+-------+--------+--------+--------|
                          | (body elements)         | inline |
                          +-------------------------| markup |
                                                    +--------+


Syntax Details
==============

Escaping Mechanism
------------------

The character set available in plain text documents is limited. Every
non-alphanumeric character has been overloaded with functionality:
ordinary written text, mathematics, computer programming, regular
expressions, Internet conventions.
No matter what characters are used for markup, they will already have
multiple meanings in written text. Therefore they *will* appear in
text **without being intended as markup**.

A serious markup system requires an escaping mechanism to override the
default meaning of the characters used for the markup. In Structured
Text, we will use the (almost) universal escaping character, the
backslash.

A backslash followed by any character escapes the character. The
escaped character represents the character itself, and is prevented
from playing a role in any markup interpretation. The backslash is
removed from the output. A literal backslash is represented by two
backslashes in a row.

Comments and Directives
-----------------------

A comment/directive block is a text block:

- whose first line begins with '.. ' in column 1,
- whose second and subsequent lines are indented relative to the
  first, and
- which ends with a blank or unindented line.

This syntax is used for comments, footnotes, indirect hyperlinks,
internal hyperlinks, directives, and as an extension mechanism.
Footnotes and hyperlinks are described in the section 'Hyperlinks'
below.

Comments
::::::::

Arbitrary text may follow the comment start and will be removed from
the processed output. The only restriction on comments is that they
not use the same syntax as directives or hyperlinks.

It is recommended to put a blank line after a comment, to ensure that
subsequent indented text blocks are not accidentally commented out.

Directives
::::::::::

Directives are indicated by a text block beginning with '.. ',
followed by a single word (the directive name,
[a-zA-Z][a-zA-Z0-9_-]*), two colons, and whitespace. (Two colons are
used to avoid clashes with common comment text like '.. Warning:
modify at your own risk!'.) Directive names are case-insensitive.
Actions taken in response to directives and the interpretation of data
in the directive block or subsequent text block(s) are directive- and
implementation-dependent.
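
As a rough illustration of the directive syntax just described, the
Python sketch below tests whether the first line of a '.. ' block names
a directive. The function name ``directive_name`` is hypothetical, and
accepting end-of-line in place of the whitespace after the two colons is
an assumption, not something this specification states.

```python
import re

# '.. ' in column 1, a directive name, two colons, then whitespace.
# ASSUMPTION: end-of-line after '::' is accepted as well as whitespace.
DIRECTIVE = re.compile(r"^\.\. ([a-zA-Z][a-zA-Z0-9_-]*)::(?:\s+|$)")


def directive_name(first_line):
    """Return the lower-cased directive name, or None for anything else.

    Names are lower-cased because directive names are case-insensitive.
    A None result means the block is not a directive (for example, an
    ordinary comment).
    """
    match = DIRECTIVE.match(first_line)
    return match.group(1).lower() if match else None
```

Under these assumptions, ``directive_name('.. keywords::')`` yields
``'keywords'``, while the comment text '.. Warning: modify at your own
risk!' yields ``None`` because it has only one colon.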
No directives are defined by the core Structured Text specification.

Directives can be used as an extension mechanism for Structured Text.
For example, a proposal was made in the Python Doc-SIG for
keyword-tagged values. This could be accomplished as follows::

    .. keywords::
        Author: Anne Elk (Miss)
        Revision: 1

If an implementation of Structured Text doesn't recognize a directive,
the entire directive block will simply be treated as a comment. Any
subsequent text blocks will be processed as usual. The implementation
may also emit a warning.

Section Structure
-----------------

Sections are identified through their titles. Titles are marked up
with 'underlines' below the title text (and, in some cases,
'overlines' above the title). An underline/overline is a line of
non-alphanumeric characters that begins in column 1 and extends at
least as far as the title text. In the case of both overlines and
underlines, their lengths and characters must match.

There may be any number of levels of section titles. Rather than
imposing a fixed number and order of section title styles, the order
enforced will be the order as encountered. The first style encountered
will be an outermost title (like HTML H1), the second style will be a
subtitle, the third will be a subsubtitle, and so on.

Below are examples of section titles. The first five styles are
recommended::

    ===============
     Section Title
    ===============

    Section Title
    =============

    Section Title
    -------------

    Section Title
    :::::::::::::

    Section Title
    .............

    Section Title
    *************

    Section Title
    +++++++++++++

    Section Title
    ~~~~~~~~~~~~~

    Section Title
    ^^^^^^^^^^^^^

Note that the first example title above (overline & underline of '=')
is slightly inset, but it doesn't have to be; this is merely aesthetic
and not significant.

A blank line after a title is optional. All text blocks up to the next
title are included in a section (or subsection, etc.).
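
The 'order as encountered' rule for title styles can be sketched in
Python as follows. This is only an illustration under stated
assumptions: the helper names ``is_rule`` and ``section_titles`` are
hypothetical, a title style is modelled as the pair (rule character,
has overline), and no check is made that styles are later reused in a
consistent order.

```python
def is_rule(line):
    """True if `line` is one non-alphanumeric, non-space character repeated."""
    return (bool(line) and not line[0].isalnum() and not line[0].isspace()
            and line == line[0] * len(line))


def section_titles(lines):
    """Yield (level, title) pairs; levels follow first-encounter order."""
    styles = []  # title styles, in the order they were first seen
    i = 0
    while i < len(lines):
        style = None
        # Overlined title: rule, title text, then an identical rule.
        if (i + 2 < len(lines) and is_rule(lines[i])
                and lines[i + 2] == lines[i] and lines[i + 1].strip()
                and len(lines[i]) >= len(lines[i + 1].rstrip())):
            style, title, step = (lines[i][0], True), lines[i + 1].strip(), 3
        # Underlined title: title text, then a rule at least as long.
        elif (i + 1 < len(lines) and lines[i].strip()
                and not is_rule(lines[i]) and is_rule(lines[i + 1])
                and len(lines[i + 1]) >= len(lines[i].rstrip())):
            style, title, step = (lines[i + 1][0], False), lines[i].strip(), 2
        if style is None:
            i += 1
            continue
        if style not in styles:
            styles.append(style)  # a new style opens the next-deeper level
        yield styles.index(style) + 1, title
        i += step
```

For a document whose titles use '=' with an overline, then '=' alone,
then '-', the sketch assigns levels 1, 2, and 3 respectively, and a
later title underlined with '=' alone returns to level 2.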
All section/title types need not be used, nor must any specific
section/title type be used. However, a document must be consistent in
its use of sections/titles: once established, section title types must
be used in the outer-to-inner order.

Body Elements
-------------

Code Blocks
:::::::::::

A paragraph which ends with two colons ('::') signifies that all
following **indented** text blocks are code blocks. No further markup
processing is done within a code block. It is left as-is, and
typically rendered in a monospaced font::

    This is a typical paragraph. A code block follows::

        for a in [5,4,3,2,1]:   # this is some program code, formatted as-is
            print a
        print "it's..."
        # a code block continues until the indentation ends

    This text has returned to the indentation of the first paragraph,
    is outside of the code block, and therefore treated as an ordinary
    paragraph.

When '::' is immediately preceded by whitespace, both colons will be
removed from the output. When text immediately precedes the '::',
*one* colon will be removed from the output, leaving only one (i.e.,
'::' will be replaced by ':'). When '::' is alone on a line, it will
be completely removed from the output; no empty paragraph will remain.

The minimum leading whitespace will be removed from the code block. In
the example code block above, only the second line ('`    print a`')
will keep its leading whitespace.

Block Quotes
::::::::::::

A text block that is indented relative to the preceding text, without
markup indicating it to be a code block, is a block quote. All markup
processing (for body elements and inline markup) continues within the
block quote::

    This is an ordinary paragraph, introducing a quote:

        "It is my business to know things. That is my trade."

        --Sir Arthur Conan Doyle

Bullet Lists
::::::::::::

A text block which begins with a '-', '*', or '+', followed by
whitespace, is treated as a bullet list (unordered list) item. For
example::

    - This is the first bullet list item.

    - This is the first paragraph in the second item in the list.

      This is the second paragraph in the second item in the list.
      The blank line above this paragraph is required.

      - This is a sublist. A code block needs to be indented even
        more::

            print "lemon curry?"

    - This is the third item of the main list.
    - This is the fourth item of the main list (no blank line above).
    The second line of this item is not indented relative to the
    bullet, which precludes it from having a second paragraph.
    - A fifth item, whose second line is indented only one space
     relative to the bullet.

      A second paragraph for the fifth item.

    This paragraph is not part of the list.

Blank lines before bullet list items are optional; blank lines are
only required to separate list items from other types of text blocks,
as noted in the example.

The indentation of bullet list items takes the bullet itself into
account. In the second list item above:

- The second paragraph is indented relative to the bullet. The second
  paragraph must line up with the left edge of the first.
- The bullet of the sublist is indented relative to the bullet of the
  outer list's item.

Enumerated Lists
::::::::::::::::

A text block which begins with a sequence label is treated as an
enumerated list (ordered list) element. Sequence labels can be::

    1. A sequence of digits followed by a period ('1.'), a colon
       ('1:'), a dash ('1-'), a space and a dash ('1 -'), a
       right-parenthesis ('1)'), or surrounded with parentheses
       ('(1)').

    B. A single letter (uppercase or lowercase) followed by a period
       etc.

    III. A roman numeral (uppercase or lowercase) followed by a period
         etc.

         III.a. A sequence of enumerations, separated by periods and
                ending with a period etc.

         (III)(b) A sequence of enumerations, each enclosed in
                  parentheses.

         III(c) A mixture of styles.

Nested enumerated lists must be created with indentation (as in the
example above). Enumerators are not interpreted.
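
The sequence-label forms listed above could be recognized with a
regular expression along these lines. This Python sketch is an
assumption-laden illustration, not part of the specification: the names
``ENUM``, ``LABEL``, and ``split_enumerator`` are hypothetical, and the
roman-numeral test is deliberately loose, in keeping with the statement
that enumerators are not interpreted.

```python
import re

# One enumeration: digits, a run of roman-numeral letters, or one letter.
# Loose on purpose: enumerators are not interpreted, so 'mix.' would pass.
ENUM = r"(?:\d+|[ivxlcdm]+|[a-z])"

LABEL = re.compile(
    r"""
    (?P<label>
        (?:{e}\.)+            # '1.', 'B.', 'III.a.'
      | {e}(?::|\ ?-|\))      # '1:', '1-', '1 -', '1)'
      | {e}?(?:\({e}\))+      # '(1)', '(III)(b)', 'III(c)'
    )
    \s+                       # whitespace separates the label from the text
    """.format(e=ENUM),
    re.VERBOSE | re.IGNORECASE,
)


def split_enumerator(line):
    """Return (label, text) if `line` starts with a sequence label, else None."""
    match = LABEL.match(line)
    if match is None:
        return None
    return match.group("label"), line[match.end():]
```

A caller would apply ``split_enumerator`` to the first line of a text
block; a ``None`` result means the block is not an enumerated list item.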

Descriptive Lists
:::::::::::::::::

A text block with a first line that contains some text, followed by
whitespace, '--', and some more whitespace, is treated as a
descriptive list element. The '--' must be on the first line. The
leading text is the term, and the text after the '--' is the
description::

    Type A -- The description may begin immediately after the '--',
    as long as the description is only one paragraph.

    Type B -- The description may begin immediately after the '--',
        and may contain multiple paragraphs if second and subsequent
        lines are indented relative to the left edge of the first
        line.

        Description paragraph 2, indented to the same level.

    Type C -- Type C is a variation of Type B.

              Description paragraph 2, indented to the same level.

    Type D --
        The description may also begin below, indented. This is
        useful for multiple paragraphs, or arbitrary text blocks
        (lists, etc.).

        Description paragraph 2, indented to the same level.

For type A descriptive list items, the second line of the description
paragraph is checked for ' -- '. If present, it is assumed that it is
the start of another list item. Example::

    Item One -- Description.
    Item Two -- Description.

Option Lists
::::::::::::

.. XXX perhaps this should be left as an extension?

Option lists are two-column lists of command-line options and
descriptions. There are two types of options: short and long. Short
options consist of one dash, an option letter, and an optional
argument placeholder. Long options consist of two dashes, an option
word, and possibly an argument placeholder. There must be at least two
spaces between the option and the description. The option acts as a
bullet, and description begins a new text block which may contain
multiple paragraphs and body elements. For example::

    -a         Output all
    -b         Output both (this description is
               quite long)
    -c arg     Output just arg.
    --long     Output all day long.

Tables
::::::

Tables are described with a visual outline made up of the characters
'-', '|', and '+'.
The hyphen ('-') is used for horizontal lines (row separators). The
vertical bar ('|') is used for vertical lines (column separators). The
plus sign ('+') is used for intersections of horizontal and vertical
lines.

Each cell contains body elements, and may have multiple paragraphs,
lists, etc. Example:

+------------+------------+---------------------------+
| Column 1   | Column 2   | Column 3 & 4 span (Row 1) |
+------------+------------+------------+--------------+
| Column 1 & 2 span       | Column 3   | - Column 4   |
+------------+------------+------------+ - Row 2 & 3  |
| 1          | 2          | 3          | - span       |
+------------+------------+------------+--------------+

Paragraphs
::::::::::

Paragraphs are what's left when all other body element markup is
exhausted. They consist of blocks of text with no external markup
indicating any other body element. Blank lines separate paragraphs
from each other and from other body elements. However, when
unambiguous due to markup, blank lines may be omitted.

An alternate style of indented-first-line paragraphs is as follows:
    This is a paragraph with an indented first line.
    Here is a second such paragraph.

Inline Markup
-------------

Inline markup is the markup of text within a text block. Inline markup
cannot be nested.

Inline Code
:::::::::::

Text enclosed by backquotes (with whitespace or punctuation to the
left of the first backquote and to the right of the second backquote)
is treated as `example code`. Inline code is typically set in a
monospaced typeface.

Strong
::::::

Text surrounded by '**' characters (with whitespace or punctuation to
the left and to the right) is **emphasized strongly**, typically
displayed as boldface.

Emphasis
::::::::

Text surrounded by '*' characters (with whitespace or punctuation to
the left and to the right) is *emphasized*, typically displayed as
italics.

Hyperlinks
::::::::::

Standalone Hyperlinks
.....................

An absolute URL within a text block is treated as a general external
hyperlink with the URL itself as the link's text. For example, ::

    See http://www.python.org for info.

would be marked up in HTML as::

    See <A HREF="http://www.python.org">http://www.python.org</A> for info.

Indirect Hyperlinks
...................

Indirect hyperlinks consist of two parts. In the text body, there is a
source link, a name with a trailing underscore::

    See the Python_ home page for info.

Somewhere else in the document is a target link: two dots, a space, an
underscore, the same name used for the source link (no trailing
underscore), a colon, whitespace, and a URL (relative or absolute)::

    .. _Python: http://www.python.org

Combined, this is expressed in HTML as::

    See the <A HREF="http://www.python.org">Python</A> home page for info.

Phrase-links (a hyperlink whose name is a phrase) can be expressed by
enclosing the phrase in brackets and treating the bracketed text as a
link name::

    Want to learn about [my favorite programming language]_?

    .. _my favorite programming language: http://www.python.org

If a phrase-link name contains any colons, they must be
backslash-escaped in the link target.

Internal Hyperlinks
...................

Internal hyperlinks connect one point to another within a document.
They are identical to indirect hyperlinks except that there is no URL
in the target link. For example::

    .. _target:

    This is the target point. Clicking on this internal hyperlink
    will take us back to the target_.

Footnotes
.........

Footnotes are like internal hyperlinks with text in the targets.
Footnotes consist of two parts. In the text body there is a source
link: a bracketed name (an alphanumeric string with no spaces), with a
trailing underscore::

    Please refer to the fine manual [GVR2000]_.

Somewhere else in the document (not necessarily at the end) is a
target link: two dots, a space, an underscore, the same bracketed name
used for the source link (no trailing underscore or colon),
whitespace, and the footnote text::

    .. _[GVR2000] Python Documentation, van Rossum, Drake, et al.,
       http://www.python.org/doc/


Syntax Diagrams
===============

Paragraphs may be separated by a blank line::

    +------------------------------+
    | paragraph                    |
    |                              |
    +------------------------------+

    +------------------------------+
    | paragraph                    |
    |                              |
    +------------------------------+

First-line-indented paragraphs require no blank line to separate
them::

    +------------------------------+
    | paragraph                    |
    |                              |
    +--+---------------------------+
       | paragraph                 |
    +--+                           |
    |                              |
    +------------------------------+

Code blocks indicated by '::' at the end of the preceding paragraph::

    +------------------------------+
    | paragraph                    |
    |                        '::'$ |
    +--+---------------------------|
       | code block                |
       +---------------------------+

List item blocks which are indented relative to the bullet or
enumerator may contain multiple body elements (paragraphs, etc.)::

    +------+-----------------------+
    | '- ' | list item             |
    +------|                       |
           | (body elements)+      |
           +-----------------------+

    +------+-----------------------+
    | '- ' | list item             |
    +--+---+                       |
       | (body elements)+          |
       +---------------------------+

List item blocks which are not indented relative to the bullet or
enumerator contain a single paragraph only::

    +------+-----------------------+
    | '- ' | paragraph             |
    |------+                       |
    |                              |
    +------------------------------+

Block quotes are indented relative to the preceding text::

    +------------------------------+
    | current level of             |
    | indentation                  |
    +--+---------------------------+
       | block quote               |
       | (body elements)+          |
       +---------------------------+

Comments begin in column 1 with two dots and a space::

    +--------+---------------------+
    | ^'.. ' | comment block       |
    +--+-----+                     |
       |                           |
       +---------------------------+

Directives are comments which begin with a directive name and two
colons::

    +------------------+-----------+
    | ^'.. ' name '::' | directive |
    +--+---------------+ block     |
       |                           |
       +---------------------------+

Footnotes use comment syntax with an underscore footnote name in
brackets::

    +-------------------+----------+
    | ^'.. _[' name ']' | footnote |
    +--+----------------+          |
       | (body elements)+          |
       +---------------------------+

Hyperlink targets use comment syntax with an underscore, link name,
and a colon::

    +------------------+-----------+
    | ^'.. _' name ':' | link      |
    +--+---------------+ target    |
       |                           |
       +---------------------------+

(Internal hyperlinks have empty link blocks. Indirect hyperlinks have
an absolute or relative URL in their link blocks.)

_______________________________________________
Doc-SIG maillist - Doc-SIG@python.org
http://www.python.org/mailman/listinfo/doc-sig