From: doc-sig-admin@python.org on behalf of David Goodger [dgoodger@bigfoot.com]
Sent: 25 November 2000 04:08
To: doc-sig@python.org
Subject: [Doc-SIG] Problems With StructuredText

==============================
 Problems With StructuredText
==============================

David Goodger (mailto:dgoodger@bigfoot.com)
2000-11-24

StructuredText_ is a great idea; however, it does have flaws. Many of
these flaws go back to the original Setext_ (structure enhanced text)
specification (interesting reading!).

.. _StructuredText: http://dev.zope.org/Members/jim/StructuredTextWiki/FrontPage
.. _Setext: http://www.bsdi.com/setext

There are several problems, unresolved issues, and areas of controversy
within StructuredText. In order to resolve all these issues, I'd like to
bring them all out into the open, enumerate all the alternatives, and
propose solutions. Problems below are labelled C for Classic
StructuredText, NG for Next Generation.

1. No formal specification_. The code *is* the standard. (C, NG)
2. Difficult to [understand and extend]_. (C, NG)
3. Block/section structure via indentation_. (C, NG)
4. No [escaping mechanism]_. (C, NG)
5. Awkward [bullet list markup]_: 'o'. (C, NG)
6. Problematic [enumerated list markup]_. (C)
7. Ambiguous markup for [code blocks]_. (C, NG)
8. Tables_. (C, NG)
9. Awkward [inline code]_ markup. (C, NG)
10. Awkward [hyperlink markup]_. (C, NG)
11. Markup must start with whitespace_. (C, NG)


1. Formal Specification
=======================

.. _specification:

The description in the original StructuredText.py has been criticized
for being vague. "The code *is* the standard." Tony "Tibs" Ibbs has been
working on deducing a detailed description from the documentation and
code of StructuredTextNG_. His notes are available at:

    http://www.tibsnjoan.demon.co.uk/STNG-format.html

.. _StructuredTextNG: http://dev.zope.org/Members/jim/StructuredTextWiki/StructuredTextNG

The specification should always precede the code.
Otherwise, StructuredText is a moving target which can never be adopted
as a standard.


2. Understanding and Extending the Code
=======================================

.. _understand and extend:

The original StructuredText is a dense mass of sparsely commented code
and inscrutable regular expressions. It was not designed to be extended
and is very difficult to understand. StructuredTextNG has been designed
to allow input (syntax) and output extensions, but its documentation
(both internal [comments & docstrings], and external) is inadequate.

I would like to see Structured Text become truly useful, perhaps even
joining Python's standard library. Therefore it must have clear,
understandable documentation and implementation code.


3. Structure via Indentation
============================

.. _indentation:

Setext_ required that body text be indented by 2 spaces. The original
StructuredText_ and StructuredTextNG_ require that section structure be
indicated through indentation, as 'inspired by Python'. For certain
structures (lists, code blocks, block quotes) indentation naturally
indicates structure/hierarchy. For section structure, indentation is
unnatural, wasteful of horizontal space, and awkward. Rather, the style
of the section title usually indicates its structure.

In the original StructuredText, sections consist of title paragraphs
followed by indented paragraphs and other body elements. Using
indentation is:

- Unnatural -- Most published works use title style (type size, face,
  weight, and position) rather than indentation to indicate hierarchy.
  When indentation is used, it is usually the formatted end-result and
  is there for aesthetic rather than structural purposes.

- Wasteful -- As the left indent is increased, the amount of horizontal
  space available for text decreases, unnecessarily extending the
  vertical length of a document.

- Awkward -- One must think about the formatting as the text is keyed in.
  And when structural changes are made (it is very common during the
  composition of a document to rearrange sections and their hierarchy)
  we must use block-indent and -unindent functions. In order to input
  documents using indentation, relatively advanced text editors must be
  used.

Python's significant whitespace is a wonderful innovation (even if it
wasn't original to Python); however, applying indentation to ordinary
written text is overgeneralization. Instead, section structure through
title style (as exemplified by this document) is far more natural. In
fact, it is already in widespread use in plain text documents.


4. Escaping Mechanism
=====================

.. _escaping mechanism:

StructuredText needs a mechanism to treat markup-significant characters
as the characters themselves. Currently there is no such mechanism
(although ZWiki uses '!'). What are the candidates?

1. ! (http://dev.zope.org/Members/jim/StructuredTextWiki/NGEscaping)
2. \
3. ~
4. any others?

I believe the best choice for this is the backslash (\). It's the single
most popular escaping character in the world, therefore familiar. Since
characters only need to be escaped under special circumstances, which
are typically those explaining technical programming issues, the use of
the backslash is natural and understandable. Python docstrings can be
raw (prefixed with an 'r', as in 'r""'), which would obviate the need
for gratuitous doubling-up of backslashes.

The rule would be: A backslash followed by any character escapes the
character. The escaped character represents the character itself, and is
prevented from playing a role in any markup interpretation. The
backslash is removed from the output. A literal backslash is represented
by two backslashes in a row.

XXX Allow backslashes preceding non-markup characters to remain in the
output? That might make describing regexes much easier.


5. Bullet List Markup
=====================

.. _bullet list markup:

StructuredText includes 'o' as a bullet character. This is dangerous and
counter to the language-independent nature of the markup. There are many
languages in which 'o' is a word. For example, in Spanish:

    Llamame a la casa o al trabajo.

    (Call me at home or at work.)

And in Japanese (when romanized):

    Senshuu no doyoubi ni tegami o kakimashita.

    ([I] wrote a letter on Saturday last week.)

If a paragraph containing an 'o' word wraps such that the 'o' is the
first text on a line, it could be misinterpreted as a bullet list.

I recommend omitting 'o' as a bullet character. '+' could be used
instead.


6. Enumerated List Markup
=========================

.. _enumerated list markup:

StructuredText enumerated lists are allowed to begin with a number
(sequence of digits) followed by whitespace. This could have
consequences for line wrapping and writing styles::

    "That bird wouldn't *voom* if you put 10000 volts through it!"
    1 is all I need.

I recommend requiring something after the number: a period ('.'), a
colon (':'), a dash ('-'), a space and a dash (' -'), a
right-parenthesis (')'), or surrounding parentheses ('()'). Perhaps this
list is excessive. But forgiving is better than restrictive.

Should the digits/letters/numerals themselves be interpreted, allowing
nested enumerated lists to be created without indentation? How about
nested enumerated lists without indentation via compound enumerators?
Simply count the 'length' (number of sub-enumerators) of the compound
enumerator::

    1. one
    1.a. two
    1.a.I. three
    2.a. two
    2.b.I. three


7. Code Blocks
==============

.. _code blocks:

The StructuredText specification has example code blocks indicated by
'example', 'examples', or '::' ending the preceding paragraph. STNG
only recognizes '::'; 'example'/'examples' are not implemented. This is
good; it fixes a language-dependent feature. The problem is what to do
with the '::'.

I propose that '::' at the end of a paragraph indicate that subsequent
*indented* blocks are treated as example code. No further markup
interpretation is done within code blocks (not even backslash-escapes).
If the '::' is preceded by whitespace, '::' is omitted from the output;
if '::' was the sole content of a paragraph, the entire paragraph is
removed (no 'empty' paragraph remains). If '::' is preceded by a
non-whitespace character, '::' is replaced by ':' (i.e., the extra colon
is removed).

Thus, a section could begin with a code block as follows::

    Section Title
    -------------

    ::

        print "this is example code"

One possible variation is for meta-documentation (perhaps an
extension?): use triple-colons (':::') to indicate 'take the following
code block, mark it up as a code block, then copy it and mark it up as
if it weren't a code block'. The implementation may insert text
in-between, such as 'Marked up as:', or may alter the formatting
(different font, set in a colored box, whatever).


8. Tables
=========

.. _tables:

The table markup scheme in classic StructuredText was horrible. Its
omission from StructuredTextNG is welcome, and I will not dignify the
markup by repeating it here. However, tables themselves are useful in
documentation. Alternatives:

1. This format is the most natural and obvious. I came up with it (no
   great feat of creation!), and later discovered that it is the format
   supported by the [Emacs table mode]_::

       +------------+------------+---------------------------+
       | Column 1   | Column 2   | Column 3 & 4 span (Row 1) |
       +------------+------------+------------+--------------+
       | Column 1 & 2 span       | Column 3   | - Column 4   |
       +------------+------------+------------+ - Row 2 & 3  |
       | 1          | 2          | 3          | - span       |
       +------------+------------+------------+--------------+

   Tables are described with a visual outline made up of the characters
   '-', '|', and '+'.
   The hyphen ('-') is used for horizontal lines (row separators), the
   vertical bar ('|') for vertical lines (column separators), and the
   plus sign ('+') for intersections of horizontal and vertical lines.
   Row and column spans are possible simply by omitting the column or
   row separators, respectively. Each cell contains body elements, and
   may have multiple paragraphs, lists, etc.

   .. _Emacs table mode: ftp://archive.cis.ohio-state.edu/pub/emacs-lisp/archive/table.el

2. Below is a minimalist possibility. It may be better suited to manual
   input than alternative #1, but there is no Emacs editing mode
   available. One disadvantage is that it resembles section titles; a
   one-column table would look exactly like section titles. It could be
   a directive-driven (extra syntax) extension. ::

       Column 1     Column 2     Column 3 & 4 span (Row 1)
       ============ ============ ===========================
       Column 1 & 2 span         Column 3     - Column 4
       ------------------------- ------------ - Row 2 & 3
       1            2            3            - span
       ============ ============ ============ ==============

   Each row is underlined. The head row is underlined with '=', with
   spaces at column boundaries. If there is no head row, the table
   begins with a top border of equals signs with spaces at column
   boundaries. Internal row separators are underlines of '-', with
   spaces at column boundaries. Column spans have no spaces. Row spans
   simply lack an underline at the row boundary. The bottom boundary of
   the table consists of '=' underlines. A blank line is required
   following a table.


9. Inline Code
==============

.. _inline code:

The current markup for inline code (text left as-is, verbatim, usually
in a monospaced font; HTML <code>) is single quotes ('code'). The
problem with single quotes is that they are too often used for other
purposes, like apostrophes, quoting text, and string literals.
Alternatives::

    'code'  \'code\'  ''code''
    "code"  \"code\"  ""code""
    #code#  @code@  `code`  ^code^  ``code''

The examples below contain inline code, quoted text, and apostrophes.
Each example should evaluate to the following HTML::

    Some <code>code</code>, with a 'quote', "double", ain't it grand?

0. Some code, with a quote, double, ain't it grand?
1. Some \'code\', with a 'quote', "double", ain't it grand?
2. Some 'code', with a \'quote\', "double", ain\'t it grand?
3. Some ''code'', with a 'quote', "double", ain't it grand?
4. Some "code", with a 'quote', \"double\", ain't it grand?
5. Some \"code\", with a 'quote', "double", ain't it grand?
6. Some ""code"", with a 'quote', "double", ain't it grand?
7. Some #code#, with a 'quote', "double", ain't it grand?
8. Some @code@, with a 'quote', "double", ain't it grand?
9. Some `code`, with a 'quote', "double", ain't it grand?
10. Some ^code^, with a 'quote', "double", ain't it grand?
11. Some ``code'', with a 'quote', "double", ain't it grand?

A more complicated piece of inline code::

    Does <code>a[b] = 'c' + "d" + `2^3`</code> work?

0. Does a[b] = 'c' + "d" + `2^3` work?
1. Does \'a[b] = 'c' + "d" + `2^3`\' work?
2. Does 'a[b] = \'c\' + "d" + `2^3`' work?
3. Does ''a[b] = 'c' + "d" + `2^3`'' work?
4. Does "a[b] = 'c' + "d" + `2^3`" work?
5. Does \"a[b] = 'c' + "d" + `2^3`\" work?
6. Does ""a[b] = 'c' + "d" + `2^3`"" work?
7. Does #a[b] = 'c' + "d" + `2^3`# work?
8. Does @a[b] = 'c' + "d" + `2^3`@ work?
9. Does `a[b] = 'c' + "d" + \`2^3\`` work?
10. Does ^a[b] = 'c' + "d" + `2\^3`^ work?
11. Does ``a[b] = 'c' + "d" + `2^3`'' work?

Backquotes (#9) seem to be the best choice. They are unobtrusive and
relatively rarely used (more rarely than ' or ", anyhow). Backquotes
have the connotation of 'quotes', which other options (like carets, #10)
don't. When used within code, they can be escaped (\`).

Alternative choices are carets (#10) and TeX-style quotes (#11). For
examples of TeX-style quoting, see:

    http://www.zope.org/Members/jim/StructuredTextWiki/CustomizingTheDocumentProcessor

The only uses of backquotes I know are:

(A) As a synonym for repr() in Python.
(B) For command-interpolation in shell scripts.
(C) Used as open-quotes in TeX code (and carried over into plaintext by
    TeXies).

The backslash-escape mechanism would allow A & B inside inline code.
TeX quotes outside inline code (``like this'') could be a special case,
interpreted and marked up as proper quotes. That leaves TeX quotes
inside inline code, which (although ugly) could be handled by escaping
with backslashes::

    line `\`\`this''`!

Let's face it, no mechanism for inline code is perfect, just as no
escaping mechanism is perfect. No matter what we use, complicated
expressions will end up looking ugly. We can only choose the least ugly
option.


10. HyperLinks
==============

.. _hyperlink markup:

There are three forms of hyperlink currently in StructuredText_:

1. (Absolute & relative URLs.)
   Text enclosed by double quotes followed by a colon, a URL, and
   concluded by punctuation plus white space, or just white space, is
   treated as a hyperlink::

       "Python":http://www.python.org/

2. (Absolute URLs only.) Text enclosed by double quotes followed by a
   comma, one or more spaces, an absolute URL and concluded by
   punctuation plus white space, or just white space, is treated as a
   hyperlink::

       "mail me", mailto:me@mail.com

3. (Endnotes.) Text enclosed by brackets links to an endnote at the end
   of the document: at the beginning of the line, two dots, a space, and
   the same text in brackets, followed by the end note itself::

       Please refer to the fine manual [GVR2000].

       .. [GVR2000] Python Documentation, van Rossum, Drake, et al.,
          http://www.python.org/doc/

The problem with forms 1 and 2 is that they are neither intuitive nor
unobtrusive (they break Goal 2). The brackets in form 3 are too common
in ordinary text (such as [nested] asides and Python lists like [12]).

Alternatives:

0. Have no special markup for hyperlinks.

   A. Except for #1 below?

1. Interpret and mark up hyperlinks as any contiguous text containing
   '://' or ':...@' after an alphanumeric word (absolute URL; exact
   specification to be looked up). To de-emphasize the URL, simply
   enclose it in parentheses:

       Python (http://www.python.org/)

   A. Leave special hyperlink markup as a domain-specific extension.
      Ordinary Structured Text documents would be required to have
      inline hyperlinks. Processed hyperlinks (with the URL hidden) may
      be important for Zope and ZWiki pages, but are they important for
      general uses? I suspect yes.

2. The original Setext_ introduced a mechanism of indirect hyperlinks.
   A source link word ('hot word') in the text is given a trailing
   underscore::

       Here is some text with a hyperlink_ built in.

   The hyperlink itself appears at the end of the document on a line by
   itself, beginning with two dots, a space, the link word with a
   leading underscore, whitespace, and the URL itself::

       .. _hyperlink http://www.123.xyz

   This has the advantage of being readable and relatively unobtrusive.
   Since each source link must match up to a target, the odd variable
   ending in an underscore can be spared being marked up (no such
   target). The only disadvantage is that phrase-links aren't possible
   without some obtrusive syntax. Setext used
   'underscores_instead_of_spaces_' for phrase links.

We could achieve phrase-links if we enclose the link text in double
quotes ('"like this"_') or in brackets ('[like this]_'). We get
obtrusive markup, but that is unavoidable. I prefer the bracketed syntax
as reminiscent of links on many web pages. The same markup can also be
used for footnotes, removing the problem with ordinary bracketed text
and Python lists::

    Please refer to the fine manual [GVR2000]_.

    .. _[GVR2000] Python Documentation, van Rossum, Drake, et al.,
       http://www.python.org/doc/

The two-dots-and-a-space syntax was generalized by Setext for comments,
which are removed from the processed text. In order to eliminate
ambiguity with comments and footnotes, I propose that a colon always
follow the target link word/phrase in indirect hyperlinks (denoting
'maps to'). There is no reason to restrict target links to the end of
the document; they could just as easily be interspersed.

Internal hyperlinks (hyperlinks from one point to another within a
single document) can be expressed by a source link as before, and a
target link with a colon but no URL.

As an added bonus, we now have a perfect candidate for Structured Text
directives, a simple extension mechanism: a comment containing a single
word followed by two colons and whitespace. The interpretation of
subsequent data on the directive line or following is directive- and/or
implementation-dependent.

To summarize::

    .. This is a comment.

    .. version:: 1

    .. The line above is an example of a directive.

    This internal hyperlink will take us to the footnotes_.

    Here is a one-word_ indirect hyperlink.

    Here is [an indirect hyperlink phrase]_.

    This is a footnote [1]_.

    .. _footnotes:
    .. _one-word: http://www.123.xyz
    .. _an indirect hyperlink phrase: http://www.123.xyz
    .. _[1] Footnote text goes here.

The presence or absence of a colon after the target link differentiates
an indirect hyperlink from a footnote, respectively. Brackets around a
target link word or phrase are optional as long as the phrase does not
contain a colon.

The examples below contain links (URLs & references), and bracketed
text. In HTML, each example should evaluate to::

    A URL, see [eggs2000] (in Bacon [Publisher]). Also see http://eggs.org.

0. A URL http://spam.org, see eggs2000 (in Bacon [Publisher]). Also see
   http://eggs.org.
1. A "URL":http://spam.org, see [eggs2000] (in Bacon [Publisher]). Also
   see "http://eggs.org":http://eggs.org.
2. A "URL", http://spam.org, see [eggs2000] (in Bacon [Publisher]). Also
   see "http://eggs.org", http://eggs.org.
3. A URL_, see [eggs2000]_ (in Bacon [Publisher]). Also see
   http://eggs.org.

The bracketed text '[Publisher]' may be problematic with syntax 1 & 2.
Syntax 3 is definitely the most readable.

Here is the endnote/footnote itself. In HTML, each example should
evaluate to::

    [eggs2000] "Spam, Spam, Spam, Eggs, Bacon, and Spam"

0. eggs2000 "Spam, Spam, Spam, Eggs, Bacon, and Spam"
1. .. [eggs2000] "Spam, Spam, Spam, Eggs, Bacon, and Spam"
2. .. [eggs2000] "Spam, Spam, Spam, Eggs, Bacon, and Spam"
3. .. _[eggs2000] "Spam, Spam, Spam, Eggs, Bacon, and Spam"

For style #3, the indirect hyperlink would be entered as follows::

    .. _URL: http://spam.org


11. Whitespace Delimitation of Markup
=====================================

.. _whitespace:

StructuredText specifies that inline markup begin with whitespace,
precluding such constructs as parenthesized or quoted emphatic text::

    "**What?**" she cried. (*exit stage left*)

The specification for how markup is detected should be refined to allow
for such constructs.
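
The refinement asked for in section 11 might look something like the
following Python sketch. It compares two start conditions for emphasis
markup: one that accepts markup only at the start of the text or after
whitespace, and one that also accepts it after an opening quote or
bracket. The names STRICT_START, RELAXED_START, BODY and find_emphasis,
the set of permitted preceding characters, and the restriction to '*'
and '**' are all assumptions made for this illustration; none of them
is taken from StructuredText.py or StructuredTextNG.

```python
import re

# Hypothetical rule 1: markup may start only at the beginning of the
# text or directly after whitespace (the behaviour criticized above).
STRICT_START = r"""(?:(?<=\s)|^)"""

# Hypothetical rule 2: markup may also start directly after an opening
# quote or bracket. This character set is an assumption of the sketch.
RELAXED_START = r"""(?:(?<=[\s'"(\[{<])|^)"""

# '**' is tried before '*'. The content must not begin or end with
# whitespace, and the closing delimiter must not be followed by a word
# character or another '*'.
BODY = r"""(\*\*|\*)(?=\S)(.+?)(?<=\S)\1(?![\w*])"""


def find_emphasis(text, start_rule):
    """Return (delimiter, content) pairs recognized under start_rule."""
    pattern = re.compile(start_rule + BODY)
    return [(m.group(1), m.group(2)) for m in pattern.finditer(text)]


sample = '"**What?**" she cried. (*exit stage left*)'

strict = find_emphasis(sample, STRICT_START)
relaxed = find_emphasis(sample, RELAXED_START)

print(strict)   # nothing is recognized under the whitespace-only rule
print(relaxed)  # both emphatic phrases are recognized
```

Under the whitespace-only rule neither phrase in the sample is picked
up, because each opening delimiter follows a quote or a parenthesis.
Widening the set of characters allowed before the delimiter is only one
possible refinement; a real specification would also have to say what
may follow the closing delimiter and how escaped characters interact
with these rules.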