Plain Text Output Ideas
This page will serve as a scratchpad for ideas regarding straight-text output.
In some cases, we will want to output straight text, for a variety of possible reasons:
- The output format cannot be specified easily (or at all) in a CFG
- We only need a small subset of the output format, too small to be worth writing a CFG for
- We don't care about translating back from the output format, only about translating to it
- Rapid prototyping: we can develop the grammar later, since for now we may only care about producing output rather than having a full translator.
Format for specifying output
We can use a new arrow notation to specify that we want to output strings rather than translate to a new tree format:
tree(a, b, c, params) ~~>
~| foo ~a;
# Match against a regular expression
var ~~= ~b(c) + 5.0;
~c
fn(~delimlist(params, ','))
~.
treee(a) ~~> ohboy(~a) ~.
We notice several things here.
- We specify the arrow as ~~>, to indicate we are outputting straight text.
- The output text MUST begin on a new line after the ~~>; if the Programmer puts the text on the same line, it can only be a single line (e.g., as a very simple rule).
- We insert tree values with ~var. This is similar to the format/2 predicate from Prolog.
- When inserting a tree value, if it is a branch, then its rule is expanded fully, and the expansion is substituted for the variable.
- All possible branches must have expansions. This is checked at compile time, based on the prior stages of translation, and prevents errors caused by a missing expansion for a particular rule.
- A literal tilde is written as ~~.
- The text stops at the ~. symbol; a bare . cannot serve as the terminator, as it may occur naturally in the text. If the Programmer wants the characters ~. to appear in the output text, they should write ~~. (an escaped tilde followed by a period).
- The output text begins with the first non-space character typed.
- If the text is on the same line as the branch, then it is output flush against the (current) left margin.
- If on its own line, the indentation to the first non-space character is considered the left margin. If desired, the Programmer can use ~| to indicate the start of the left margin; all further space characters are interpreted as indentation. The margin starts after the |, as shown above. Placing text before the | is a semantic error (see below).
- If any future lines have less indentation than the first, then a semantic error is raised (along with a suggestion of using ~| ).
- Indentation is tracked and used; a rule can always be written as if it were flush against the margin, but its output may be indented by some amount because of the indentation introduced by other rules. For instance, when expanding ~c, we may assume that we are flush against the left margin, but this "left" margin may actually be 4 spaces to the right of the true left margin.
- As a result of these rules, it is not possible for any expanded rule to decrease the current amount of indentation.
- Text is output verbatim, so there is no commenting possible in the midst of the expansion.
- If we have a list of objects (such as param+ or %delimlist(param, ',') in the parser), we can output them in a delimiter-separated list using ~delimlist(<rule>, <token>).
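
As a rough illustration of the rules above, here is a minimal Python sketch of how a single rule body might be expanded. It is not an implementation of the proposed notation: the names expand, bindings, and DIRECTIVE are hypothetical, bindings is assumed to hold text that has already been expanded for each tree value, and the single-quoted token in ~delimlist is taken from the example rather than from any fixed syntax. The ~| margin marker and the compile-time checks are not modeled.

```python
import re

# Hypothetical directive syntax, inferred from the notes above:
#   ~~                      -> a literal tilde
#   ~.                      -> end of the output text
#   ~delimlist(name, 'tok') -> items bound to `name`, joined by tok
#   ~name                   -> expansion of the tree value bound to `name`
DIRECTIVE = re.compile(
    r"~(?:(?P<tilde>~)"
    r"|(?P<end>\.)"
    r"|delimlist\((?P<list>\w+),\s*'(?P<tok>[^']*)'\)"
    r"|(?P<var>\w+))"
)


def expand(template, bindings):
    """Expand one rule body.

    `bindings` maps each name to already-expanded text (str) or,
    for ~delimlist, to a list of such strings. This is an assumption
    of the sketch, not part of the notation.
    """
    out = []
    pos = 0
    for m in DIRECTIVE.finditer(template):
        out.append(template[pos:m.start()])  # verbatim text before the directive
        pos = m.end()
        if m.group("tilde"):
            out.append("~")
        elif m.group("end"):
            break  # everything after ~. is ignored
        elif m.group("list"):
            out.append(m.group("tok").join(bindings[m.group("list")]))
        else:
            # Carry the current line's indentation into every further line
            # of the substituted text, so the expanded rule can be written
            # as if it were flush against the margin.
            so_far = "".join(out)
            line = so_far[so_far.rfind("\n") + 1:]
            indent = line[: len(line) - len(line.lstrip(" "))]
            out.append(bindings[m.group("var")].replace("\n", "\n" + indent))
    else:
        raise ValueError("rule body has no ~. terminator")
    return "".join(out)


if __name__ == "__main__":
    body = "fn(~delimlist(params, ','))\n    ~c\n~."
    print(expand(body, {"params": ["x", "y"], "c": "a = 1;\nb = 2;"}))
```

Because a substituted expansion only ever gains the indentation of the line it is inserted on, an expanded rule in this sketch can never reduce the current indentation, which matches the rule stated above.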
