Parser Ideas

This area is essentially a scratchpad for ideas related to the DCGen parser.

Handling ignore and delimlist

One question is "How do we want to syntactically specify ignore and delimlist?" We have several potential options:

  1. We can leave the terms undecorated. The problem is that the parameters passed to parse rules and the values returned from them are also undecorated, so ignore and delimlist would effectively become reserved words.
  2. We can decorate them with @, a la @ignore and @delimlist. However, this clearly clashes with the token rule decorations, which decreases readability.
  3. We can decorate them with $, but I plan to use $ to allow specification of semantic actions, which again causes a conflict.
  4. Thus, I think decorating them with % is the best option: %ignore and %delimlist.

It is also worth noting that we already support the operations not and first when dealing with token rules. I believe we should rewrite these to be prefaced with % as well. The complication is that these operations do not all apply to the same kinds of terms, or appear in the same kinds of rules:

  • delimlist can only apply to a parseTerm and a @lit, and can only appear in parse rules.
  • not, first, and ignore can only apply to token rules and literals, and can appear in either parse or token rules.
    • I'm not sure whether to restrict them further, so that they apply only to literals and not to token rules. For instance, ignoring something keeps it out of the parse tree entirely. However, token rules represent dynamic information: text that we cannot recover when going from the parse tree back to the source. Thus, ignoring only makes sense for static information, like literals. I think similar arguments can be made for not and first.
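The restrictions above can be written down as a small applicability table. This is only a sketch in Python, not DCGen code: the names TermKind, RuleKind, APPLICABILITY, and is_allowed are made up for illustration, and the table treats "a parseTerm and a @lit" as the set of operand kinds delimlist accepts without modelling which argument position each one occupies.

```python
from enum import Enum, auto


class TermKind(Enum):
    """Kinds of things a function can be applied to (hypothetical names)."""
    PARSE_TERM = auto()
    TOKEN_RULE = auto()
    LITERAL = auto()


class RuleKind(Enum):
    """Kinds of rules a function can appear in (hypothetical names)."""
    PARSE = auto()
    TOKEN = auto()


# function name -> (operand kinds it accepts, rule kinds it may appear in)
APPLICABILITY = {
    "delimlist": ({TermKind.PARSE_TERM, TermKind.LITERAL}, {RuleKind.PARSE}),
    "not": ({TermKind.TOKEN_RULE, TermKind.LITERAL}, {RuleKind.PARSE, RuleKind.TOKEN}),
    "first": ({TermKind.TOKEN_RULE, TermKind.LITERAL}, {RuleKind.PARSE, RuleKind.TOKEN}),
    "ignore": ({TermKind.TOKEN_RULE, TermKind.LITERAL}, {RuleKind.PARSE, RuleKind.TOKEN}),
}


def is_allowed(function: str, operand: TermKind, rule: RuleKind) -> bool:
    """Return True if `function` may take `operand` inside a rule of kind `rule`."""
    operands, rules = APPLICABILITY[function]
    return operand in operands and rule in rules
```

If the stricter option is chosen (literals only), it would amount to removing TermKind.TOKEN_RULE from the not, first, and ignore entries.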

So in the end, I think we can treat not, first, and ignore similarly, since we never want to be able to apply those to parse rules (i.e., trying to ignore a parse rule makes no sense, while ignoring a character or a token makes more sense). So the question is how to decorate them, and whether to decorate them differently from delimlist due to the difference in parameter types.

I'm strongly considering decorating the literal-acting functions (not, first, and ignore) with @ due to their use in token matching, and reserving % for delimlist and any other parser-specific functions.
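To see what that split would mean in practice, here is a rough Python sketch that classifies a decorated term by its sigil. Everything in it is an assumption for illustration: the names SIGIL_FUNCTIONS and classify, the identifier pattern, and the idea that any other @name refers to an ordinary token rule. None of it is taken from the DCGen implementation.

```python
import re

# Proposed split: @ for literal-acting functions, % for parser-specific ones.
SIGIL_FUNCTIONS = {
    "@": {"not", "first", "ignore"},
    "%": {"delimlist"},
}

# Assumed identifier shape: a sigil followed by a C-style identifier.
DECORATED = re.compile(r"([@%])([A-Za-z_][A-Za-z0-9_]*)")


def classify(term: str) -> str:
    """Classify a term under the proposed @ / % scheme (illustrative only)."""
    match = DECORATED.fullmatch(term)
    if match is None:
        return "undecorated"
    sigil, name = match.groups()
    if name in SIGIL_FUNCTIONS[sigil]:
        return "literal-acting function" if sigil == "@" else "parser function"
    # Assumption: any other @name is a reference to a token rule.
    return "token rule reference" if sigil == "@" else "unknown"
```

One consequence this makes visible: if the functions share @ with token rules, then not, first, and ignore can no longer be used as token rule names, which is a smaller version of the reserved-word problem from option 1.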