ANTLR解决非LL(*)问题和语法谓词

|| 考虑解析器中的以下规则:
expression 
    :   IDENTIFIER
    |   (...)
    |   procedure_call // e.g. (foo 1 2 3)
    |   macro_use   // e.g. (xyz (some datum))
    ;

procedure_call
    :   \'(\' expression expression* \')\'
    ;

macro_use
    :   \'(\' IDENTIFIER datum* \')\'
    ;
// Note that any string that parses as an <expression> will also parse as a <datum>.
datum 
    : simple_datum 
    | compound_datum
    ;

simple_datum 
    :   BOOLEAN
    |   NUMBER
    |   CHARACTER
    |   STRING
    |   IDENTIFIER
    ;

compound_datum 
    : list 
    | vector
    ;

list 
    :   \'(\' (datum+ ( \'.\' datum)?)? \')\'
    |   ABBREV_PREFIX datum
    ;

fragment ABBREV_PREFIX
    :   (\'\\\'\'   | \'`\' | \',\' | \',@\')
    ;

vector 
    : \'#(\' datum* \')\'
    ; 
表达式规则中的procedure_call和macro_rule替代方案会生成非LL(*)结构错误。我可以看到问题了,因为
(IDENTIFIER)
会同时解析为两者。但是即使我用+而不是*都定义了,即使上面的示例不应该再解析,它也会产生错误。 我想出了句法谓词的用法,但是我无法弄清楚如何使用它们来完成技巧。 就像是
expression 
    :   IDENTIFIER 
    |   (...)
    |   (procedure_call)=>procedure_call // e.g. (foo 1 2 3)
    |   macro_use   // e.g. (xyz (some datum))
    ; 
要么
expression 
    :   IDENTIFIER
    |   (...)
    |   (\'(\' IDENTIFIER expression)=>procedure_call // e.g. (foo 1 2 3)
    |   macro_use   // e.g. (xyz (some datum))
    ;
也不起作用,因为除了第一个规则外,什么都不会匹​​配。有解决这个问题的适当方法吗?     
已邀请:
        我发现了R5RS的JavaCC语法,我曾经(很快!)写过ANTLR等效文件:
/*
 * Copyright (C) 2011 by Bart Kiers, based on the work done by Håkan L. Younes\'
 * JavaCC R5RS grammar, available at: http://mindprod.com/javacc/R5RS.jj
 * 
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the \"Software\"), to deal
 * in the Software without restriction, including without limitation the rights
 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the Software is
 * furnished to do so, subject to the following conditions:
 * 
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 * 
 * THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 * THE SOFTWARE.
 */
grammar R5RS;

parse
  :  commandOrDefinition* EOF
  ;

commandOrDefinition
  :  (syntaxDefinition)=>              syntaxDefinition
  |  (definition)=>                    definition
  |  (\'(\' BEGIN commandOrDefinition)=> \'(\' BEGIN commandOrDefinition+ \')\'
  |                                    command
  ;

syntaxDefinition
  :  \'(\' DEFINE_SYNTAX keyword transformerSpec \')\'
  ;

definition
  :  \'(\' DEFINE ( variable expression \')\'
                | \'(\' variable defFormals \')\' body \')\'
                )
  |  \'(\' BEGIN definition* \')\'
  ;

defFormals
  :  variable* (\'.\' variable)?
  ;

keyword
  :  identifier
  ;

transformerSpec
  :  \'(\' SYNTAX_RULES \'(\' identifier* \')\' syntaxRule* \')\'
  ;

syntaxRule
  :  \'(\' pattern template \')\'
  ;

pattern
  :  patternIdentifier
  |  \'(\' (pattern+ (\'.\' pattern | ELLIPSIS)?)?  \')\'
  |  \'#(\' (pattern+ ELLIPSIS? )? \')\'
  |  patternDatum
  ;

patternIdentifier
  :  syntacticKeyword
  |  VARIABLE
  ;

patternDatum
  :  STRING
  |  CHARACTER
  |  bool
  |  number
  ;

template
  :  patternIdentifier
  |  \'(\' (templateElement+ (\'.\' templateElement)?)? \')\'
  |  \'#(\' templateElement* \')\'
  |  templateDatum
  ;

templateElement
  :  template ELLIPSIS?
  ;

templateDatum
  :  patternDatum
  ;

command
  :  expression
  ;

identifier
  :  syntacticKeyword
  |  variable
  ;

syntacticKeyword
  :  expressionKeyword
  |  ELSE
  |  ARROW
  |  DEFINE
  |  UNQUOTE
  |  UNQUOTE_SPLICING
  ;  

expressionKeyword
  :  QUOTE
  |  LAMBDA
  |  IF
  |  SET
  |  BEGIN
  |  COND
  |  AND
  |  OR
  |  CASE
  |  LET
  |  LETSTAR
  |  LETREC
  |  DO
  |  DELAY
  |  QUASIQUOTE
  ;  

expression
  :  (variable)=>          variable
  |  (literal)=>           literal
  |  (lambdaExpression)=>  lambdaExpression
  |  (conditional)=>       conditional
  |  (assignment)=>        assignment
  |  (derivedExpression)=> derivedExpression
  |  (procedureCall)=>     procedureCall
  |  (macroUse)=>          macroUse
  |                        macroBlock
  ;

variable
  :  VARIABLE
  |  ELLIPSIS
  ;

literal
  :  quotation
  |  selfEvaluating
  ;

quotation
  :  \'\\\'\' datum
  |  \'(\' QUOTE datum \')\'
  ;

selfEvaluating
  :  bool
  |  number
  |  CHARACTER
  |  STRING
  ;

lambdaExpression
  :  \'(\' LAMBDA formals body \')\'
  ;

formals
  :  \'(\' (variable+ (\'.\' variable)?)? \')\'
  |  variable
  ;

conditional
  :  \'(\' IF test consequent alternate? \')\'
  ;

test 
  :  expression
  ;

consequent  
  :  expression
  ;

alternate 
  :  expression
  ;

assignment
  :  \'(\' SET variable expression \')\'
  ;

derivedExpression
  :  quasiquotation
  |  \'(\' ( COND ( \'(\' ELSE sequence \')\'
                | condClause+ (\'(\' ELSE sequence \')\')? 
                )
         | CASE expression ( \'(\' ELSE sequence \')\'
                           | caseClause+ (\'(\' ELSE sequence \')\')? 
                           )
         | AND test*
         | OR test*
         | LET variable? \'(\' bindingSpec* \')\' body
         | LETSTAR \'(\' bindingSpec* \')\' body
         | LETREC \'(\' bindingSpec* \')\' body
         | BEGIN sequence
         | DO \'(\' iterationSpec* \')\' \'(\' test doResult? \')\' command*
         | DELAY expression
         ) 
     \')\'
  ;

condClause
  :  \'(\' test (sequence | ARROW recipient)? \')\'
  ;

recipient
  :  expression
  ;

caseClause
  :  \'(\' \'(\' datum* \')\' sequence \')\'
  ;

bindingSpec
  :  \'(\' variable expression \')\'
  ;

iterationSpec
  :  \'(\' variable init step? \')\'
  ;

init
  :  expression
  ;

step
  :  expression
  ;

doResult
  :  sequence
  ;

procedureCall
  :  \'(\' operator operand* \')\'
  ;

operator
  :  expression
  ;

operand
  :  expression
  ;

macroUse
  :  \'(\' keyword datum* \')\'
  ;

macroBlock
  :  \'(\' (LET_SYNTAX | LETREC_SYNTAX) \'(\' syntaxSpec* \')\' body \')\'
  ;

syntaxSpec
  :  \'(\' keyword transformerSpec \')\'
  ;

body
  :  ((definition)=> definition)* sequence
  ;

//sequence
//  :  ((command)=> command)* expression
//  ;

sequence
  :  expression+
  ;

datum
  :  simpleDatum
  |  compoundDatum
  ;

simpleDatum
  :  bool
  |  number
  |  CHARACTER
  |  STRING
  |  identifier
  ;

compoundDatum
  :  list
  |  vector
  ;

list
  :  \'(\' (datum+ (\'.\' datum)?)? \')\'
  |  abbreviation
  ;

abbreviation
  :  abbrevPrefix datum
  ;

abbrevPrefix
  :  \'\\\'\' | \'`\' | \',@\' | \',\'
  ;

vector
  :  \'#(\' datum* \')\'
  ;

number
  :  NUM_2
  |  NUM_8
  |  NUM_10
  |  NUM_16
  ;

bool
  :  TRUE
  |  FALSE
  ;

quasiquotation
  :  quasiquotationD[1]
  ;

quasiquotationD[int d]
  :  \'`\' qqTemplate[d]
  |  \'(\' QUASIQUOTE qqTemplate[d] \')\'
  ;

qqTemplate[int d]
  :  (expression)=>  expression
  |  (\'(\' UNQUOTE)=> unquotation[d]
  |                  simpleDatum
  |                  vectorQQTemplate[d]
  |                  listQQTemplate[d]
  ;

vectorQQTemplate[int d]
  :  \'#(\' qqTemplateOrSplice[d]* \')\'
  ;

listQQTemplate[int d]
  :                     \'\\\'\' qqTemplate[d]
  |  (\'(\' QUASIQUOTE)=> quasiquotationD[d+1]
  |                     \'(\' (qqTemplateOrSplice[d]+ (\'.\' qqTemplate[d])?)? \')\'
  ;

unquotation[int d]
  :  \',\' qqTemplate[d-1]
  |  \'(\' UNQUOTE qqTemplate[d-1] \')\'
  ;

qqTemplateOrSplice[int d]
  :  (\'(\' UNQUOTE_SPLICING)=> splicingUnquotation[d]
  |                           qqTemplate[d]
  ;

splicingUnquotation[int d]
  :  \',@\' qqTemplate[d-1]
  |  \'(\' UNQUOTE_SPLICING qqTemplate[d-1] \')\'
  ;

// macro keywords
LET_SYNTAX       : \'let-syntax\';
LETREC_SYNTAX    : \'letrec-syntax\';
SYNTAX_RULES     : \'syntax-rules\';
DEFINE_SYNTAX    : \'define-syntax\';

// syntactic keywords
ELSE             : \'else\';
ARROW            : \'=>\';
DEFINE           : \'define\';
UNQUOTE_SPLICING : \'unquote-splicing\';
UNQUOTE          : \'unquote\';

// expression keywords
QUOTE            : \'quote\';
LAMBDA           : \'lambda\';
IF               : \'if\';
SET              : \'set!\';
BEGIN            : \'begin\';
COND             : \'cond\';
AND              : \'and\';
OR               : \'or\';
CASE             : \'case\';
LET              : \'let\';
LETSTAR          : \'let*\';
LETREC           : \'letrec\';
DO               : \'do\';
DELAY            : \'delay\';
QUASIQUOTE       : \'quasiquote\';

NUM_2  : PREFIX_2 COMPLEX_2;
NUM_8  : PREFIX_8 COMPLEX_8;
NUM_10 : PREFIX_10? COMPLEX_10;
NUM_16 : PREFIX_16 COMPLEX_16;

ELLIPSIS : \'...\';

VARIABLE 
  :  INITIAL SUBSEQUENT* 
  |  PECULIAR_IDENTIFIER
  ;

STRING : \'\"\' STRING_ELEMENT* \'\"\';

CHARACTER : \'#\\\\\' (~(\' \' | \'\\n\') | CHARACTER_NAME);

TRUE  : \'#\' (\'t\' | \'T\');
FALSE : \'#\' (\'f\' | \'F\');

// to ignore
SPACE   : (\' \' | \'\\t\' | \'\\r\' | \'\\n\') {$channel=HIDDEN;};
COMMENT : \';\' ~(\'\\r\' | \'\\n\')* {$channel=HIDDEN;};

// fragments  
fragment INITIAL : LETTER | SPECIAL_INITIAL;
fragment LETTER : \'a\'..\'z\' | \'A\'..\'Z\';
fragment SPECIAL_INITIAL : \'!\' | \'$\' | \'%\' | \'&\' | \'*\' | \'/\' | \':\' | \'<\' | \'=\' | \'>\' | \'?\' | \'^\' | \'_\' | \'~\';
fragment SUBSEQUENT : INITIAL | DIGIT | SPECIAL_SUBSEQUENT;
fragment DIGIT : \'0\'..\'9\';
fragment SPECIAL_SUBSEQUENT : \'.\' | \'+\' | \'-\' | \'@\';
fragment PECULIAR_IDENTIFIER : \'+\' | \'-\';
fragment STRING_ELEMENT : ~(\'\"\' | \'\\\\\') | \'\\\\\' (\'\"\' | \'\\\\\');
fragment CHARACTER_NAME : \'space\' | \'newline\';

fragment COMPLEX_2 
  :  REAL_2 (\'@\' REAL_2)?
  |  REAL_2? SIGN UREAL_2? (\'i\' | \'I\')
  ;

fragment COMPLEX_8 
  :  REAL_8 (\'@\' REAL_8)?
  |  REAL_8? SIGN UREAL_8? (\'i\' | \'I\')
  ;

fragment COMPLEX_10 
  :  REAL_10 (\'@\' REAL_10)?
  |  REAL_10? SIGN UREAL_10? (\'i\' | \'I\')
  ;

fragment COMPLEX_16 
  :  REAL_16 (\'@\' REAL_16)?
  |  REAL_16? SIGN UREAL_16? (\'i\' | \'I\')
  ;

fragment REAL_2 : SIGN? UREAL_2;
fragment REAL_8 : SIGN? UREAL_8;
fragment REAL_10 : SIGN? UREAL_10;
fragment REAL_16 : SIGN? UREAL_16;
fragment UREAL_2 : UINTEGER_2 (\'/\' UINTEGER_2)?;
fragment UREAL_8 : UINTEGER_8 (\'/\' UINTEGER_8)?;
fragment UREAL_10 : UINTEGER_10 (\'/\' UINTEGER_10)? | DECIMAL_10;
fragment UREAL_16 : UINTEGER_16 (\'/\' UINTEGER_16)?;

fragment DECIMAL_10 
  :  UINTEGER_10 SUFFIX
  |  \'.\' DIGIT+ \'#\'* SUFFIX?
  |  DIGIT+ \'.\' DIGIT* \'#\'* SUFFIX?
  |  DIGIT+ \'#\'+ \'.\' \'#\'* SUFFIX?
  ;

fragment UINTEGER_2 : DIGIT_2+ \'#\'*;
fragment UINTEGER_8 : DIGIT_8+ \'#\'*;
fragment UINTEGER_10 : DIGIT+ \'#\'*;
fragment UINTEGER_16 : DIGIT_16+ \'#\'*;
fragment PREFIX_2 : RADIX_2 EXACTNESS? | EXACTNESS RADIX_2;
fragment PREFIX_8 : RADIX_8 EXACTNESS? | EXACTNESS RADIX_8;
fragment PREFIX_10 : RADIX_10 EXACTNESS? | EXACTNESS RADIX_10;
fragment PREFIX_16 : RADIX_16 EXACTNESS? | EXACTNESS RADIX_16;
fragment SUFFIX : EXPONENT_MARKER SIGN? DIGIT+;
fragment EXPONENT_MARKER : \'e\' | \'s\' | \'f\' | \'d\' | \'l\' | \'E\' | \'S\' | \'F\' | \'D\' | \'L\';
fragment SIGN : \'+\' | \'-\';
fragment EXACTNESS : \'#\' (\'i\' | \'e\' | \'I\' | \'E\');
fragment RADIX_2 : \'#\' (\'b\' | \'B\');
fragment RADIX_8 : \'#\' (\'o\' | \'O\');
fragment RADIX_10 : \'#\' (\'d\' | \'D\');
fragment RADIX_16 : \'#\' (\'x\' | \'X\');
fragment DIGIT_2 : \'0\' | \'1\';
fragment DIGIT_8 : \'0\'..\'7\';
fragment DIGIT_16 : DIGIT | \'a\'..\'f\' | \'A\'..\'F\';
可以使用以下类进行测试:
import org.antlr.runtime.*;

public class Main {
  public static void main(String[] args) throws Exception {
    String source = 
        \"(define sum-iter                        \\n\" + 
        \"  (lambda(n acc i)                      \\n\" + 
        \"    (if (> i n)                         \\n\" + 
        \"      acc                               \\n\" + 
        \"      (sum-iter n (+ acc i) (+ i 1)))))   \";
    R5RSLexer lexer = new R5RSLexer(new ANTLRStringStream(source));
    R5RSParser parser = new R5RSParser(new CommonTokenStream(lexer));
    parser.parse();
  }
}
并生成一个词法分析器和解析器,编译所有Java源文件并运行主类,请执行以下操作: bart @ hades:〜/编程/ ANTLR / Demos / R5RS $ java -cp antlr-3.3.jar org.antlr.Tool R5RS.g bart @ hades:〜/编程/ ANTLR / Demos / R5RS $ javac -cp antlr-3.3.jar * .java bart @ hades:〜/编程/ ANTLR / Demos / R5RS $ java -cp。:antlr-3.3.jar主要 bart @ hades:〜/编程/ ANTLR / Demos / R5RS $ 控制台上没有任何输出的事实意味着解析器(和词法分析器)在提供的源中未发现任何错误。 请注意,我没有单元测试,只测试了
Main
类中的单个Scheme资源。如果您在ANTLR语法中发现错误,不胜感激,希望我能解决该语法。在适当的时候,我可能会将语法提交给官方的ANTLR Wiki。     

要回复问题请先登录注册