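ANTLR grammar for reStructuredText (rule priorities)
First question stream
Hello everyone,
This may be a follow-up to this question: Antlr rule priorities
I'm trying to write an ANTLR grammar for the reStructuredText markup language.
The main problem I'm facing is: "How do I match any sequence of characters (regular text) without masking the other grammar rules?"
Let's take a paragraph with inline markup as an example: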
In `Figure 17-6`_, we have positioned ``before_ptr`` so that it points to the element
*before* the insert point. The variable ``after_ptr`` points to the element *after* the
insert. In other words, we are going to put our new element **in between** ``before_ptr``
and ``after_ptr``.
I thought it would be easy to write rules for inline markup text, so I wrote a simple grammar:
grammar Rst;
options {
output=AST;
language=Java;
backtrack=true;
//memoize=true;
}
@members {
boolean inInlineMarkup = false;
}
// PARSER
text
: inline_markup (WS? inline_markup)* WS? EOF
;
inline_markup
@after {
inInlineMarkup = false;
}
: {!inInlineMarkup}? (emphasis|strong|litteral|link)
;
emphasis
@init {
inInlineMarkup = true;
}
: '*' (~'*')+ '*' {System.out.println("emphasis: " + $text);}
;
strong
@init {
inInlineMarkup = true;
}
: '**' (~'*')+ '**' {System.out.println("bold: " + $text);}
;
litteral
@init {
inInlineMarkup = true;
}
: '``' (~'`')+ '``' {System.out.println("litteral: " + $text);}
;
link
@init {
inInlineMarkup = true;
}
: inline_internal_target
| footnote_reference
| hyperlink_reference
;
inline_internal_target
: '_`' (~'`')+ '`' {System.out.println("inline_internal_target: " + $text);}
;
footnote_reference
: '[' (~']')+ ']_' {System.out.println("footnote_reference: " + $text);}
;
hyperlink_reference
: ~(' '|'\t'|'\u000C'|'_')+ '_' {System.out.println("hyperlink_reference: " + $text);}
| '`' (~'`')+ '`_' {System.out.println("hyperlink_reference (long): " + $text);}
;
// LEXER
WS
: (' '|'\t'|'\u000C')+
;
NEWLINE
: '\r'? '\n'
;
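This simple grammar doesn't work. And I haven't even tried to match regular text yet...
My questions:
Could someone point out my mistakes, and maybe give me a hint on how to match regular text?
Is there a way to set priorities on grammar rules? Maybe this could be a lead.
Thanks in advance for your help :-)
Robin
Second question stream
Thank you very much for your help! I would have had a hard time figuring out my mistakes... I'm not writing this grammar (just) to learn ANTLR; I'm trying to write code for the Eclipse IDE. And for that, I need a grammar ;)
I went further with the grammar and wrote a text rule: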
grammar Rst;
options {
output=AST;
language=Java;
}
@members {
boolean inInlineMarkup = false;
}
//////////////////
// PARSER RULES //
//////////////////
file
: line* EOF
;
line
: text* NEWLINE
;
text
: inline_markup
| normal_text
;
inline_markup
@after {
inInlineMarkup = false;
}
: {!inInlineMarkup}? {inInlineMarkup = true;}
(
| STRONG
| EMPHASIS
| LITTERAL
| INTERPRETED_TEXT
| SUBSTITUTION_REFERENCE
| link
)
;
link
: INLINE_INTERNAL_TARGET
| FOOTNOTE_REFERENCE
| HYPERLINK_REFERENCE
;
normal_text
: {!inInlineMarkup}?
~(EMPHASIS
|SUBSTITUTION_REFERENCE
|STRONG
|LITTERAL
|INTERPRETED_TEXT
|INLINE_INTERNAL_TARGET
|FOOTNOTE_REFERENCE
|HYPERLINK_REFERENCE
|NEWLINE
)
;
//////////////////
// LEXER TOKENS //
//////////////////
EMPHASIS
: STAR ANY_BUT_STAR+ STAR {System.out.println("EMPHASIS: " + $text);}
;
SUBSTITUTION_REFERENCE
: PIPE ANY_BUT_PIPE+ PIPE {System.out.println("SUBST_REF: " + $text);}
;
STRONG
: STAR STAR ANY_BUT_STAR+ STAR STAR {System.out.println("STRONG: " + $text);}
;
LITTERAL
: BACKTICK BACKTICK ANY_BUT_BACKTICK+ BACKTICK BACKTICK {System.out.println("LITTERAL: " + $text);}
;
INTERPRETED_TEXT
: BACKTICK ANY_BUT_BACKTICK+ BACKTICK {System.out.println("LITTERAL: " + $text);}
;
INLINE_INTERNAL_TARGET
: UNDERSCORE BACKTICK ANY_BUT_BACKTICK+ BACKTICK {System.out.println("INLINE_INTERNAL_TARGET: " + $text);}
;
FOOTNOTE_REFERENCE
: L_BRACKET ANY_BUT_BRACKET+ R_BRACKET UNDERSCORE {System.out.println("FOOTNOTE_REFERENCE: " + $text);}
;
HYPERLINK_REFERENCE
: BACKTICK ANY_BUT_BACKTICK+ BACKTICK UNDERSCORE {System.out.println("HYPERLINK_REFERENCE (long): " + $text);}
| ANY_BUT_ENDLINK+ UNDERSCORE {System.out.println("HYPERLINK_REFERENCE (short): " + $text);}
;
WS
: (' '|'\t')+ {$channel=HIDDEN;}
;
NEWLINE
: '\r'? '\n' {$channel=HIDDEN;}
;
///////////////
// FRAGMENTS //
///////////////
fragment ANY_BUT_PIPE
: ESC PIPE
| ~(PIPE|'\n'|'\r')
;
fragment ANY_BUT_BRACKET
: ESC R_BRACKET
| ~(R_BRACKET|'\n'|'\r')
;
fragment ANY_BUT_STAR
: ESC STAR
| ~(STAR|'\n'|'\r')
;
fragment ANY_BUT_BACKTICK
: ESC BACKTICK
| ~(BACKTICK|'\n'|'\r')
;
fragment ANY_BUT_ENDLINK
: ~(UNDERSCORE|' '|'\t'|'\n'|'\r')
;
fragment ESC
: '\\'
;
fragment STAR
: '*'
;
fragment BACKTICK
: '`'
;
fragment PIPE
: '|'
;
fragment L_BRACKET
: '['
;
fragment R_BRACKET
: ']'
;
fragment UNDERSCORE
: '_'
;
The grammar works fine for inline_markup, but normal_text is not matched.
Here is my test class:
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.RecognitionException;
import org.antlr.runtime.tree.Tree;
public class Test {
public static void main(String[] args) throws RecognitionException, IOException {
InputStream is = Test.class.getResourceAsStream("test.rst");
Reader r = new InputStreamReader(is);
StringBuilder source = new StringBuilder();
char[] buffer = new char[1024];
int readLenght = 0;
while ((readLenght = r.read(buffer)) > 0) {
if (readLenght < buffer.length) {
source.append(buffer, 0, readLenght);
} else {
source.append(buffer);
}
}
r.close();
System.out.println(source.toString());
ANTLRStringStream in = new ANTLRStringStream(source.toString());
RstLexer lexer = new RstLexer(in);
CommonTokenStream tokens = new CommonTokenStream(lexer);
RstParser parser = new RstParser(tokens);
RstParser.file_return out = parser.file();
System.out.println(((Tree)out.getTree()).toStringTree());
}
}
And the input file I use:
In `Figure 17-6`_, we have positioned ``before_ptr`` so that it points to the element
*before* the insert point. The variable ``after_ptr`` points to the |element| *after* the
insert. In other words, `we are going`_ to put_ our new element **in between** ``before_ptr``
and ``after_ptr``.
I get the following output:
HYPERLINK_REFERENCE (short): 7-6`_
line 1:2 mismatched character ' ' expecting '_'
line 1:10 mismatched character ' ' expecting '_'
line 1:18 mismatched character ' ' expecting '_'
line 1:21 mismatched character ' ' expecting '_'
line 1:26 mismatched character ' ' expecting '_'
line 1:37 mismatched character ' ' expecting '_'
LITTERAL: `before_ptr`
line 1:86 no viable alternative at character '\r'
line 1:55 mismatched character ' ' expecting '_'
line 1:60 mismatched character ' ' expecting '_'
line 1:63 mismatched character ' ' expecting '_'
line 1:70 mismatched character ' ' expecting '_'
line 1:73 mismatched character ' ' expecting '_'
line 1:77 mismatched character ' ' expecting '_'
line 1:85 mismatched character ' ' expecting '_'
EMPHASIS: *before*
line 2:12 mismatched character ' ' expecting '_'
line 2:19 mismatched character ' ' expecting '_'
line 2:26 mismatched character ' ' expecting '_'
LITTERAL: `after_ptr`
line 2:30 mismatched character ' ' expecting '_'
line 2:39 mismatched character ' ' expecting '_'
line 2:90 no viable alternative at character '\r'
line 2:60 mismatched character ' ' expecting '_'
line 2:63 mismatched character ' ' expecting '_'
line 2:67 mismatched character ' ' expecting '_'
line 2:77 mismatched character ' ' expecting '_'
line 2:85 mismatched character ' ' expecting '_'
line 2:89 mismatched character ' ' expecting '_'
line 3:7 mismatched character ' ' expecting '_'
line 3:10 mismatched character ' ' expecting '_'
line 3:16 mismatched character ' ' expecting '_'
line 3:23 mismatched character ' ' expecting '_'
line 3:27 mismatched character ' ' expecting '_'
line 3:31 mismatched character ' ' expecting '_'
line 3:42 mismatched character ' ' expecting '_'
line 3:51 mismatched character ' ' expecting '_'
line 3:55 mismatched character ' ' expecting '_'
line 3:63 mismatched character ' ' expecting '_'
line 3:94 mismatched character '\r' expecting '*'
line 4:3 mismatched character ' ' expecting '_'
line 4:18 no viable alternative at character '\r'
line 4:18 mismatched character '\r' expecting '_'
HYPERLINK_REFERENCE (short): oing`_
HYPERLINK_REFERENCE (short): ut_
EMPHASIS: *in between*
LITTERAL: `after_ptr`
BR.recoverFromMismatchedToken
line 0:-1 mismatched input '<EOF>' expecting NEWLINE
null
Could you point out my mistakes? (When I add the filter=true; option to the grammar, the parser works for inline markup without errors.)
Robin
2 answers
First answer
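When you generate a parser and lexer from the above and let them parse the following input file: *** x *** ** yyy ** * zz * * a b c P2``* a *`b''q Python_ (note the trailing line break!) the parser will produce the following AST:
EDIT
The graph can be created by running this class:
or, if your source comes from a file, do:
or
where “11” is the encoding of the file.
The class above will print the AST as a DOT file to the console. You can use a DOT viewer to display the AST. In this case, I posted an image created by kgraphviewer. But there are many more viewers around. A nice online one is this one, which appears to use kgraphviewer "under the hood". Good luck!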
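To make the "print the AST as a DOT file" step concrete, here is a minimal sketch in plain Java. It is a hand-rolled stand-in so that it runs without the ANTLR runtime: the AstNode and AstDot classes and the node labels (PARAGRAPH, EMPHASIS, TEXT) are hypothetical names chosen for illustration and are not taken from the answer. With a real ANTLR 3 parser you would walk the tree returned by the parser instead.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal AST node, used only for this illustration. */
final class AstNode {
    final String label;
    final List<AstNode> children = new ArrayList<>();

    AstNode(String label, AstNode... kids) {
        this.label = label;
        for (AstNode k : kids) {
            children.add(k);
        }
    }
}

final class AstDot {
    /** Renders the tree rooted at root as a DOT digraph. */
    static String toDot(AstNode root) {
        StringBuilder sb = new StringBuilder("digraph ast {\n");
        int[] counter = {0}; // next free node id, shared across the recursion
        emit(root, sb, counter);
        return sb.append("}\n").toString();
    }

    // Writes the node declaration, then each child subtree followed by the
    // edge to that child; returns the id assigned to this node.
    private static int emit(AstNode node, StringBuilder sb, int[] counter) {
        int id = counter[0]++;
        String escaped = node.label.replace("\\", "\\\\").replace("\"", "\\\"");
        sb.append("  n").append(id).append(" [label=\"").append(escaped).append("\"];\n");
        for (AstNode child : node.children) {
            int childId = emit(child, sb, counter);
            sb.append("  n").append(id).append(" -> n").append(childId).append(";\n");
        }
        return id;
    }

    public static void main(String[] args) {
        AstNode tree = new AstNode("PARAGRAPH",
                new AstNode("EMPHASIS", new AstNode("zz")),
                new AstNode("TEXT", new AstNode("a")));
        System.out.print(toDot(tree));
    }
}
```

Redirect the console output to a file such as ast.dot and open it in any DOT viewer.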
Second answer
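Also, depending on where it is used (in a parser rule or in a lexer rule), the ~ (negation) meta character has a different meaning. Take the following grammar:
ANTLR will first "move" the
literal to a lexer rule, like this:
(the name
is not actually used, but that doesn't matter). Now, the parser rule
does not match any character other than
! A negation inside a parser rule negates a token (or lexer rule). So
will match either the lexer rule
or the lexer rule
. Inside lexer rules,
negates (single!) characters. So the lexer rule
will match any character in the range
..
except
. Note that “28” is invalid inside a lexer rule: you can only negate single-character sets.
So all these “29” in your parser rules are wrong (wrong as in: they don't behave the way you expect them to).
Robin wrote:
Is there a way to set priorities on grammar rules? Maybe this could be a lead.
Yes, the order is top to bottom in both lexer and parser rules (with the top having the highest priority). Let's say
is the entry point of your grammar:
then
will be tried first, and if that fails,
is tried.
As for lexer rules: rules that are keywords, for example, are matched before a rule that could possibly match said keywords: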
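As a rough illustration of that top-to-bottom ordering, here is a plain-Java analogy. It is not ANTLR's generated lexer, only a sketch of the "first listed rule wins" idea using regular expressions. The rule names KEYWORD_IF and IDENTIFIER and the RuleOrderDemo class are assumptions made up for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

final class RuleOrderDemo {
    // LinkedHashMap keeps insertion order, so rules are tried top to bottom,
    // mimicking "the rule listed first has the highest priority".
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("KEYWORD_IF", Pattern.compile("if"));     // hypothetical keyword rule
        RULES.put("IDENTIFIER", Pattern.compile("[a-z]+")); // hypothetical catch-all rule
    }

    /** Returns the name of the first rule that matches the whole text, or null if none does. */
    static String classify(String text) {
        for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
            if (rule.getValue().matcher(text).matches()) {
                return rule.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(classify("if"));   // prints KEYWORD_IF
        System.out.println(classify("iffy")); // prints IDENTIFIER
    }
}
```

Both patterns can match the text "if", but the keyword rule is listed first and therefore wins. If the two entries were swapped, "if" would be classified as IDENTIFIER and the keyword rule would never fire.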