Parsing large XML with SAX in Java

I am trying to analyze the Stack Overflow data dump. One of its tables, posts.xml, has about 10 million entries. Sample XML:
<?xml version="1.0" encoding="utf-8"?>
<posts>
  <row Id="1" PostTypeId="1" AcceptedAnswerId="26" CreationDate="2010-07-07T19:06:25.043" Score="10" ViewCount="1192" Body="&lt;p&gt;Now that the Engineer update has come, there will be lots of Engineers building up everywhere.  How should this best be handled?&lt;/p&gt;&#xA;" OwnerUserId="11" LastEditorUserId="56" LastEditorDisplayName="" LastEditDate="2010-08-27T22:38:43.840" LastActivityDate="2010-08-27T22:38:43.840" Title="In Team Fortress 2, what is a good strategy to deal with lots of engineers turtling on the other team?" Tags="&lt;strategy&gt;&lt;team-fortress-2&gt;&lt;tactics&gt;" AnswerCount="5" CommentCount="7" />
  <row Id="2" PostTypeId="1" AcceptedAnswerId="184" CreationDate="2010-07-07T19:07:58.427" Score="5" ViewCount="469" Body="&lt;p&gt;I know I can create a Warp Gate and teleport to Pylons, but I have no idea how to make Warp Prisms or know if there's any other unit capable of transporting.&lt;/p&gt;&#xA;&#xA;&lt;p&gt;I would in particular like this to built remote bases in 1v1&lt;/p&gt;&#xA;" OwnerUserId="10" LastEditorUserId="68" LastEditorDisplayName="" LastEditDate="2010-07-08T00:16:46.013" LastActivityDate="2010-07-08T00:21:13.163" Title="What protoss unit can transport others?" Tags="&lt;starcraft-2&gt;&lt;how-to&gt;&lt;protoss&gt;" AnswerCount="3" CommentCount="2" />
  <row Id="3" PostTypeId="1" AcceptedAnswerId="56" CreationDate="2010-07-07T19:09:46.317" Score="7" ViewCount="356" Body="&lt;p&gt;Steam won't let me have two instances running with the same user logged in.&lt;/p&gt;&#xA;&#xA;&lt;p&gt;Does that mean I cannot run a dedicated server on a PC (for example, for Left 4 Dead 2) &lt;em&gt;and&lt;/em&gt; play from another machine?&lt;/p&gt;&#xA;&#xA;&lt;p&gt;Is there a way to run the dedicated server without running steam? Is there a configuration option I'm missing?&lt;/p&gt;&#xA;" OwnerUserId="14" LastActivityDate="2010-07-07T19:27:04.777" Title="How can I run a dedicated server from steam?" Tags="&lt;steam&gt;&lt;left-4-dead-2&gt;&lt;dedicated-server&gt;&lt;account&gt;" AnswerCount="1" />
  <row Id="4" PostTypeId="1" AcceptedAnswerId="14" CreationDate="2010-07-07T19:11:05.640" Score="10" ViewCount="201" Body="&lt;p&gt;When I get to the insult sword-fighting stage of The Secret of Monkey Island, do I have to learn every single insult and comeback in order to beat the Sword Master?&lt;/p&gt;&#xA;" OwnerUserId="17" LastEditorUserId="17" LastEditorDisplayName="" LastEditDate="2010-07-08T21:25:04.787" LastActivityDate="2010-07-08T21:25:04.787" Title="Do I have to learn all of the insults and comebacks to be able to advance in The Secret of Monkey Island?" Tags="&lt;monkey-island&gt;&lt;adventure&gt;" AnswerCount="3" CommentCount="2" />
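I want to parse this XML but load only some of its attributes, such as Id, PostTypeId, AcceptedAnswerId and two others. Is there a way in SAX to load only these attributes? If so, how? I am quite new to SAX, so some guidance would help. Otherwise, loading everything would just be slow, and some of the attributes would never be used, so loading them is pointless. Another question: is it possible to jump to a specific row with Id X? If so, how would I do it?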
The SAX "startElement" event lets you process a single XML element. In your Java code you have to implement this method:
public void startElement(String uri, String localName,
    String qName, Attributes attributes)
    throws SAXException {

    // Compare qName: localName may be empty unless the parser is namespace-aware.
    if ("row".equals(qName)) {
        // This code is executed for every XML element "row".
        // Attribute names are case-sensitive, so it is "Id", not "id".
        String id = attributes.getValue("Id");
        String postTypeId = attributes.getValue("PostTypeId");
        String acceptedAnswerId = attributes.getValue("AcceptedAnswerId");
        // ...read the other two attributes the same way.
        // getValue returns null when a row does not carry the attribute.
    }

}
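For each element you have access to the namespace URI, the XML qualified name (qName), the XML local name, and the attribute map, from which you can also extract your two other attributes. For specific details, see the ContentHandler documentation. Update: improved the earlier code snippet.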
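
To show how that method fits into a complete program, here is a minimal sketch rather than a definitive implementation. The class names PostAttributesDemo, Post and PostHandler are made up for illustration, and the two-row sample in main is an abbreviated, invented stand-in for posts.xml. It assumes the default, non-namespace-aware parser from SAXParserFactory.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class PostAttributesDemo {

    // Hypothetical holder for the few attributes we want to keep.
    static final class Post {
        final String id;
        final String postTypeId;
        final String acceptedAnswerId; // null when the row has no such attribute

        Post(String id, String postTypeId, String acceptedAnswerId) {
            this.id = id;
            this.postTypeId = postTypeId;
            this.acceptedAnswerId = acceptedAnswerId;
        }
    }

    // Keeps three attributes per <row>; Body, Title, Tags etc. are never copied.
    static final class PostHandler extends DefaultHandler {
        final List<Post> posts = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes attributes) {
            if ("row".equals(qName)) {
                posts.add(new Post(
                        attributes.getValue("Id"),
                        attributes.getValue("PostTypeId"),
                        attributes.getValue("AcceptedAnswerId")));
            }
        }
    }

    // Parses the given XML text and returns the extracted rows.
    static List<Post> parse(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        PostHandler handler = new PostHandler();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.posts;
    }

    public static void main(String[] args) throws Exception {
        // Invented, shortened sample; the real rows carry many more attributes.
        String xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
                + "<posts>"
                + "<row Id=\"1\" PostTypeId=\"1\" AcceptedAnswerId=\"26\" Score=\"10\" />"
                + "<row Id=\"5\" PostTypeId=\"2\" Score=\"3\" />"
                + "</posts>";
        for (Post p : parse(xml)) {
            System.out.println(p.id + " " + p.postTypeId + " " + p.acceptedAnswerId);
        }
    }
}
```

For the real dump you would pass a file or file stream to parser.parse instead of an in-memory string. Note that the parser still has to read every attribute of every row; the saving is that your program only keeps the ones it asks for.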
This is almost the same approach as one I have already answered here. Scroll down to the "org.xml.sax Implementation" section. All you need is a custom handler.
Yes, you can override the handler methods so that they process only the elements you need: http://www.javacommerce.com/displaypage.jsp?name=saxparser1.sql&id=18232 http://www.java2s.com/Code/Java/XML/SAXDemo.htm
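
As a rough illustration of overriding only what you need: DefaultHandler supplies do-nothing versions of every callback, so a subclass can override startElement alone. The sketch below keeps a running tally instead of storing rows, which keeps memory use small however long the file is. PostTypeCounter and its count method are hypothetical names, and the three-row input is invented.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class PostTypeCounter extends DefaultHandler {

    // Number of rows seen per PostTypeId value.
    private final Map<String, Integer> counts = new TreeMap<>();

    // The only callback we override; all other events fall through to
    // DefaultHandler's empty implementations.
    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attributes) {
        if (!"row".equals(qName)) {
            return;
        }
        String type = attributes.getValue("PostTypeId");
        if (type != null) {
            counts.merge(type, 1, Integer::sum);
        }
    }

    // Streams the input once and returns the tally.
    static Map<String, Integer> count(InputStream in) throws Exception {
        PostTypeCounter handler = new PostTypeCounter();
        SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        return handler.counts;
    }

    public static void main(String[] args) throws Exception {
        // Invented sample; for the real dump, open posts.xml as a file stream instead.
        String xml = "<posts>"
                + "<row Id=\"1\" PostTypeId=\"1\" />"
                + "<row Id=\"2\" PostTypeId=\"1\" />"
                + "<row Id=\"3\" PostTypeId=\"2\" />"
                + "</posts>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(count(in)); // prints {1=2, 2=1}
    }
}
```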
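SAX does not "load" elements. It notifies your application of the start and end of each element, and it is entirely up to your application to decide which elements it pays attention to.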
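
The same idea applies to the "jump to the row with Id X" part of the question. SAX reads the document from start to finish, so it cannot seek directly to a row; the closest equivalent is to ignore every row until the Id matches and then abort the parse. The following is only a sketch under that assumption: FindRowById, FindHandler and find are hypothetical names, the sample rows are shortened, and throwing a SAXException from the handler is used here as one common way to stop early.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class FindRowById {

    static final class FindHandler extends DefaultHandler {
        private final String targetId;
        Map<String, String> match; // stays null until the target row is seen

        FindHandler(String targetId) {
            this.targetId = targetId;
        }

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes attributes) throws SAXException {
            if (!"row".equals(qName) || !targetId.equals(attributes.getValue("Id"))) {
                return; // not the row we want: ignore it
            }
            // Copy the attributes, because the Attributes object is only
            // guaranteed to be valid during this callback.
            match = new LinkedHashMap<>();
            for (int i = 0; i < attributes.getLength(); i++) {
                match.put(attributes.getQName(i), attributes.getValue(i));
            }
            // Abort parsing on purpose; the rest of the file is not needed.
            throw new SAXException("stop: row " + targetId + " found");
        }
    }

    // Returns the attributes of the matching row, or null if no row has that Id.
    static Map<String, String> find(InputStream in, String targetId) throws Exception {
        FindHandler handler = new FindHandler(targetId);
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        } catch (SAXException e) {
            if (handler.match == null) {
                throw e; // a genuine parse error, not our deliberate stop
            }
        }
        return handler.match;
    }

    public static void main(String[] args) throws Exception {
        // Shortened sample rows; the real file would be opened as a stream.
        String xml = "<posts>"
                + "<row Id=\"1\" PostTypeId=\"1\" Title=\"First\" />"
                + "<row Id=\"2\" PostTypeId=\"1\" Title=\"Second\" />"
                + "<row Id=\"3\" PostTypeId=\"1\" Title=\"Third\" />"
                + "</posts>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        Map<String, String> row = find(in, "2");
        System.out.println(row == null ? "not found" : row.get("Title"));
    }
}
```

Everything before the target row is still scanned, so a lookup near the end of a 10-million-row file costs almost a full pass. If you need many lookups by Id, it is probably better to do one pass that loads the rows into a database or an in-memory index.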
