如何提高VBA中XML解析的速度

我有一个需要在VBA中解析的大型XML文件(excel 2003& 2007)。 xml文件中可能存在超过11,000个“行”数据,每个“行”具有10到20个列之间的数据。这最终只是一个庞大的任务,只需解析并获取数据(5 - 7分钟)。我尝试读取xml并将每个'row'放入字典(key =行号,值=行属性),但这需要同样长的时间。 遍历DOM需要永远。有更有效的方法吗?
Dim XMLDict
    Sub ParseXML(ByRef RootNode As IXMLDOMNode)
        Dim Counter As Long
        Dim RowList As IXMLDOMNodeList
        Dim ColumnList As IXMLDOMNodeList
        Dim RowNode As IXMLDOMNode
        Dim ColumnNode As IXMLDOMNode
        Counter = 1
        Set RowList = RootNode.SelectNodes("Row")

        For Each RowNode In RowList
            Set ColumnList = RowNode.SelectNodes("Col")
            Dim NodeValues As String
            For Each ColumnNode In ColumnList
                NodeValues = NodeValues & "|" & ColumnNode.Attributes.getNamedItem("id").Text & ":" & ColumnNode.Text
            Next ColumnNode
            XMLDICT.Add Counter, NodeValues
            Counter = Counter + 1
        Next RowNode
    End Sub
    
已邀请:
您可以尝试使用SAX而不是DOM。当您所做的只是解析文档并且文档大小非常小时,SAX应该更快。这里是MSXML中SAX2实现的参考 我通常直接使用DOM来进行Excel中的大多数XML解析,但SAX似乎在某些情况下具有优势。这里的简短比较可能有助于解释它们之间的差异。 这是一个黑客共同的例子(部分基于此)只使用
Debug.Print
输出: 通过工具>参考添加对“Microsoft XML,v6.0”的引用 在普通模块中添加此代码
Option Explicit

Sub main()

Dim saxReader As SAXXMLReader60
Dim saxhandler As ContentHandlerImpl

Set saxReader = New SAXXMLReader60
Set saxhandler = New ContentHandlerImpl

Set saxReader.contentHandler = saxhandler
saxReader.parseURL "file://C:UsersfooDesktopbar.xml"

Set saxReader = Nothing

End Sub
添加一个类模块,将其命名为
ContentHandlerImpl
并添加以下代码
Option Explicit

Implements IVBSAXContentHandler

Private lCounter As Long
Private sNodeValues As String
Private bGetChars As Boolean
使用模块顶部的左侧下拉菜单选择“IVBSAXContentHandler”,然后使用右侧下拉菜单依次为每个事件添加存根(从
characters
startPrefixMapping
) 将代码添加到某些存根中,如下所示 明确设置计数器和标志,以显示我们是否要在此时读取文本数据
Private Sub IVBSAXContentHandler_startDocument()

lCounter = 0
bGetChars = False

End Sub
每次启动新元素时,请检查元素的名称并采取适当的操作
Private Sub IVBSAXContentHandler_startElement(strNamespaceURI As String, strLocalName As String, strQName As String, ByVal oAttributes As MSXML2.IVBSAXAttributes)

Select Case strLocalName
    Case "Row"
        sNodeValues = ""
    Case "Col"
        sNodeValues = sNodeValues & "|" & oAttributes.getValueFromName(strNamespaceURI, "id") & ":"
        bGetChars = True
    Case Else
        ' do nothing
End Select

End Sub
检查我们是否对文本数据感兴趣,如果是的话,请删除任何无关的空格并删除所有换行符(根据您要解析的文档,这可能是也可能不可取)
Private Sub IVBSAXContentHandler_characters(strChars As String)

If (bGetChars) Then
    sNodeValues = sNodeValues & Replace(Trim$(strChars), vbLf, "")
End If

End Sub
如果我们已达到
Col
的末尾,则停止阅读文本值;如果我们到达了
Row
的末尾,则打印出节点值的字符串
Private Sub IVBSAXContentHandler_endElement(strNamespaceURI As String, strLocalName As String, strQName As String)

Select Case strLocalName
    Case "Col"
        bGetChars = False
    Case "Row"
        lCounter = lCounter + 1
        Debug.Print lCounter & " " & sNodeValues
    Case Else
        ' do nothing
End Select

End Sub
为了使事情更清楚,这里是
ContentHandlerImpl
的完整版本,其中包含了存根方法:
Option Explicit

Implements IVBSAXContentHandler

Private lCounter As Long
Private sNodeValues As String
Private bGetChars As Boolean

Private Sub IVBSAXContentHandler_characters(strChars As String)

If (bGetChars) Then
    sNodeValues = sNodeValues & Replace(Trim$(strChars), vbLf, "")
End If

End Sub

Private Property Set IVBSAXContentHandler_documentLocator(ByVal RHS As MSXML2.IVBSAXLocator)

End Property

Private Sub IVBSAXContentHandler_endDocument()

End Sub

Private Sub IVBSAXContentHandler_endElement(strNamespaceURI As String, strLocalName As String, strQName As String)

Select Case strLocalName
    Case "Col"
        bGetChars = False
    Case "Row"
        lCounter = lCounter + 1
        Debug.Print lCounter & " " & sNodeValues
    Case Else
        ' do nothing
End Select

End Sub

Private Sub IVBSAXContentHandler_endPrefixMapping(strPrefix As String)

End Sub

Private Sub IVBSAXContentHandler_ignorableWhitespace(strChars As String)

End Sub

Private Sub IVBSAXContentHandler_processingInstruction(strTarget As String, strData As String)

End Sub

Private Sub IVBSAXContentHandler_skippedEntity(strName As String)

End Sub

Private Sub IVBSAXContentHandler_startDocument()

lCounter = 0
bGetChars = False

End Sub

Private Sub IVBSAXContentHandler_startElement(strNamespaceURI As String, strLocalName As String, strQName As String, ByVal oAttributes As MSXML2.IVBSAXAttributes)

Select Case strLocalName
    Case "Row"
        sNodeValues = ""
    Case "Col"
        sNodeValues = sNodeValues & "|" & oAttributes.getValueFromName(strNamespaceURI, "id") & ":"
        bGetChars = True
    Case Else
        ' do nothing
End Select

End Sub

Private Sub IVBSAXContentHandler_startPrefixMapping(strPrefix As String, strURI As String)

End Sub
    
使用
SelectSingleNode
功能。这将允许您基于模式匹配搜索节点。 例如,我创建了以下函数:
Private Function getXMLNodeValue(ByRef xmlDoc As MSXML2.DOMDocument, ByVal xmlPath As String)
    Dim node As IXMLDOMNode
    Set node = xmlDoc.SelectSingleNode(xmlPath)
    If node Is Nothing Then getXMLNodeValue = vbNullString Else getXMLNodeValue = node.Text
End Function
现在,如果我有以下XML文件: 我可以简单地打电话:
myValue = getXMLNodeValue(xmlResult, "//ErrorStatus/Source")
它将跳转到任何深度的第一个名为“错误状态”的键,然后拉出“源”节点中的文本 - 返回“INTEGRATION”     

要回复问题请先登录注册