xerces-c-users mailing list archives

From Alberto Massari <amass...@datadirect.com>
Subject Re: method startElement() from class DOMLSParserFilter
Date Fri, 04 Sep 2009 13:41:54 GMT
Hi Mirko,
are you still using startElement()? That callback interferes with the
parser's current parent node, so parsing breaks at some point.
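
To make the "current parent" problem concrete, here is a toy model. It is a sketch under stated assumptions, not Xerces-C's actual code: Node, ToyBuilder and parseSample are hypothetical names, and the bookkeeping bug it shows (an end tag still popping the parent after the start tag was rejected) is only one plausible way a start-time rejection can go wrong. It contrasts that with rejecting the element once it is complete, which is when acceptNode() is called.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Toy model only: these are NOT Xerces-C classes.
struct Node {
    std::string name;
    Node* parent = nullptr;
    std::vector<std::unique_ptr<Node>> children;
};

class ToyBuilder {
public:
    // rejectAtStart == true : drop the element when its start tag is seen.
    // rejectAtStart == false: build it, then drop it when its end tag is seen.
    ToyBuilder(std::string rejected, bool rejectAtStart)
        : rejected_(std::move(rejected)), rejectAtStart_(rejectAtStart) {
        doc_.name = "#document";
    }

    void startElement(const std::string& name) {
        if (rejectAtStart_ && name == rejected_)
            return;  // modelled bug: no node is pushed, yet endElement() still pops
        if (current_ == &doc_ && !doc_.children.empty())
            throw std::runtime_error("insert not permitted: second root element");
        auto node = std::make_unique<Node>();
        node->name = name;
        node->parent = current_;
        Node* raw = node.get();
        current_->children.push_back(std::move(node));
        current_ = raw;
    }

    void endElement(const std::string& name) {
        if (current_ == &doc_)
            throw std::runtime_error("end tag without a current element");
        Node* parent = current_->parent;
        assert(parent != nullptr);  // every non-document node has a parent
        current_ = parent;
        // The element just closed is the last child of its parent.
        if (!rejectAtStart_ && name == rejected_)
            current_->children.pop_back();
    }

    const Node& document() const { return doc_; }

private:
    Node doc_;
    Node* current_ = &doc_;
    std::string rejected_;
    bool rejectAtStart_;
};

// Feeds <ROOT><ITEM><DATA/></ITEM><ITEM><DATA/></ITEM></ROOT> to the builder
// and returns how many ITEM children ROOT ends up with.
std::size_t parseSample(bool rejectAtStart) {
    ToyBuilder b("DATA", rejectAtStart);
    b.startElement("ROOT");
    for (int i = 0; i < 2; ++i) {
        b.startElement("ITEM");
        b.startElement("DATA");
        b.endElement("DATA");
        b.endElement("ITEM");
    }
    b.endElement("ROOT");
    return b.document().children.front()->children.size();
}
```

In this model, parseSample(false) finishes with two ITEM children under ROOT and no DATA nodes. parseSample(true) throws, because each rejected DATA moves the current parent up one level too far, until the second ITEM is attached to the document node as if it were a second root element.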

Alberto

Mirko Braun wrote:
> Hi Alberto,
>
> Yes, I'm sure that DATA is not the root node. I debugged a little:
> the exception occurs after the DATA node has been found for the sixth time.
>
> Mirko
>
> -------- Original Message --------
>> Date: Fri, 04 Sep 2009 14:21:15 +0200
>> From: Alberto Massari <amassari@datadirect.com>
>> To: c-users@xerces.apache.org
>> Subject: Re: method startElement() from class DOMLSParserFilter
>>
>> Hi Mirko,
>> are you sure that your root node isn't one of those DATA elements? In 
>> this case the document node would see more than one root element.
>>
>> Alberto
>>
>> Mirko Braun wrote:
>>     
>>> Hi Alberto,
>>>
>>> thank you for your answer. I integrated the changes you
>>> suggested, but the result is still the same:
>>>
>>> DOM Error during parsing: 'C:\Daten\2009-08-07_NewXercesc\3_0_1\xerces-c-3.0.1\Build\Win32\VC6\Debug\MyXML.xml'
>>> DOMException code is:  3
>>> Message is: attempt is made to insert a node where it is not permitted
>>>
>>> Best regards,
>>> Mirko
>>>
>>> -------- Original Message --------
>>>> Date: Fri, 04 Sep 2009 12:37:10 +0200
>>>> From: Alberto Massari <amassari@datadirect.com>
>>>> To: c-users@xerces.apache.org
>>>> Subject: Re: method startElement() from class DOMLSParserFilter
>>>>
>>>> Hi Mirko,
>>>> I think the current implementation of the DOMLSParserFilter doesn't work
>>>> nicely with your code, as the rejected nodes are not recycled and the
>>>> memory will grow to the same level as before.
>>>> Anyhow, you should instead override acceptNode like this:
>>>>
>>>> // The parameter is DOMNode* (not DOMElement*) so that this overrides
>>>> // DOMLSParserFilter::acceptNode(DOMNode*), as declared in the header.
>>>> DOMParserFilter::FilterAction DOMParserFilter::acceptNode(DOMNode* node)
>>>> {
>>>>   // Reject every element named "DATA"; accept all other nodes.
>>>>   if (node->getNodeType() == DOMNode::ELEMENT_NODE &&
>>>>       XMLString::compareString(node->getNodeName(), element_data) == 0)
>>>>     return DOMParserFilter::FILTER_REJECT;
>>>>   else
>>>>     return DOMParserFilter::FILTER_ACCEPT;
>>>> }
>>>>
>>>> Then, change DOMLSParserImpl::endElement to add a call to 
>>>> origNode->release() after the call to removeChild().
>>>>
>>>> Alberto
>>>>
>>>>
>>>> Mirko Braun wrote:
>>>>     
>>>>         
>>>>> Hello everybody,
>>>>>
>>>>> I would like to parse a quite large XML file (about 180 MB).
>>>>> I used the DOM interface because I need the tree for further
>>>>> processing of the data the XML file contains. Of course, a lot
>>>>> of memory is used while parsing the file, and I got an
>>>>> "Out of memory" exception.
>>>>>
>>>>> I noticed that Xerces-C++ 3.0.1 (Win32) comes with a class
>>>>> DOMLSParserFilter, which makes it possible to filter nodes during
>>>>> parsing.
>>>>> That is perfect for me because one XML element in my large file
>>>>> contains most of the data. This XML element is called DATA and
>>>>> appears several times in my XML file.
>>>>> So I had the idea to reject this XML element from the DOM tree
>>>>> during parsing, using the method startElement() of the
>>>>> DOMLSParserFilter class, to reduce the memory used. After that I would
>>>>> use a SAX parser and just get all DATA elements with their values.
>>>>> But it does not work.
>>>>> I integrated my code into the DOMPrint example that comes
>>>>> with Xerces-C++ 3.0.1. The following error message occurred:
>>>>>
>>>>> DOM Error during parsing: 'C:\Daten\2009-08-07_NewXercesc\3_0_1\xerces-c-3.0.1\Build\Win32\VC6\Debug\MyXML.xml'
>>>>> DOMException code is:  3
>>>>> Message is: attempt is made to insert a node where it is not permitted
>>>>>
>>>>>
>>>>> Did I misunderstand the functionality of the DOMLSParserFilter class
>>>>> and its method startElement()?
>>>>> Is it possible to realize my idea with the help of this class? Did
>>>>> I do something wrong in my code (please have a look below)?
>>>>>
>>>>> I would be very grateful for any help.
>>>>>
>>>>> Thanks in advance,
>>>>> Mirko
>>>>>
>>>>>
>>>>> DOMPrintFilter.hpp:
>>>>> --------------------
>>>>>
>>>>>
>>>>> class DOMParserFilter : public DOMLSParserFilter {
>>>>> public:
>>>>>
>>>>>     DOMParserFilter(DOMNodeFilter::ShowType whatToShow = DOMNodeFilter::SHOW_ALL);
>>>>>     ~DOMParserFilter() {}
>>>>>
>>>>>     virtual FilterAction startElement(DOMElement* node);
>>>>>     virtual FilterAction acceptNode(DOMNode* node) { return DOMParserFilter::FILTER_ACCEPT; }
>>>>>     virtual DOMNodeFilter::ShowType getWhatToShow() const { return fWhatToShow; }
>>>>>
>>>>> private:
>>>>>     DOMNodeFilter::ShowType fWhatToShow;
>>>>> };
>>>>>
>>>>>
>>>>> DOMPrintFilter.cpp:
>>>>> --------------------
>>>>>
>>>>> DOMParserFilter::DOMParserFilter(DOMNodeFilter::ShowType whatToShow)
>>>>> : fWhatToShow(whatToShow)
>>>>> {}
>>>>>
>>>>> DOMParserFilter::FilterAction DOMParserFilter::startElement(DOMElement* node)
>>>>> {
>>>>>   // for element whose name is "DATA", skip it
>>>>>   if (XMLString::compareString(node->getNodeName(), element_data)==0)
>>>>>     return DOMParserFilter::FILTER_REJECT;
>>>>>   else
>>>>>     return DOMParserFilter::FILTER_ACCEPT;
>>>>> }
>>>>>
>>>>>
>>>>> DOMPrint.cpp:
>>>>> ---------------
>>>>>
>>>>> static const XMLCh gLS[] = { xercesc::chLatin_L, xercesc::chLatin_S, xercesc::chNull };
>>>>> xercesc::DOMImplementation *implParser =
>>>>>     xercesc::DOMImplementationRegistry::getDOMImplementation(gLS);
>>>>> xercesc::DOMLSParser* parser =
>>>>>     ((xercesc::DOMImplementationLS*)implParser)->createLSParser(
>>>>>         xercesc::DOMImplementationLS::MODE_SYNCHRONOUS, 0);
>>>>> DOMTreeErrorReporter *errReporter = new DOMTreeErrorReporter();
>>>>> parser->getDomConfig()->setParameter(xercesc::XMLUni::fgDOMErrorHandler, errReporter);
>>>>>
>>>>> DOMParserFilter * pDOMParserFilter = new DOMParserFilter();
>>>>> parser->setFilter(pDOMParserFilter);
>>>>>     
>>>>>
>>>>>     //
>>>>>     //  Parse the XML file, catching any XML exceptions that might propagate
>>>>>     //  out of it.
>>>>>     //
>>>>>     bool errorsOccured = false;
>>>>>     DOMDocument *doc = NULL;
>>>>>
>>>>>     try
>>>>>     {
>>>>>       doc = parser->parseURI(gXmlFile);
>>>>>     }
>>>>>     catch (const OutOfMemoryException&)
>>>>>     {
>>>>>         XERCES_STD_QUALIFIER cerr << "OutOfMemoryException" << XERCES_STD_QUALIFIER endl;
>>>>>         errorsOccured = true;
>>>>>     }
>>>>>     catch (const XMLException& e)
>>>>>     {
>>>>>         XERCES_STD_QUALIFIER cerr << "An error occurred during parsing\n   Message: "
>>>>>              << StrX(e.getMessage()) << XERCES_STD_QUALIFIER endl;
>>>>>         errorsOccured = true;
>>>>>     }
>>>>>
>>>>>     catch (const DOMException& e)
>>>>>     {
>>>>>       const unsigned int maxChars = 2047;
>>>>>       XMLCh errText[maxChars + 1];
>>>>>
>>>>>       XERCES_STD_QUALIFIER cerr << "\nDOM Error during parsing: '" << gXmlFile << "'\n"
>>>>>            << "DOMException code is:  " << e.code << XERCES_STD_QUALIFIER endl;
>>>>>
>>>>>       if (DOMImplementation::loadDOMExceptionMsg(e.code, errText, maxChars))
>>>>>            XERCES_STD_QUALIFIER cerr << "Message is: " << StrX(errText)
>>>>>                 << XERCES_STD_QUALIFIER endl;
>>>>>
>>>>>       errorsOccured = true;
>>>>>     }
>>>>>
>>>>>     catch (...)
>>>>>     {
>>>>>         XERCES_STD_QUALIFIER cerr << "An error occurred during parsing\n " << XERCES_STD_QUALIFIER endl;
>>>>>         errorsOccured = true;
>>>>>     }