xerces-c-users mailing list archives

From Alberto Massari <amass...@datadirect.com>
Subject Re: method startElement() from class DOMLSParserFilter
Date Fri, 04 Sep 2009 14:12:16 GMT
In fact, I am seeing so many problems with that code that my only
suggestion is to get the latest 3.0 from the trunk and work with
what I have just committed (or get the patch from
http://svn.apache.org/viewvc?rev=811420&view=rev and apply it to the 3.0.1
code). This version should support your original code.

Alberto
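
The quoted messages below note that rejecting an element from startElement() can "mess with the current parent". As a rough illustration of that concern only, here is a toy model written without Xerces-C: the Node, FilterAction, filterElement and FilteringBuilder names are all hypothetical, and nothing here reflects how DOMLSParserImpl or the committed patch actually works. It sketches one way a tree builder could drop a rejected subtree while leaving its stack of open parents untouched, by counting the depth of the skipped subtree.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for a DOM element (hypothetical, not a Xerces-C type).
struct Node {
    explicit Node(std::string n) : name(std::move(n)) {}
    std::string name;
    std::vector<std::unique_ptr<Node>> children;
};

enum class FilterAction { Accept, Reject };

// Hypothetical filter: reject every element named "DATA".
inline FilterAction filterElement(const std::string& name) {
    return name == "DATA" ? FilterAction::Reject : FilterAction::Accept;
}

// Toy tree builder driven by SAX-like start/end events.
class FilteringBuilder {
public:
    FilteringBuilder() : root_(new Node("#document")), skipDepth_(0) {
        parents_.push_back(root_.get());
    }

    void startElement(const std::string& name) {
        if (skipDepth_ > 0) {            // already inside a rejected subtree
            ++skipDepth_;
            return;
        }
        if (filterElement(name) == FilterAction::Reject) {
            skipDepth_ = 1;              // start skipping; parent stack is not touched
            return;
        }
        std::unique_ptr<Node> child(new Node(name));
        Node* raw = child.get();
        parents_.back()->children.push_back(std::move(child));
        parents_.push_back(raw);         // the new element becomes the current parent
    }

    void endElement() {
        if (skipDepth_ > 0) {            // closing tag belongs to the rejected subtree
            --skipDepth_;
            return;
        }
        assert(parents_.size() > 1);     // never pop the document node itself
        parents_.pop_back();
    }

    const Node& document() const { return *root_; }

private:
    std::unique_ptr<Node> root_;
    std::vector<Node*> parents_;         // stack of currently open elements
    std::size_t skipDepth_;              // nesting depth inside a rejected subtree
};
```

Feeding this builder the events for a ROOT element containing HEADER, DATA (with a nested child) and FOOTER leaves ROOT with only HEADER and FOOTER as children, because no node is ever allocated for the rejected subtree and every end event is matched against either the skip counter or the parent stack.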


Mirko Braun wrote:
> Hi Alberto,
>
> Yes, I'm still using the startElement() method. Is it better
> to use the acceptNode() method to reject the DATA node from
> the DOM, or is there another possibility?
>
> Mirko
>
>
> -------- Original Message --------
>> Date: Fri, 04 Sep 2009 15:41:54 +0200
>> From: Alberto Massari <amassari@datadirect.com>
>> To: c-users@xerces.apache.org
>> Subject: Re: method startElement() from class DOMLSParserFilter
>>
>> Hi Mirko,
>> are you still using startElement()? That API would mess with the current 
>> parent, so it would break the parsing at a certain point.
>>
>> Alberto
>>
>> Mirko Braun wrote:
>>> Hi Alberto,
>>>
>>> Yes, I'm sure that DATA is not a root node. I debugged a little bit.
>>> The exception occurs after the DATA node has been found for the sixth time.
>>>
>>> Mirko
>>>
>>> -------- Original Message --------
>>>> Date: Fri, 04 Sep 2009 14:21:15 +0200
>>>> From: Alberto Massari <amassari@datadirect.com>
>>>> To: c-users@xerces.apache.org
>>>> Subject: Re: method startElement() from class DOMLSParserFilter
>>>>
>>>> Hi Mirko,
>>>> are you sure that your root node isn't one of those DATA elements? In 
>>>> this case the document node would see more than one root element.
>>>>
>>>> Alberto
>>>>
>>>> Mirko Braun wrote:
>>>>> Hi Alberto,
>>>>>
>>>>> Thank you for your answer. I integrated the changes you
>>>>> suggested, but the result is still the same:
>>>>>
>>>>> DOM Error during parsing: 'C:\Daten\2009-08-07_NewXercesc\3_0_1\xerces-c-3.0.1\Build\Win32\VC6\Debug\MyXML.xml'
>>>>> DOMException code is:  3
>>>>> Message is: attempt is made to insert a node where it is not permitted
>>>>>
>>>>> Best regards,
>>>>> Mirko
>>>>>
>>>>> -------- Original Message --------
>>>>>> Date: Fri, 04 Sep 2009 12:37:10 +0200
>>>>>> From: Alberto Massari <amassari@datadirect.com>
>>>>>> To: c-users@xerces.apache.org
>>>>>> Subject: Re: method startElement() from class DOMLSParserFilter
>>>>>>
>>>>>> Hi Mirko,
>>>>>> I think the current implementation of the DOMLSParserFilter doesn't work
>>>>>> nicely with your code, as the rejected nodes are not recycled and the
>>>>>> memory will grow to the same level as before.
>>>>>> Anyhow, you should instead override acceptNode like this:
>>>>>>
>>>>>> DOMParserFilter::FilterAction DOMParserFilter::acceptNode(DOMNode* node)
>>>>>> {
>>>>>>   // for element whose name is "DATA", skip it
>>>>>>   if (node->getNodeType()==DOMNode::ELEMENT_NODE &&
>>>>>>       XMLString::compareString(node->getNodeName(), element_data)==0)
>>>>>>     return DOMParserFilter::FILTER_REJECT;
>>>>>>   else
>>>>>>     return DOMParserFilter::FILTER_ACCEPT;
>>>>>> }
>>>>>>
>>>>>> Then, change DOMLSParserImpl::endElement to add a call to 
>>>>>> origNode->release() after the call to removeChild().
>>>>>>
>>>>>> Alberto
>>>>>>
>>>>>>
>>>>>> Mirko Braun wrote:
>>>>>>> Hello everybody,
>>>>>>>
>>>>>>> I would like to parse a fairly large XML file (about 180 MB).
>>>>>>> I used the DOM interface because I need the tree for further
>>>>>>> processing of the data the XML file contains. Of course, a lot
>>>>>>> of memory is used while parsing the file, and I got an
>>>>>>> "Out of memory" exception.
>>>>>>>
>>>>>>> I noticed that Xerces-C++ 3.0.1 (Win32) comes with a class
>>>>>>> DOMLSParserFilter, which makes it possible to filter nodes
>>>>>>> during parsing.
>>>>>>> That is perfect for me because one XML element in my large file
>>>>>>> contains most of the data. This XML element is called DATA and
>>>>>>> appears several times in my XML file.
>>>>>>> So I had the idea of rejecting this XML element from the DOM tree
>>>>>>> during parsing, to reduce memory usage, by using the
>>>>>>> startElement() method of the DOMLSParserFilter class. After that
>>>>>>> I would use a SAX parser to get just the DATA elements with
>>>>>>> their values.
>>>>>>> But it does not work.
>>>>>>> I integrated my code into the DOMPrint example that comes
>>>>>>> with Xerces-C++ 3.0.1. The following error message occurred:
>>>>>>>
>>>>>>> DOM Error during parsing: 'C:\Daten\2009-08-07_NewXercesc\3_0_1\xerces-c-3.0.1\Build\Win32\VC6\Debug\MyXML.xml'
>>>>>>> DOMException code is:  3
>>>>>>> Message is: attempt is made to insert a node where it is not permitted
>>>>>>>
>>>>>>> Did I misunderstand the functionality of the DOMLSParserFilter class
>>>>>>> and its startElement() method?
>>>>>>> Is it possible to realize my idea with the help of this class? Did
>>>>>>> I do something wrong in my code (please have a look below)?
>>>>>>>
>>>>>>> I would be very grateful for any help.
>>>>>>>
>>>>>>> Thanks in advance,
>>>>>>> Mirko
>>>>>>>
>>>>>>>
>>>>>>> DOMPrintFilter.hpp:
>>>>>>> --------------------
>>>>>>>
>>>>>>>
>>>>>>> class DOMParserFilter : public DOMLSParserFilter {
>>>>>>> public:
>>>>>>>
>>>>>>>     DOMParserFilter(DOMNodeFilter::ShowType whatToShow = DOMNodeFilter::SHOW_ALL);
>>>>>>>     ~DOMParserFilter(){};
>>>>>>>
>>>>>>>     virtual FilterAction startElement(DOMElement* node);
>>>>>>>     virtual FilterAction acceptNode(DOMNode* node){return DOMParserFilter::FILTER_ACCEPT;};
>>>>>>>     virtual DOMNodeFilter::ShowType getWhatToShow() const {return fWhatToShow;};
>>>>>>> private:
>>>>>>>     DOMNodeFilter::ShowType fWhatToShow;
>>>>>>> };
>>>>>>>
>>>>>>>
>>>>>>> DOMPrintFilter.cpp:
>>>>>>> --------------------
>>>>>>>
>>>>>>> DOMParserFilter::DOMParserFilter(DOMNodeFilter::ShowType whatToShow)
>>>>>>> :fWhatToShow(whatToShow)
>>>>>>> {}
>>>>>>>
>>>>>>> DOMParserFilter::FilterAction DOMParserFilter::startElement(DOMElement* node)
>>>>>>> {
>>>>>>>   // for element whose name is "DATA", skip it
>>>>>>>   if (XMLString::compareString(node->getNodeName(), element_data)==0)
>>>>>>>     return DOMParserFilter::FILTER_REJECT;
>>>>>>>   else
>>>>>>>     return DOMParserFilter::FILTER_ACCEPT;
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>> DOMPrint.cpp:
>>>>>>> ---------------
>>>>>>>
>>>>>>> static const XMLCh gLS[] = { xercesc::chLatin_L, xercesc::chLatin_S, xercesc::chNull };
>>>>>>> xercesc::DOMImplementation *implParser = xercesc::DOMImplementationRegistry::getDOMImplementation(gLS);
>>>>>>> xercesc::DOMLSParser* parser = ((xercesc::DOMImplementationLS*)implParser)->createLSParser(xercesc::DOMImplementationLS::MODE_SYNCHRONOUS, 0);
>>>>>>> DOMTreeErrorReporter *errReporter = new DOMTreeErrorReporter();
>>>>>>> parser->getDomConfig()->setParameter(xercesc::XMLUni::fgDOMErrorHandler, errReporter);
>>>>>>>
>>>>>>> DOMParserFilter * pDOMParserFilter = new DOMParserFilter();
>>>>>>> parser->setFilter(pDOMParserFilter);
>>>>>>>
>>>>>>>     //
>>>>>>>     //  Parse the XML file, catching any XML exceptions that might propagate
>>>>>>>     //  out of it.
>>>>>>>     //
>>>>>>>     bool errorsOccured = false;
>>>>>>>     DOMDocument *doc = NULL;
>>>>>>>
>>>>>>>     try
>>>>>>>     {
>>>>>>>       doc = parser->parseURI(gXmlFile);
>>>>>>>     }
>>>>>>>     catch (const OutOfMemoryException&)
>>>>>>>     {
>>>>>>>         XERCES_STD_QUALIFIER cerr << "OutOfMemoryException" << XERCES_STD_QUALIFIER endl;
>>>>>>>         errorsOccured = true;
>>>>>>>     }
>>>>>>>     catch (const XMLException& e)
>>>>>>>     {
>>>>>>>         XERCES_STD_QUALIFIER cerr << "An error occurred during parsing\n   Message: "
>>>>>>>              << StrX(e.getMessage()) << XERCES_STD_QUALIFIER endl;
>>>>>>>         errorsOccured = true;
>>>>>>>     }
>>>>>>>
>>>>>>>     catch (const DOMException& e)
>>>>>>>     {
>>>>>>>       const unsigned int maxChars = 2047;
>>>>>>>       XMLCh errText[maxChars + 1];
>>>>>>>
>>>>>>>       XERCES_STD_QUALIFIER cerr << "\nDOM Error during parsing: '" << gXmlFile << "'\n"
>>>>>>>            << "DOMException code is:  " << e.code << XERCES_STD_QUALIFIER endl;
>>>>>>>
>>>>>>>       if (DOMImplementation::loadDOMExceptionMsg(e.code, errText, maxChars))
>>>>>>>            XERCES_STD_QUALIFIER cerr << "Message is: " << StrX(errText) << XERCES_STD_QUALIFIER endl;
>>>>>>>
>>>>>>>       errorsOccured = true;
>>>>>>>     }
>>>>>>>
>>>>>>>     catch (...)
>>>>>>>     {
>>>>>>>         XERCES_STD_QUALIFIER cerr << "An error occurred during parsing\n " << XERCES_STD_QUALIFIER endl;
>>>>>>>         errorsOccured = true;
>>>>>>>     }

