xerces-c-users mailing list archives

From Alberto Massari <amass...@datadirect.com>
Subject Re: method startElement() from class DOMLSParserFilter
Date Mon, 07 Sep 2009 07:26:05 GMT
Mirko Braun wrote:
> Hi Alberto,
>
> thank you very much for your help. I integrated the patch into
> 3.0.1 and it worked. The exception no longer occurs.
> But one problem remains: memory usage is the same as before.
> I would expect memory usage to decrease when a node is rejected
> from the tree. Is my conclusion correct?
>   

Yes, if a node is rejected it should be marked for recycling; how much 
memory are you seeing being used?

Alberto
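
"Marked for recycling" suggests that a rejected node's storage is kept
for reuse by later nodes rather than handed back to the operating
system, which would be one reason why process memory stays flat instead
of shrinking. The thread does not confirm this, so the following is only
a toy model under that assumption. NodePool, Node, acceptNode, parse and
makeInput are hypothetical names made up for this sketch; none of them
are Xerces-C++ classes or functions. The sketch counts how many nodes a
pool has to allocate when rejected DATA nodes are, or are not, released
back to it.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Toy stand-ins only; these are NOT Xerces-C++ types.
enum class FilterAction { Accept, Reject };

struct Node {
    std::string name;
};

// Hands out nodes and keeps released ones for reuse instead of freeing them.
class NodePool {
public:
    Node* create(const std::string& name) {
        Node* n = nullptr;
        if (!recycled_.empty()) {       // reuse a released node first
            n = recycled_.back();
            recycled_.pop_back();
        } else {                        // otherwise grow the pool
            storage_.emplace_back();    // deque keeps earlier pointers valid
            n = &storage_.back();
        }
        n->name = name;
        return n;
    }
    void release(Node* n) { recycled_.push_back(n); }
    std::size_t allocated() const { return storage_.size(); }

private:
    std::deque<Node> storage_;          // never shrinks while the pool lives
    std::vector<Node*> recycled_;
};

// Reject every element named "DATA", accept everything else.
FilterAction acceptNode(const Node& n) {
    return n.name == "DATA" ? FilterAction::Reject : FilterAction::Accept;
}

// Hypothetical input: `pairs` repetitions of a HEADER element followed by a DATA element.
std::vector<std::string> makeInput(int pairs) {
    std::vector<std::string> names;
    for (int i = 0; i < pairs; ++i) {
        names.push_back("HEADER");
        names.push_back("DATA");
    }
    return names;
}

// "Parses" a flat list of element names and returns how many nodes
// the pool had to allocate in total.
std::size_t parse(const std::vector<std::string>& names, bool recycleRejected) {
    NodePool pool;
    std::vector<Node*> tree;            // accepted nodes stay in the "tree"
    for (const std::string& name : names) {
        Node* n = pool.create(name);
        if (acceptNode(*n) == FilterAction::Reject) {
            if (recycleRejected) pool.release(n);   // make the slot reusable
        } else {
            tree.push_back(n);
        }
    }
    return pool.allocated();
}
```

In this model, parse(makeInput(1000), false) allocates 2000 nodes and
parse(makeInput(1000), true) allocates 1001: recycling slows the growth,
but nothing that was already allocated is given back while the pool is
alive.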

> Mirko
>
> -------- Original Message --------
>> Date: Fri, 04 Sep 2009 16:12:16 +0200
>> From: Alberto Massari <amassari@datadirect.com>
>> To: c-users@xerces.apache.org
>> Subject: Re: method startElement() from class DOMLSParserFilter
>>
>> In effect I am seeing so many problems with that code that the only 
>> suggestion I have is to get the latest 3.0 from the trunk and work with 
>> what I have just committed (or get the patch from 
>> http://svn.apache.org/viewvc?rev=811420&view=rev and apply to the 3.0.1 
>> code). This version should support your original code.
>>
>> Alberto
>>
>>
>> Mirko Braun wrote:
>>     
>>> Hi Alberto,
>>>
>>> yes, I'm still using the method startElement(). Is it better
>>> to use the method acceptNode() to reject the DATA node from
>>> the DOM, or is there another possibility?
>>>
>>> Mirko
>>>
>>>
>>> -------- Original Message --------
>>>> Date: Fri, 04 Sep 2009 15:41:54 +0200
>>>> From: Alberto Massari <amassari@datadirect.com>
>>>> To: c-users@xerces.apache.org
>>>> Subject: Re: method startElement() from class DOMLSParserFilter
>>>>
>>>> Hi Mirko,
>>>> are you still using startElement()? That API would mess with the
>>>> current parent, so it would break the parsing at a certain point.
>>>>
>>>> Alberto
>>>>
>>>> Mirko Braun wrote:
>>>>     
>>>>         
>>>>> Hi Alberto,
>>>>>
>>>>> yes, I'm sure that DATA is not a root node. I debugged a little bit.
>>>>> The exception occurs after the sixth time a DATA node is found.
>>>>>
>>>>> Mirko
>>>>>
>>>>> -------- Original Message --------
>>>>>> Date: Fri, 04 Sep 2009 14:21:15 +0200
>>>>>> From: Alberto Massari <amassari@datadirect.com>
>>>>>> To: c-users@xerces.apache.org
>>>>>> Subject: Re: method startElement() from class DOMLSParserFilter
>>>>>>
>>>>>> Hi Mirko,
>>>>>> are you sure that your root node isn't one of those DATA elements? In
>>>>>> this case the document node would see more than one root element.
>>>>>>
>>>>>> Alberto
>>>>>>
>>>>>> Mirko Braun wrote:
>>>>>>     
>>>>>>         
>>>>>>             
>>>>>>> Hi Alberto,
>>>>>>>
>>>>>>> thank you for your answer. I integrated the changes you
>>>>>>> suggested, but the result is still the same:
>>>>>>>
>>>>>>> DOM Error during parsing:
>>>>>>> 'C:\Daten\2009-08-07_NewXercesc\3_0_1\xerces-c-3.0.1\Build\Win32\VC6\Debug\MyXML.xml'
>>>>>>> DOMException code is:  3
>>>>>>> Message is: attempt is made to insert a node where it is not permitted
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Mirko
>>>>>>>
>>>>>>> -------- Original Message --------
>>>>>>>> Date: Fri, 04 Sep 2009 12:37:10 +0200
>>>>>>>> From: Alberto Massari <amassari@datadirect.com>
>>>>>>>> To: c-users@xerces.apache.org
>>>>>>>> Subject: Re: method startElement() from class DOMLSParserFilter
>>>>>>>>
>>>>>>>> Hi Mirko,
>>>>>>>> I think the current implementation of the DOMLSParserFilter doesn't
>>>>>>>> work nicely with your code, as the rejected nodes are not recycled and
>>>>>>>> the memory will grow to the same level as before.
>>>>>>>> Anyhow, you should instead override acceptNode like this:
>>>>>>>>
>>>>>>>> DOMParserFilter::FilterAction DOMParserFilter::acceptNode(DOMNode* node)
>>>>>>>> {
>>>>>>>>   // for element whose name is "DATA", skip it
>>>>>>>>   if (node->getNodeType()==DOMNode::ELEMENT_NODE &&
>>>>>>>>       XMLString::compareString(node->getNodeName(), element_data)==0)
>>>>>>>>     return DOMParserFilter::FILTER_REJECT;
>>>>>>>>   else
>>>>>>>>     return DOMParserFilter::FILTER_ACCEPT;
>>>>>>>> }
>>>>>>>>
>>>>>>>> Then, change DOMLSParserImpl::endElement to add a call to
>>>>>>>> origNode->release() after the call to removeChild().
>>>>>>>>
>>>>>>>> Alberto
>>>>>>>>
>>>>>>>>
>>>>>>>> Mirko Braun wrote:
>>>>>>>>     
>>>>>>>>         
>>>>>>>>             
>>>>>>>>                 
>>>>>>>>> Hello everybody,
>>>>>>>>>
>>>>>>>>> I would like to parse a quite large XML file (about 180 MB).
>>>>>>>>> I used the DOM interface because I need the tree for further
>>>>>>>>> processing of the data the XML file contains. Of course a lot
>>>>>>>>> of memory is used while parsing the file, and I got an
>>>>>>>>> "Out of memory" exception.
>>>>>>>>>
>>>>>>>>> I noticed that a class DOMLSParserFilter comes along with Xerces-C++
>>>>>>>>> 3.0.1 (Win32), which makes it possible to filter the nodes during
>>>>>>>>> parsing.
>>>>>>>>> That is perfect for me because one XML element in my large file
>>>>>>>>> contains most of the data. This XML element is called DATA and
>>>>>>>>> appears several times in my XML file.
>>>>>>>>> So I had the idea to reject this XML element from the DOM tree
>>>>>>>>> during parsing, to reduce the used memory, by using the method
>>>>>>>>> startElement() of the DOMLSParserFilter class. After that I would
>>>>>>>>> use a SAX parser and just get all DATA elements with their
>>>>>>>>> values.
>>>>>>>>> But it does not work.
>>>>>>>>> I integrated my code into the DOMPrint example which comes along
>>>>>>>>> with Xerces-C++ 3.0.1. The following error message occurred:
>>>>>>>>>
>>>>>>>>> DOM Error during parsing:
>>>>>>>>> 'C:\Daten\2009-08-07_NewXercesc\3_0_1\xerces-c-3.0.1\Build\Win32\VC6\Debug\MyXML.xml'
>>>>>>>>> DOMException code is:  3
>>>>>>>>> Message is: attempt is made to insert a node where it is not permitted
>>>>>>>>>
>>>>>>>>> Did I misunderstand the functionality of the DOMLSParserFilter class
>>>>>>>>> and its method startElement?
>>>>>>>>> Is it possible to realize my idea with the help of this class? Did
>>>>>>>>> I do something wrong in my code (please have a look below)?
>>>>>>>>>
>>>>>>>>> I would be very grateful for any help.
>>>>>>>>>
>>>>>>>>> Thanks in advance,
>>>>>>>>> Mirko
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> DOMPrintFilter.hpp:
>>>>>>>>> --------------------
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> class DOMParserFilter : public DOMLSParserFilter {
>>>>>>>>> public:
>>>>>>>>>
>>>>>>>>>     DOMParserFilter(DOMNodeFilter::ShowType whatToShow = DOMNodeFilter::SHOW_ALL);
>>>>>>>>>     ~DOMParserFilter() {}
>>>>>>>>>
>>>>>>>>>     virtual FilterAction startElement(DOMElement* node);
>>>>>>>>>     virtual FilterAction acceptNode(DOMNode* node) { return DOMParserFilter::FILTER_ACCEPT; }
>>>>>>>>>     virtual DOMNodeFilter::ShowType getWhatToShow() const { return fWhatToShow; }
>>>>>>>>> private:
>>>>>>>>>     DOMNodeFilter::ShowType fWhatToShow;
>>>>>>>>> };
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> DOMPrintFilter.cpp:
>>>>>>>>> --------------------
>>>>>>>>>
>>>>>>>>> DOMParserFilter::DOMParserFilter(DOMNodeFilter::ShowType whatToShow)
>>>>>>>>> : fWhatToShow(whatToShow)
>>>>>>>>> {}
>>>>>>>>>
>>>>>>>>> DOMParserFilter::FilterAction DOMParserFilter::startElement(DOMElement* node)
>>>>>>>>> {
>>>>>>>>>   // for element whose name is "DATA", skip it
>>>>>>>>>   if (XMLString::compareString(node->getNodeName(), element_data)==0)
>>>>>>>>>     return DOMParserFilter::FILTER_REJECT;
>>>>>>>>>   else
>>>>>>>>>     return DOMParserFilter::FILTER_ACCEPT;
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> DOMPrint.cpp:
>>>>>>>>> ---------------
>>>>>>>>>
>>>>>>>>> static const XMLCh gLS[] = { xercesc::chLatin_L, xercesc::chLatin_S, xercesc::chNull };
>>>>>>>>> xercesc::DOMImplementation *implParser = xercesc::DOMImplementationRegistry::getDOMImplementation(gLS);
>>>>>>>>> xercesc::DOMLSParser* parser = ((xercesc::DOMImplementationLS*)implParser)->createLSParser(xercesc::DOMImplementationLS::MODE_SYNCHRONOUS, 0);
>>>>>>>>> DOMTreeErrorReporter *errReporter = new DOMTreeErrorReporter();
>>>>>>>>>
>>>>>>>>> parser->getDomConfig()->setParameter(xercesc::XMLUni::fgDOMErrorHandler, errReporter);
>>>>>>>>>
>>>>>>>>> DOMParserFilter * pDOMParserFilter = new DOMParserFilter();
>>>>>>>>> parser->setFilter(pDOMParserFilter);
>>>>>>>>>     
>>>>>>>>>
>>>>>>>>>     //
>>>>>>>>>     //  Parse the XML file, catching any XML exceptions that might propagate
>>>>>>>>>     //  out of it.
>>>>>>>>>     //
>>>>>>>>>     bool errorsOccured = false;
>>>>>>>>>     DOMDocument *doc = NULL;
>>>>>>>>>
>>>>>>>>>     try
>>>>>>>>>     {
>>>>>>>>>       doc = parser->parseURI(gXmlFile);
>>>>>>>>>     }
>>>>>>>>>     catch (const OutOfMemoryException&)
>>>>>>>>>     {
>>>>>>>>>         XERCES_STD_QUALIFIER cerr << "OutOfMemoryException" << XERCES_STD_QUALIFIER endl;
>>>>>>>>>         errorsOccured = true;
>>>>>>>>>     }
>>>>>>>>>     catch (const XMLException& e)
>>>>>>>>>     {
>>>>>>>>>         XERCES_STD_QUALIFIER cerr << "An error occurred during parsing\n   Message: "
>>>>>>>>>              << StrX(e.getMessage()) << XERCES_STD_QUALIFIER endl;
>>>>>>>>>         errorsOccured = true;
>>>>>>>>>     }
>>>>>>>>>
>>>>>>>>>     catch (const DOMException& e)
>>>>>>>>>     {
>>>>>>>>>       const unsigned int maxChars = 2047;
>>>>>>>>>       XMLCh errText[maxChars + 1];
>>>>>>>>>
>>>>>>>>>       XERCES_STD_QUALIFIER cerr << "\nDOM Error during parsing: '" << gXmlFile << "'\n"
>>>>>>>>>            << "DOMException code is:  " << e.code << XERCES_STD_QUALIFIER endl;
>>>>>>>>>
>>>>>>>>>       if (DOMImplementation::loadDOMExceptionMsg(e.code, errText, maxChars))
>>>>>>>>>            XERCES_STD_QUALIFIER cerr << "Message is: " << StrX(errText) << XERCES_STD_QUALIFIER endl;
>>>>>>>>>
>>>>>>>>>       errorsOccured = true;
>>>>>>>>>     }
>>>>>>>>>
>>>>>>>>>     catch (...)
>>>>>>>>>     {
>>>>>>>>>         XERCES_STD_QUALIFIER cerr << "An error occurred during parsing\n " << XERCES_STD_QUALIFIER endl;
>>>>>>>>>         errorsOccured = true;
>>>>>>>>>     }

