xerces-c-users mailing list archives

From "Mirko Braun" <mirko.br...@gmx.de>
Subject Re: method startElement() from class DOMLSParserFilter
Date Sun, 06 Sep 2009 16:01:41 GMT
Hi Alberto,

thank you very much for your help. I applied the patch to
3.0.1 and it works: the exception no longer occurs.
One problem remains, though. Memory usage is still as high
as before. I would expect memory usage to drop when a node
is rejected from the tree. Is that assumption correct?

Mirko
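
One possible reading of the question above, offered as an assumption rather than a statement about Xerces-C++ internals: if a document hands out its nodes from a pool that it owns, then releasing a rejected node returns it to that pool for reuse instead of giving memory back to the operating system. Under that assumption, recycling would stop the footprint from growing, but it would not make it shrink. The toy model below sketches the idea. NodePool, Node, create, release and allocatedAfterRejecting are hypothetical names for illustration and are not part of Xerces-C++.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Toy stand-in for a node owned by a document. Not a Xerces-C++ type.
struct Node {
    std::string name;
};

// Hypothetical pool that owns every node it ever allocated.
class NodePool {
public:
    // Hand out a recycled node if one is available, otherwise grow the pool.
    Node* create(const std::string& name) {
        Node* n = nullptr;
        if (!free_.empty()) {
            n = free_.back();
            free_.pop_back();
        } else {
            storage_.push_back(std::make_unique<Node>());
            n = storage_.back().get();
        }
        n->name = name;
        return n;
    }

    // Put a node on the free list. The pool keeps the memory for reuse.
    void release(Node* n) {
        assert(n != nullptr);
        free_.push_back(n);
    }

    // Number of nodes the pool has had to allocate so far.
    std::size_t allocated() const { return storage_.size(); }

private:
    std::vector<std::unique_ptr<Node>> storage_;
    std::vector<Node*> free_;
};

// Create `count` nodes that a filter would reject, optionally releasing each
// one immediately, and report how many allocations the pool needed.
std::size_t allocatedAfterRejecting(std::size_t count, bool releaseRejected) {
    NodePool pool;
    for (std::size_t i = 0; i < count; ++i) {
        Node* n = pool.create("DATA");
        if (releaseRejected) {
            pool.release(n);
        }
    }
    return pool.allocated();
}
```

In this model, allocatedAfterRejecting(1000, false) needs 1000 allocations, while allocatedAfterRejecting(1000, true) needs only one, because each released node is reused by the next create call. In neither case does the pool hand memory back while it is alive.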

-------- Original Message --------
> Date: Fri, 04 Sep 2009 16:12:16 +0200
> From: Alberto Massari <amassari@datadirect.com>
> To: c-users@xerces.apache.org
> Subject: Re: method startElement() from class DOMLSParserFilter

> In effect I am seeing so many problems with that code that the only 
> suggestion I have is to get the latest 3.0 from the trunk and work with 
> what I have just committed (or get the patch from 
> http://svn.apache.org/viewvc?rev=811420&view=rev and apply to the 3.0.1 
> code). This version should support your original code.
> 
> Alberto
> 
> 
> Mirko Braun wrote:
> > Hi Alberto,
> >
> > yes, I'm still using the method startElement(). Is it better
> > to use the method acceptNode() to reject the DATA node from
> > the DOM, or is there another option?
> >
> > Mirko
> >
> >
> > -------- Original Message --------
> >> Date: Fri, 04 Sep 2009 15:41:54 +0200
> >> From: Alberto Massari <amassari@datadirect.com>
> >> To: c-users@xerces.apache.org
> >> Subject: Re: method startElement() from class DOMLSParserFilter
> >>
> >> Hi Mirko,
> >> are you still using startElement()? That API would mess with the
> >> current parent, so it would break the parsing at a certain point.
> >>
> >> Alberto
> >>
> >> Mirko Braun wrote:
> >>     
> >>> Hi Alberto,
> >>>
> >>> yes, I'm sure that DATA is not the root node. I debugged a little.
> >>> The exception occurs after the DATA node has been found for the sixth time.
> >>>
> >>> Mirko
> >>>
> >>> -------- Original Message --------
> >>>> Date: Fri, 04 Sep 2009 14:21:15 +0200
> >>>> From: Alberto Massari <amassari@datadirect.com>
> >>>> To: c-users@xerces.apache.org
> >>>> Subject: Re: method startElement() from class DOMLSParserFilter
> >>>>
> >>>> Hi Mirko,
> >>>> are you sure that your root node isn't one of those DATA elements? In
> >>>> this case the document node would see more than one root element.
> >>>>
> >>>> Alberto
> >>>>
> >>>> Mirko Braun wrote:
> >>>>     
> >>>>         
> >>>>> Hi Alberto,
> >>>>>
> >>>>> thank you for your answer. I integrated the changes you
> >>>>> suggested, but the result is still the same:
> >>>>>
> >>>>> DOM Error during parsing: 'C:\Daten\2009-08-07_NewXercesc\3_0_1\xerces-c-3.0.1\Build\Win32\VC6\Debug\MyXML.xml'
> >>>>> DOMException code is:  3
> >>>>> Message is: attempt is made to insert a node where it is not permitted
> >>>>>
> >>>>> Best regards,
> >>>>> Mirko
> >>>>>
> >>>>> -------- Original Message --------
> >>>>>> Date: Fri, 04 Sep 2009 12:37:10 +0200
> >>>>>> From: Alberto Massari <amassari@datadirect.com>
> >>>>>> To: c-users@xerces.apache.org
> >>>>>> Subject: Re: method startElement() from class DOMLSParserFilter
> >>>>>>
> >>>>>> Hi Mirko,
> >>>>>> I think the current implementation of the DOMLSParserFilter doesn't
> >>>>>> work nicely with your code, as the rejected nodes are not recycled and
> >>>>>> the memory will grow to the same level as before.
> >>>>>> Anyhow, you should instead override acceptNode like this:
> >>>>>>
> >>>>>> DOMParserFilter::FilterAction DOMParserFilter::acceptNode(DOMNode* node)
> >>>>>> {
> >>>>>>   // for element whose name is "DATA", reject it
> >>>>>>   if (node->getNodeType()==DOMNode::ELEMENT_NODE &&
> >>>>>>       XMLString::compareString(node->getNodeName(), element_data)==0)
> >>>>>>     return DOMParserFilter::FILTER_REJECT;
> >>>>>>   else
> >>>>>>     return DOMParserFilter::FILTER_ACCEPT;
> >>>>>> }
> >>>>>>
> >>>>>> Then, change DOMLSParserImpl::endElement to add a call to 
> >>>>>> origNode->release() after the call to removeChild().
> >>>>>>
> >>>>>> Alberto
> >>>>>>
> >>>>>>
> >>>>>> Mirko Braun wrote:
> >>>>>>     
> >>>>>>         
> >>>>>>             
> >>>>>>> Hello everybody,
> >>>>>>>
> >>>>>>> I would like to parse a quite large XML file (about 180 MB).
> >>>>>>> I used the DOM interface because I need the tree for further
> >>>>>>> processing of the data the XML file contains. Of course, a lot
> >>>>>>> of memory is used while parsing the file, and I got an
> >>>>>>> "Out of memory" exception.
> >>>>>>>
> >>>>>>> I noticed that a class DOMLSParserFilter comes along with Xerces-C++
> >>>>>>> 3.0.1 (Win32), which makes it possible to filter the nodes during
> >>>>>>> parsing.
> >>>>>>> That is perfect for me because one XML element in my large file
> >>>>>>> contains most of the data. This XML element is called DATA and
> >>>>>>> appears several times in my XML file.
> >>>>>>> So I had the idea to reject this XML element from the DOM tree
> >>>>>>> during parsing, to reduce the memory used, by using the method
> >>>>>>> startElement() of the DOMLSParserFilter class. After that I would
> >>>>>>> use a SAX parser and just get all DATA elements with their
> >>>>>>> values.
> >>>>>>> But it does not work.
> >>>>>>> I integrated my code into the DOMPrint example which comes along
> >>>>>>> with Xerces-C++ 3.0.1. The following error message occurred:
> >>>>>>>
> >>>>>>> DOM Error during parsing: 'C:\Daten\2009-08-07_NewXercesc\3_0_1\xerces-c-3.0.1\Build\Win32\VC6\Debug\MyXML.xml'
> >>>>>>> DOMException code is:  3
> >>>>>>> Message is: attempt is made to insert a node where it is not permitted
> >>>>>>>
> >>>>>>> Did I misunderstand the functionality of the DOMLSParserFilter class
> >>>>>>> and its method startElement()?
> >>>>>>> Is it possible to realize my idea with the help of this class? Did
> >>>>>>> I do something wrong in my code (please have a look below)?
> >>>>>>>
> >>>>>>> I would be very grateful for any help.
> >>>>>>>
> >>>>>>> Thanks in advance,
> >>>>>>> Mirko
> >>>>>>>
> >>>>>>>
> >>>>>>> DOMPrintFilter.hpp:
> >>>>>>> --------------------
> >>>>>>>
> >>>>>>>
> >>>>>>> class DOMParserFilter : public DOMLSParserFilter {
> >>>>>>> public:
> >>>>>>>
> >>>>>>>     DOMParserFilter(DOMNodeFilter::ShowType whatToShow = DOMNodeFilter::SHOW_ALL);
> >>>>>>>     ~DOMParserFilter(){};
> >>>>>>>
> >>>>>>>     virtual FilterAction startElement(DOMElement* node);
> >>>>>>>     virtual FilterAction acceptNode(DOMNode* node){return DOMParserFilter::FILTER_ACCEPT;};
> >>>>>>>     virtual DOMNodeFilter::ShowType getWhatToShow() const {return fWhatToShow;};
> >>>>>>> private:
> >>>>>>>     DOMNodeFilter::ShowType fWhatToShow;
> >>>>>>> };
> >>>>>>>
> >>>>>>>
> >>>>>>> DOMPrintFilter.cpp:
> >>>>>>> --------------------
> >>>>>>>
> >>>>>>> DOMParserFilter::DOMParserFilter(DOMNodeFilter::ShowType whatToShow)
> >>>>>>> :fWhatToShow(whatToShow)
> >>>>>>> {}
> >>>>>>>
> >>>>>>> DOMParserFilter::FilterAction DOMParserFilter::startElement(DOMElement* node)
> >>>>>>> {
> >>>>>>>   // for element whose name is "DATA", skip it
> >>>>>>>   if (XMLString::compareString(node->getNodeName(), element_data)==0)
> >>>>>>>     return DOMParserFilter::FILTER_REJECT;
> >>>>>>>   else
> >>>>>>>     return DOMParserFilter::FILTER_ACCEPT;
> >>>>>>> }
> >>>>>>>
> >>>>>>>
> >>>>>>> DOMPrint.cpp:
> >>>>>>> ---------------
> >>>>>>>
> >>>>>>> static const XMLCh gLS[] = { xercesc::chLatin_L, xercesc::chLatin_S, xercesc::chNull };
> >>>>>>> xercesc::DOMImplementation *implParser =
> >>>>>>>     xercesc::DOMImplementationRegistry::getDOMImplementation(gLS);
> >>>>>>> xercesc::DOMLSParser* parser =
> >>>>>>>     ((xercesc::DOMImplementationLS*)implParser)->createLSParser(xercesc::DOMImplementationLS::MODE_SYNCHRONOUS, 0);
> >>>>>>> DOMTreeErrorReporter *errReporter = new DOMTreeErrorReporter();
> >>>>>>> parser->getDomConfig()->setParameter(xercesc::XMLUni::fgDOMErrorHandler, errReporter);
> >>>>>>>
> >>>>>>> DOMParserFilter * pDOMParserFilter = new DOMParserFilter();
> >>>>>>> parser->setFilter(pDOMParserFilter);
> >>>>>>>     
> >>>>>>>
> >>>>>>>     //
> >>>>>>>     //  Parse the XML file, catching any XML exceptions that might propagate
> >>>>>>>     //  out of it.
> >>>>>>>     //
> >>>>>>>     bool errorsOccured = false;
> >>>>>>>     DOMDocument *doc = NULL;
> >>>>>>>
> >>>>>>>     try
> >>>>>>>     {
> >>>>>>>       doc = parser->parseURI(gXmlFile);
> >>>>>>>     }
> >>>>>>>     catch (const OutOfMemoryException&)
> >>>>>>>     {
> >>>>>>>         XERCES_STD_QUALIFIER cerr << "OutOfMemoryException" << XERCES_STD_QUALIFIER endl;
> >>>>>>>         errorsOccured = true;
> >>>>>>>     }
> >>>>>>>     catch (const XMLException& e)
> >>>>>>>     {
> >>>>>>>         XERCES_STD_QUALIFIER cerr << "An error occurred during parsing\n   Message: "
> >>>>>>>              << StrX(e.getMessage()) << XERCES_STD_QUALIFIER endl;
> >>>>>>>         errorsOccured = true;
> >>>>>>>     }
> >>>>>>>
> >>>>>>>     catch (const DOMException& e)
> >>>>>>>     {
> >>>>>>>       const unsigned int maxChars = 2047;
> >>>>>>>       XMLCh errText[maxChars + 1];
> >>>>>>>
> >>>>>>>       XERCES_STD_QUALIFIER cerr << "\nDOM Error during parsing: '" << gXmlFile << "'\n"
> >>>>>>>            << "DOMException code is:  " << e.code << XERCES_STD_QUALIFIER endl;
> >>>>>>>
> >>>>>>>       if (DOMImplementation::loadDOMExceptionMsg(e.code, errText, maxChars))
> >>>>>>>            XERCES_STD_QUALIFIER cerr << "Message is: " << StrX(errText) << XERCES_STD_QUALIFIER endl;
> >>>>>>>
> >>>>>>>       errorsOccured = true;
> >>>>>>>     }
> >>>>>>>
> >>>>>>>     catch (...)
> >>>>>>>     {
> >>>>>>>         XERCES_STD_QUALIFIER cerr << "An error occurred during parsing\n " << XERCES_STD_QUALIFIER endl;
> >>>>>>>         errorsOccured = true;
> >>>>>>>     }

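The quoted thread contrasts two hooks: startElement(), which is consulted when an element opens, and acceptNode(), which is consulted once a node is complete, after which a rejected node is removed from its parent and released. The sketch below models only the second approach with a self-contained toy tree builder, so that the order of operations is visible without the library. Element, TreeBuilder, FilterAction and countElements are hypothetical stand-ins, and the functions named startElement, endElement and acceptNode here only mirror the names used in the thread. None of this is Xerces-C++ code.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Toy stand-ins. These are not Xerces-C++ types.
enum class FilterAction { Accept, Reject };

struct Element {
    std::string name;
    std::vector<std::unique_ptr<Element>> children;
};

// Decision taken once an element is complete: reject every "DATA" element.
FilterAction acceptNode(const Element& e) {
    return e.name == "DATA" ? FilterAction::Reject : FilterAction::Accept;
}

// Minimal builder driven by start/end events, as a parser would drive it.
class TreeBuilder {
public:
    TreeBuilder() : root_(std::make_unique<Element>()) {
        root_->name = "#document";
        stack_.push_back(root_.get());
    }

    // Attach a new child to the current parent and descend into it.
    void startElement(const std::string& name) {
        Element* parent = stack_.back();
        parent->children.push_back(std::make_unique<Element>());
        Element* child = parent->children.back().get();
        child->name = name;
        stack_.push_back(child);
    }

    // Close the current element, then ask the filter whether to keep it.
    void endElement() {
        assert(stack_.size() > 1);
        Element* done = stack_.back();
        stack_.pop_back();
        if (acceptNode(*done) == FilterAction::Reject) {
            // The element just closed is always its parent's last child.
            // Dropping the unique_ptr frees the whole rejected subtree.
            stack_.back()->children.pop_back();
        }
    }

    const Element& document() const { return *root_; }

private:
    std::unique_ptr<Element> root_;
    std::vector<Element*> stack_;
};

// Count an element together with all of its descendants.
std::size_t countElements(const Element& e) {
    std::size_t n = 1;
    for (const auto& child : e.children) {
        n += countElements(*child);
    }
    return n;
}
```

Because the decision is taken only after the element has been closed, the builder's notion of the current parent is never disturbed while a rejected element is still open. That is the property the thread suggests is lost when rejecting from startElement() in 3.0.1.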