xerces-c-users mailing list archives

From "Mirko Braun" <mirko.br...@gmx.de>
Subject Re: method startElement() from class DOMLSParserFilter
Date Mon, 07 Sep 2009 08:09:24 GMT

Sorry, I don't know exactly how much memory is used. I only looked at the
peak memory usage shown in the Task Manager (Windows XP). Whether or not
I use a DOMLSParserFilter, the process DOMPrint.exe uses the same amount of
memory.
The DATA elements that I want to reject have very large values,
and I assumed that rejecting these nodes would also remove them from
memory. Does "be marked for recycling" mean that these DATA nodes
remain in memory?

Mirko
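
On the question of whether recycled nodes remain in memory, here is a minimal sketch, assuming a pool-style allocator in which recycled blocks go back onto a free list instead of being returned to the operating system. BlockPool and all of its members are hypothetical names made up for this illustration. This is not the Xerces-C++ allocator, and whether the patched parser behaves exactly this way is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy fixed-size block pool (hypothetical, NOT the Xerces-C++ implementation).
// Recycled blocks become reusable, but the pool never gives memory back.
class BlockPool {
public:
    explicit BlockPool(std::size_t blockSize) : fBlockSize(blockSize) {}

    // Hand out a recycled block if one is available; otherwise grow the pool.
    std::size_t acquire() {
        if (!fFree.empty()) {
            std::size_t id = fFree.back();
            fFree.pop_back();
            return id;
        }
        fBlocks.emplace_back(fBlockSize);
        return fBlocks.size() - 1;
    }

    // Mark a block for recycling: it goes onto the free list, the bytes stay reserved.
    void recycle(std::size_t id) {
        assert(id < fBlocks.size());
        fFree.push_back(id);
    }

    // Total bytes the pool holds, whether the blocks are in use or on the free list.
    std::size_t reservedBytes() const { return fBlocks.size() * fBlockSize; }

    // Number of blocks currently waiting to be reused.
    std::size_t freeBlocks() const { return fFree.size(); }

private:
    std::size_t fBlockSize;
    std::vector<std::vector<char> > fBlocks;
    std::vector<std::size_t> fFree;
};
```

With a pool like this, acquiring two 1024-byte blocks, recycling one of them and acquiring again leaves reservedBytes() at 2048: the recycled block is reused, but the total held by the process does not shrink. If the parser works along these lines, the peak value in the Task Manager would only come out lower when rejected nodes are recycled early enough to be reused by later nodes.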

-------- Original Message --------
> Date: Mon, 07 Sep 2009 09:26:05 +0200
> From: Alberto Massari <amassari@datadirect.com>
> To: c-users@xerces.apache.org
> Subject: Re: method startElement() from class DOMLSParserFilter

> Mirko Braun wrote:
> > Hi Alberto,
> >
> > thank you very much for your help. I integrated the patch in
> > 3.0.1 and it worked. There is no exception any more.
> > But there is still one problem. The usage of memory is still
> > of the same size. I think if a node is rejected from the tree
> > the usage of memory should also decrease. Is my conclusion
> > correct?
> >   
> 
> Yes, if a node is rejected it should be marked for recycling; how much 
> memory are you seeing being used?
> 
> Alberto
> 
> > Mirko
> >
> > -------- Original Message --------
> >> Date: Fri, 04 Sep 2009 16:12:16 +0200
> >> From: Alberto Massari <amassari@datadirect.com>
> >> To: c-users@xerces.apache.org
> >> Subject: Re: method startElement() from class DOMLSParserFilter
> >>     
> >
> >   
> >> In effect I am seeing so many problems with that code that the only 
> >> suggestion I have is to get the latest 3.0 from the trunk and work with
> >> what I have just committed (or get the patch from 
> >> http://svn.apache.org/viewvc?rev=811420&view=rev and apply to the 3.0.1
> >> code). This version should support your original code.
> >>
> >> Alberto
> >>
> >>
> >> Mirko Braun wrote:
> >>     
> >>> Hi Alberto,
> >>>
> >>> yes, I'm still using the method startElement(). Is it better
> >>> to use the method acceptNode() to reject the DATA node from
> >>> the DOM or is there any other possibility?
> >>>
> >>> Mirko
> >>>
> >>>
> >>> -------- Original Message --------
> >>>> Date: Fri, 04 Sep 2009 15:41:54 +0200
> >>>> From: Alberto Massari <amassari@datadirect.com>
> >>>> To: c-users@xerces.apache.org
> >>>> Subject: Re: method startElement() from class DOMLSParserFilter
> >>>>
> >>>> Hi Mirko,
> >>>> are you still using startElement()? That API would mess with the current
> >>>> parent, so it would break the parsing at a certain point.
> >>>>
> >>>> Alberto
> >>>>
> >>>> Mirko Braun wrote:
> >>>>     
> >>>>         
> >>>>> Hi Alberto,
> >>>>>
> >>>>> yes, I'm sure that DATA is not a root node. I debugged a little bit.
> >>>>> The exception occurs after the sixth time this DATA node was found.
> >>>>>
> >>>>> Mirko
> >>>>>
> >>>>> -------- Original Message --------
> >>>>>> Date: Fri, 04 Sep 2009 14:21:15 +0200
> >>>>>> From: Alberto Massari <amassari@datadirect.com>
> >>>>>> To: c-users@xerces.apache.org
> >>>>>> Subject: Re: method startElement() from class DOMLSParserFilter
> >>>>>>
> >>>>>> Hi Mirko,
> >>>>>> are you sure that your root node isn't one of those DATA elements? In
> >>>>>> this case the document node would see more than one root element.
> >>>>>>
> >>>>>> Alberto
> >>>>>>
> >>>>>> Mirko Braun wrote:
> >>>>>>     
> >>>>>>         
> >>>>>>             
> >>>>>>> Hi Alberto,
> >>>>>>>
> >>>>>>> thank you for your answer. I integrated the changes you
> >>>>>>> suggested, but the result is still the same:
> >>>>>>>
> >>>>>>> DOM Error during parsing: 'C:\Daten\2009-08-07_NewXercesc\3_0_1\xerces-c-3.0.1\Build\Win32\VC6\Debug\MyXML.xml'
> >>>>>>> DOMException code is:  3
> >>>>>>> Message is: attempt is made to insert a node where it is not permitted
> >>>>>>>
> >>>>>>> Best regards,
> >>>>>>> Mirko
> >>>>>>>
> >>>>>>> -------- Original Message --------
> >>>>>>>> Date: Fri, 04 Sep 2009 12:37:10 +0200
> >>>>>>>> From: Alberto Massari <amassari@datadirect.com>
> >>>>>>>> To: c-users@xerces.apache.org
> >>>>>>>> Subject: Re: method startElement() from class DOMLSParserFilter
> >>>>>>>>
> >>>>>>>> Hi Mirko,
> >>>>>>>> I think the current implementation of the DOMLSParserFilter doesn't work
> >>>>>>>> nicely with your code, as the rejected nodes are not recycled and the
> >>>>>>>> memory will grow to the same level as before.
> >>>>>>>> Anyhow, you should instead override acceptNode like this:
> >>>>>>>>
> >>>>>>>> DOMParserFilter::FilterAction DOMParserFilter::acceptNode(DOMNode* node)
> >>>>>>>> {
> >>>>>>>>   // for element whose name is "DATA", skip it
> >>>>>>>>   if (node->getNodeType()==DOMNode::ELEMENT_NODE &&
> >>>>>>>>       XMLString::compareString(node->getNodeName(), element_data)==0)
> >>>>>>>>     return DOMParserFilter::FILTER_REJECT;
> >>>>>>>>   else
> >>>>>>>>     return DOMParserFilter::FILTER_ACCEPT;
> >>>>>>>> }
> >>>>>>>>
> >>>>>>>> Then, change DOMLSParserImpl::endElement to add a call to
> >>>>>>>> origNode->release() after the call to removeChild().
> >>>>>>>> Alberto
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Mirko Braun wrote:
> >>>>>>>>     
> >>>>>>>>         
> >>>>>>>>             
> >>>>>>>>                 
> >>>>>>>>> Hello everybody,
> >>>>>>>>>
> >>>>>>>>> I would like to parse a quite large XML file (about 180 MB).
> >>>>>>>>> I used the DOM interface because I need the tree for further
> >>>>>>>>> processing of the data the XML file contains. Of course, a lot
> >>>>>>>>> of memory is used while parsing the file, and I got an
> >>>>>>>>> "Out of memory" exception.
> >>>>>>>>>
> >>>>>>>>> I noticed that a class DOMLSParserFilter comes with Xerces-C++
> >>>>>>>>> 3.0.1 (Win32), which makes it possible to filter the nodes during
> >>>>>>>>> parsing.
> >>>>>>>>> That is perfect for me because one XML element in my large file
> >>>>>>>>> contains most of the data. This XML element is called DATA and
> >>>>>>>>> appears several times in my XML file.
> >>>>>>>>> So I had the idea to reject this XML element from the DOM tree
> >>>>>>>>> during parsing, using the method startElement() of the
> >>>>>>>>> DOMLSParserFilter class, to reduce the memory used. After that I
> >>>>>>>>> would use a SAX parser and just get all DATA elements with their
> >>>>>>>>> values.
> >>>>>>>>> But it does not work.
> >>>>>>>>> I integrated my code into the DOMPrint example which comes
> >>>>>>>>> with Xerces-C++ 3.0.1. The following error message occurred:
> >>>>>>>>>
> >>>>>>>>> DOM Error during parsing: 'C:\Daten\2009-08-07_NewXercesc\3_0_1\xerces-c-3.0.1\Build\Win32\VC6\Debug\MyXML.xml'
> >>>>>>>>> DOMException code is:  3
> >>>>>>>>> Message is: attempt is made to insert a node where it is not permitted
> >>>>>>>>>
> >>>>>>>>> Did I misunderstand the functionality of the DOMLSParserFilter class
> >>>>>>>>> and its method startElement()?
> >>>>>>>>> Is it possible to realize my idea with the help of this class? Did
> >>>>>>>>> I do something wrong in my code (please have a look below)?
> >>>>>>>>>
> >>>>>>>>> I would be very grateful for any help.
> >>>>>>>>>
> >>>>>>>>> Thanks in advance,
> >>>>>>>>> Mirko
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> DOMPrintFilter.hpp:
> >>>>>>>>> --------------------
> >>>>>>>>>
> >>>>>>>>> class DOMParserFilter : public DOMLSParserFilter {
> >>>>>>>>> public:
> >>>>>>>>>
> >>>>>>>>>     DOMParserFilter(DOMNodeFilter::ShowType whatToShow = DOMNodeFilter::SHOW_ALL);
> >>>>>>>>>     ~DOMParserFilter(){};
> >>>>>>>>>
> >>>>>>>>>     virtual FilterAction startElement(DOMElement* node);
> >>>>>>>>>     virtual FilterAction acceptNode(DOMNode* node){return DOMParserFilter::FILTER_ACCEPT;};
> >>>>>>>>>     virtual DOMNodeFilter::ShowType getWhatToShow() const {return fWhatToShow;};
> >>>>>>>>>
> >>>>>>>>> private:
> >>>>>>>>>     DOMNodeFilter::ShowType fWhatToShow;
> >>>>>>>>> };
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> DOMPrintFilter.cpp:
> >>>>>>>>> --------------------
> >>>>>>>>>
> >>>>>>>>> DOMParserFilter::DOMParserFilter(DOMNodeFilter::ShowType whatToShow)
> >>>>>>>>> :fWhatToShow(whatToShow)
> >>>>>>>>> {}
> >>>>>>>>>
> >>>>>>>>> DOMParserFilter::FilterAction DOMParserFilter::startElement(DOMElement* node)
> >>>>>>>>> {
> >>>>>>>>>   // for element whose name is "DATA", skip it
> >>>>>>>>>   if (XMLString::compareString(node->getNodeName(), element_data)==0)
> >>>>>>>>>     return DOMParserFilter::FILTER_REJECT;
> >>>>>>>>>   else
> >>>>>>>>>     return DOMParserFilter::FILTER_ACCEPT;
> >>>>>>>>> }
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> DOMPrint.cpp:
> >>>>>>>>> ---------------
> >>>>>>>>>
> >>>>>>>>> static const XMLCh gLS[] = { xercesc::chLatin_L, xercesc::chLatin_S, xercesc::chNull };
> >>>>>>>>> xercesc::DOMImplementation *implParser = xercesc::DOMImplementationRegistry::getDOMImplementation(gLS);
> >>>>>>>>> xercesc::DOMLSParser* parser = ((xercesc::DOMImplementationLS*)implParser)->createLSParser(xercesc::DOMImplementationLS::MODE_SYNCHRONOUS, 0);
> >>>>>>>>> DOMTreeErrorReporter *errReporter = new DOMTreeErrorReporter();
> >>>>>>>>>
> >>>>>>>>> parser->getDomConfig()->setParameter(xercesc::XMLUni::fgDOMErrorHandler, errReporter);
> >>>>>>>>>
> >>>>>>>>> DOMParserFilter * pDOMParserFilter = new DOMParserFilter();
> >>>>>>>>> parser->setFilter(pDOMParserFilter);
> >>>>>>>>>
> >>>>>>>>>     //
> >>>>>>>>>     //  Parse the XML file, catching any XML exceptions that might propagate
> >>>>>>>>>     //  out of it.
> >>>>>>>>>     //
> >>>>>>>>>     bool errorsOccured = false;
> >>>>>>>>>     DOMDocument *doc = NULL;
> >>>>>>>>>
> >>>>>>>>>     try
> >>>>>>>>>     {
> >>>>>>>>>       doc = parser->parseURI(gXmlFile);
> >>>>>>>>>     }
> >>>>>>>>>     catch (const OutOfMemoryException&)
> >>>>>>>>>     {
> >>>>>>>>>         XERCES_STD_QUALIFIER cerr << "OutOfMemoryException" << XERCES_STD_QUALIFIER endl;
> >>>>>>>>>         errorsOccured = true;
> >>>>>>>>>     }
> >>>>>>>>>     catch (const XMLException& e)
> >>>>>>>>>     {
> >>>>>>>>>         XERCES_STD_QUALIFIER cerr << "An error occurred during parsing\n   Message: "
> >>>>>>>>>              << StrX(e.getMessage()) << XERCES_STD_QUALIFIER endl;
> >>>>>>>>>         errorsOccured = true;
> >>>>>>>>>     }
> >>>>>>>>>
> >>>>>>>>>     catch (const DOMException& e)
> >>>>>>>>>     {
> >>>>>>>>>       const unsigned int maxChars = 2047;
> >>>>>>>>>       XMLCh errText[maxChars + 1];
> >>>>>>>>>
> >>>>>>>>>       XERCES_STD_QUALIFIER cerr << "\nDOM Error during parsing: '" << gXmlFile << "'\n"
> >>>>>>>>>            << "DOMException code is:  " << e.code << XERCES_STD_QUALIFIER endl;
> >>>>>>>>>
> >>>>>>>>>       if (DOMImplementation::loadDOMExceptionMsg(e.code, errText, maxChars))
> >>>>>>>>>            XERCES_STD_QUALIFIER cerr << "Message is: " << StrX(errText) << XERCES_STD_QUALIFIER endl;
> >>>>>>>>>
> >>>>>>>>>       errorsOccured = true;
> >>>>>>>>>     }
> >>>>>>>>>
> >>>>>>>>>     catch (...)
> >>>>>>>>>     {
> >>>>>>>>>         XERCES_STD_QUALIFIER cerr << "An error occurred during parsing\n " << XERCES_STD_QUALIFIER endl;
> >>>>>>>>>         errorsOccured = true;
> >>>>>>>>>     }
