From: "Mirko Braun"
Date: Fri, 04 Sep 2009 15:17:33 +0200
To: c-users@xerces.apache.org
Subject: Re: RE: method startElement() from class DOMLSParserFilter

Hi John,

as far as I understand the description of the method startElement() in the
API reference, the element has no children at that point:

"The element node passed to startElement for filtering will include all of
the attributes, but none of the children nodes."

As a consequence, removing the children must be done by the parser
internally. Is this correct?

Best regards
Mirko

-------- Original Message --------
> Date: Fri, 4 Sep 2009 08:11:14 -0400
> From: John Lilley
> To: "c-users@xerces.apache.org"
> Subject: RE: method startElement() from class DOMLSParserFilter

> Forgive my ignorance, but could it be that you must reject not only the
> node you don't want, but all of its children as well?
>
> john
>
> -----Original Message-----
> From: Mirko Braun [mailto:mirko.braun@gmx.de]
> Sent: Friday, September 04, 2009 6:01 AM
> To: c-users@xerces.apache.org
> Subject: Re: method startElement() from class DOMLSParserFilter
>
>
> Hi Alberto,
>
> thank you for your answer.
> I integrated the changes you suggested, but the result is still the same:
>
> DOM Error during parsing:
> 'C:\Daten\2009-08-07_NewXercesc\3_0_1\xerces-c-3.0.1\Build\Win32\VC6\Debug\MyXML.xml'
> DOMException code is: 3
> Message is: attempt is made to insert a node where it is not permitted
>
> Best regards,
> Mirko
>
> -------- Original Message --------
> > Date: Fri, 04 Sep 2009 12:37:10 +0200
> > From: Alberto Massari
> > To: c-users@xerces.apache.org
> > Subject: Re: method startElement() from class DOMLSParserFilter
>
> > Hi Mirko,
> > I think the current implementation of the DOMLSParserFilter doesn't work
> > nicely with your code, as the rejected nodes are not recycled and the
> > memory will grow to the same level as before.
> > Anyhow, you should instead override acceptNode like this:
> >
> > DOMParserFilter::FilterAction DOMParserFilter::acceptNode(DOMNode* node)
> > {
> >     // for element whose name is "DATA", skip it
> >     if (node->getNodeType()==DOMNode::ELEMENT_NODE &&
> >         XMLString::compareString(node->getNodeName(), element_data)==0)
> >         return DOMParserFilter::FILTER_REJECT;
> >     else
> >         return DOMParserFilter::FILTER_ACCEPT;
> > }
> >
> > Then, change DOMLSParserImpl::endElement to add a call to
> > origNode->release() after the call to removeChild().
> >
> > Alberto
> >
> >
> > Mirko Braun wrote:
> > > Hello everybody,
> > >
> > > I would like to parse a quite large XML file (about 180 MB).
> > > I used the DOM interface because I need the tree for further
> > > processing of the data the XML file contains. Of course a lot of
> > > memory is used while parsing the file, and I got an
> > > "Out of memory" exception.
> > >
> > > I noticed that a class DOMLSParserFilter comes along with Xerces-C++
> > > 3.0.1 (Win32), which makes it possible to filter the nodes during parsing.
> > > That is perfect for me because one XML-Element in my large file
> > > contains most of the data.
> > > This XML-Element is called DATA and appears several times in my XML file.
> > > So I had the idea to reject this XML-Element from the DOM tree
> > > during parsing, using the method startElement() of the
> > > DOMLSParserFilter class, to reduce the memory used. After that I would
> > > use a SAX parser and just get all XML-Elements DATA with their values.
> > > But it does not work.
> > > I integrated my code into the DOMPrint example which comes along
> > > with Xerces-C++ 3.0.1. The following error message occurred:
> > >
> > > DOM Error during parsing:
> > > 'C:\Daten\2009-08-07_NewXercesc\3_0_1\xerces-c-3.0.1\Build\Win32\VC6\Debug\MyXML.xml'
> > > DOMException code is: 3
> > > Message is: attempt is made to insert a node where it is not permitted
> > >
> > >
> > > Did I misunderstand the functionality of the DOMLSParserFilter class
> > > and its method startElement()?
> > > Is it possible to realize my idea with the help of this class? Did
> > > I do something wrong in my code (please have a look below)?
> > >
> > > I would be very grateful for any help.
> > >
> > > Thanks in advance,
> > > Mirko
> > >
> > >
> > > DOMPrintFilter.hpp:
> > > --------------------
> > >
> > > class DOMParserFilter : public DOMLSParserFilter {
> > > public:
> > >
> > >     DOMParserFilter(DOMNodeFilter::ShowType whatToShow = DOMNodeFilter::SHOW_ALL);
> > >     ~DOMParserFilter(){};
> > >
> > >     virtual FilterAction startElement(DOMElement* node);
> > >     virtual FilterAction acceptNode(DOMNode* node){return DOMParserFilter::FILTER_ACCEPT;};
> > >     virtual DOMNodeFilter::ShowType getWhatToShow() const {return fWhatToShow;};
> > >
> > > private:
> > >     DOMNodeFilter::ShowType fWhatToShow;
> > > };
> > >
> > >
> > > DOMPrintFilter.cpp:
> > > --------------------
> > >
> > > DOMParserFilter::DOMParserFilter(DOMNodeFilter::ShowType whatToShow)
> > > :fWhatToShow(whatToShow)
> > > {}
> > >
> > > DOMParserFilter::FilterAction DOMParserFilter::startElement(DOMElement* node)
> > > {
> > >     // for element whose name is "DATA", skip it
> > >     if (XMLString::compareString(node->getNodeName(), element_data)==0)
> > >         return DOMParserFilter::FILTER_REJECT;
> > >     else
> > >         return DOMParserFilter::FILTER_ACCEPT;
> > > }
> > >
> > >
> > > DOMPrint.cpp:
> > > ---------------
> > >
> > > static const XMLCh gLS[] = { xercesc::chLatin_L, xercesc::chLatin_S, xercesc::chNull };
> > >
> > > xercesc::DOMImplementation *implParser =
> > >     xercesc::DOMImplementationRegistry::getDOMImplementation(gLS);
> > >
> > > xercesc::DOMLSParser* parser =
> > >     ((xercesc::DOMImplementationLS*)implParser)->createLSParser(xercesc::DOMImplementationLS::MODE_SYNCHRONOUS, 0);
> > >
> > > DOMTreeErrorReporter *errReporter = new DOMTreeErrorReporter();
> > > parser->getDomConfig()->setParameter(xercesc::XMLUni::fgDOMErrorHandler, errReporter);
> > >
> > > DOMParserFilter * pDOMParserFilter = new DOMParserFilter();
> > > parser->setFilter(pDOMParserFilter);
> > >
> > >
> > > //
> > > // Parse the XML file, catching any XML exceptions that might propagate
> > > // out of it.
> > > //
> > > bool errorsOccured = false;
> > > DOMDocument *doc = NULL;
> > >
> > > try
> > > {
> > >     doc = parser->parseURI(gXmlFile);
> > > }
> > > catch (const OutOfMemoryException&)
> > > {
> > >     XERCES_STD_QUALIFIER cerr << "OutOfMemoryException" << XERCES_STD_QUALIFIER endl;
> > >     errorsOccured = true;
> > > }
> > > catch (const XMLException& e)
> > > {
> > >     XERCES_STD_QUALIFIER cerr << "An error occurred during parsing\n   Message: "
> > >         << StrX(e.getMessage()) << XERCES_STD_QUALIFIER endl;
> > >     errorsOccured = true;
> > > }
> > >
> > > catch (const DOMException& e)
> > > {
> > >     const unsigned int maxChars = 2047;
> > >     XMLCh errText[maxChars + 1];
> > >
> > >     XERCES_STD_QUALIFIER cerr << "\nDOM Error during parsing: '" << gXmlFile << "'\n"
> > >         << "DOMException code is: " << e.code << XERCES_STD_QUALIFIER endl;
> > >
> > >     if (DOMImplementation::loadDOMExceptionMsg(e.code, errText, maxChars))
> > >         XERCES_STD_QUALIFIER cerr << "Message is: " << StrX(errText) << XERCES_STD_QUALIFIER endl;
> > >
> > >     errorsOccured = true;
> > > }
> > >
> > > catch (...)
> > > {
> > >     XERCES_STD_QUALIFIER cerr << "An error occurred during parsing\n " << XERCES_STD_QUALIFIER endl;
> > >     errorsOccured = true;
> > > }
> > >
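
The API reference sentence quoted at the top of this thread says the element passed to startElement() has its attributes but no children yet, so a rejected subtree can only be dropped by whatever is building the tree. The following is a minimal, self-contained sketch of that idea in plain C++. It is not Xerces-C++ code and makes no claim about how DOMLSParserImpl works internally: Node, Event, FilterAction, startElementFilter, buildTree and countNodes are all hypothetical names made up for this illustration, and the event stream is assumed to be well formed.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; none of these are Xerces-C++ types.
enum class FilterAction { Accept, Reject };

struct Node {
    std::string name;
    std::vector<std::unique_ptr<Node>> children;
};

struct Event {
    enum Kind { Start, End } kind;
    std::string name;
};

// Toy filter: called when an element starts, before any child has been seen.
FilterAction startElementFilter(const std::string& name) {
    return name == "DATA" ? FilterAction::Reject : FilterAction::Accept;
}

// Builds a tree from start/end events. When the filter rejects an element,
// the builder tracks the nesting depth and drops every event up to and
// including the matching end event, so no node of that subtree is allocated.
std::unique_ptr<Node> buildTree(const std::vector<Event>& events) {
    std::unique_ptr<Node> root(new Node);
    root->name = "#document";
    std::vector<Node*> open{root.get()};  // stack of currently open elements
    int skipDepth = 0;                    // > 0 while inside a rejected subtree

    for (const Event& ev : events) {
        if (skipDepth > 0) {
            skipDepth += (ev.kind == Event::Start) ? 1 : -1;
            continue;
        }
        if (ev.kind == Event::Start) {
            if (startElementFilter(ev.name) == FilterAction::Reject) {
                skipDepth = 1;  // skip this element and everything below it
                continue;
            }
            std::unique_ptr<Node> child(new Node);
            child->name = ev.name;
            Node* raw = child.get();
            open.back()->children.push_back(std::move(child));
            open.push_back(raw);
        } else {
            open.pop_back();  // assumes well-formed input
        }
    }
    return root;
}

// Counts the nodes in the tree, including the node passed in.
std::size_t countNodes(const Node& n) {
    std::size_t total = 1;
    for (const auto& c : n.children) total += countNodes(*c);
    return total;
}
```

In this toy model the filter only ever sees the element name, and the builder does the skipping, which is the division of labor the question above asks about. Feeding it the events for ROOT containing HEADER, DATA (with a nested VALUE) and FOOTER yields a tree with only ROOT, HEADER and FOOTER under the document node.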