From: John Lilley
To: "c-users@xerces.apache.org"
Date: Fri, 4 Sep 2009 08:11:14 -0400
Subject: RE: method startElement() from class DOMLSParserFilter
Message-ID: <782A77DE52224A4293522E9999C6AFB330092D5DE0@VMBX101.ihostexchange.net>
In-Reply-To: <20090904120048.32620@gmx.net>

Forgive my ignorance, but could it be that you must reject not only the node
you don't want, but all of its children as well?

john

-----Original Message-----
From: Mirko Braun [mailto:mirko.braun@gmx.de]
Sent: Friday, September 04, 2009 6:01 AM
To: c-users@xerces.apache.org
Subject: Re: method startElement() from class DOMLSParserFilter

Hi Alberto,

thank you for your answer. I integrated the changes you suggested, but the
result is still the same:

DOM Error during parsing: 'C:\Daten\2009-08-07_NewXercesc\3_0_1\xerces-c-3.0.1\Build\Win32\VC6\Debug\MyXML.xml'
DOMException code is: 3
Message is: attempt is made to insert a node where it is not permitted

Best regards,
Mirko

-------- Original Message --------
> Date: Fri, 04 Sep 2009 12:37:10 +0200
> From: Alberto Massari
> To: c-users@xerces.apache.org
> Subject: Re: method startElement() from class DOMLSParserFilter

> Hi Mirko,
> I think the current implementation of the DOMLSParserFilter doesn't work
> nicely with your code, as the rejected nodes are not recycled and the
> memory will grow to the same level as before.
> Anyhow, you should instead override acceptNode like this:
>
> DOMParserFilter::FilterAction DOMParserFilter::acceptNode(DOMNode* node)
> {
>     // for element whose name is "DATA", skip it
>     if (node->getNodeType() == DOMNode::ELEMENT_NODE &&
>         XMLString::compareString(node->getNodeName(), element_data) == 0)
>         return DOMParserFilter::FILTER_REJECT;
>     else
>         return DOMParserFilter::FILTER_ACCEPT;
> }
>
> Then, change DOMLSParserImpl::endElement to add a call to
> origNode->release() after the call to removeChild().
>
> Alberto
>
>
> Mirko Braun wrote:
> > Hello everybody,
> >
> > I would like to parse a quite large XML file (about 180 MB).
> > I used the DOM interface because I need the tree for further
> > processing of the data the XML file contains. Of course, a lot
> > of memory is used while parsing the file, and I got an
> > "Out of memory" exception.
> >
> > I noticed that a class DOMLSParserFilter comes along with Xerces-C++
> > 3.0.1 (Win32), which makes it possible to filter the nodes during parsing.
> > That is perfect for me because one XML element in my large file
> > contains most of the data. This XML element is called DATA and
> > appears several times in my XML file.
> > So I had the idea to reject this XML element from the DOM tree
> > during parsing, to reduce the memory used, by using the method
> > startElement() of the DOMLSParserFilter class. After that I would
> > use a SAX parser and just get all DATA elements with their values.
> > But it does not work.
> > I integrated my code into the DOMPrint example which comes along
> > with Xerces-C++ 3.0.1. The following error message occurred:
> >
> > DOM Error during parsing: 'C:\Daten\2009-08-07_NewXercesc\3_0_1\xerces-c-3.0.1\Build\Win32\VC6\Debug\MyXML.xml'
> > DOMException code is: 3
> > Message is: attempt is made to insert a node where it is not permitted
> >
> >
> > Did I misunderstand the functionality of the DOMLSParserFilter class
> > and its method startElement()?
> > Is it possible to realize my idea with the help of this class? Did
> > I do something wrong in my code (please have a look below)?
> >
> > I would be very grateful for any help.
> >
> > Thanks in advance,
> > Mirko
> >
> >
> > DOMPrintFilter.hpp:
> > --------------------
> >
> >
> > class DOMParserFilter : public DOMLSParserFilter {
> > public:
> >
> >     DOMParserFilter(DOMNodeFilter::ShowType whatToShow = DOMNodeFilter::SHOW_ALL);
> >     ~DOMParserFilter(){};
> >
> >     virtual FilterAction startElement(DOMElement* node);
> >     virtual FilterAction acceptNode(DOMNode* node){return DOMParserFilter::FILTER_ACCEPT;};
> >     virtual DOMNodeFilter::ShowType getWhatToShow() const {return fWhatToShow;};
> >
> > private:
> >     DOMNodeFilter::ShowType fWhatToShow;
> > };
> >
> >
> > DOMPrintFilter.cpp:
> > --------------------
> >
> > DOMParserFilter::DOMParserFilter(DOMNodeFilter::ShowType whatToShow)
> > :fWhatToShow(whatToShow)
> > {}
> >
> > DOMParserFilter::FilterAction DOMParserFilter::startElement(DOMElement* node)
> > {
> >     // for element whose name is "DATA", skip it
> >     if (XMLString::compareString(node->getNodeName(), element_data) == 0)
> >         return DOMParserFilter::FILTER_REJECT;
> >     else
> >         return DOMParserFilter::FILTER_ACCEPT;
> > }
> >
> >
> > DOMPrint.cpp:
> > ---------------
> >
> > static const XMLCh gLS[] = { xercesc::chLatin_L, xercesc::chLatin_S, xercesc::chNull };
> >
> > xercesc::DOMImplementation *implParser = xercesc::DOMImplementationRegistry::getDOMImplementation(gLS);
> >
> > xercesc::DOMLSParser* parser = ((xercesc::DOMImplementationLS*)implParser)->createLSParser(xercesc::DOMImplementationLS::MODE_SYNCHRONOUS, 0);
> >
> >
> > DOMTreeErrorReporter *errReporter = new DOMTreeErrorReporter();
> > parser->getDomConfig()->setParameter(xercesc::XMLUni::fgDOMErrorHandler, errReporter);
> >
> > DOMParserFilter * pDOMParserFilter = new DOMParserFilter();
> > parser->setFilter(pDOMParserFilter);
> >
> >
> > //
> > // Parse the XML file, catching any XML exceptions that might propagate
> > // out of it.
> > //
> > bool errorsOccured = false;
> > DOMDocument *doc = NULL;
> >
> > try
> > {
> >     doc = parser->parseURI(gXmlFile);
> > }
> > catch (const OutOfMemoryException&)
> > {
> >     XERCES_STD_QUALIFIER cerr << "OutOfMemoryException" << XERCES_STD_QUALIFIER endl;
> >     errorsOccured = true;
> > }
> > catch (const XMLException& e)
> > {
> >     XERCES_STD_QUALIFIER cerr << "An error occurred during parsing\n   Message: "
> >          << StrX(e.getMessage()) << XERCES_STD_QUALIFIER endl;
> >     errorsOccured = true;
> > }
> >
> > catch (const DOMException& e)
> > {
> >     const unsigned int maxChars = 2047;
> >     XMLCh errText[maxChars + 1];
> >
> >     XERCES_STD_QUALIFIER cerr << "\nDOM Error during parsing: '" << gXmlFile << "'\n"
> >          << "DOMException code is: " << e.code << XERCES_STD_QUALIFIER endl;
> >
> >     if (DOMImplementation::loadDOMExceptionMsg(e.code, errText, maxChars))
> >         XERCES_STD_QUALIFIER cerr << "Message is: " << StrX(errText) << XERCES_STD_QUALIFIER endl;
> >
> >     errorsOccured = true;
> > }
> >
> > catch (...)
> > {
> >     XERCES_STD_QUALIFIER cerr << "An error occurred during parsing\n " << XERCES_STD_QUALIFIER endl;
> >     errorsOccured = true;
> > }
> >
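
The question at the top of this thread turns on what rejecting a node means for its subtree. As a rough, library-independent illustration, the sketch below models a filter whose reject verdict drops an element together with all of its children, while accept keeps the element and recurses. This is not Xerces-C++ code and makes no claim about how DOMLSParserFilter is implemented: `Node`, `Action`, `decide`, `filterTree`, `countNodes`, `makeNode` and `makeSampleTree` are hypothetical names invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a DOM element; not a Xerces-C++ type.
struct Node {
    std::string name;
    std::vector<std::unique_ptr<Node>> children;
};

// Hypothetical verdicts, loosely modelled on FILTER_ACCEPT / FILTER_REJECT.
enum class Action { Accept, Reject };

// Reject every element named "DATA", accept everything else.
inline Action decide(const Node& node) {
    return node.name == "DATA" ? Action::Reject : Action::Accept;
}

// Build a filtered copy of the tree. A rejected node is dropped together
// with its whole subtree, so its children are never visited or copied.
inline std::unique_ptr<Node> filterTree(const Node& source) {
    if (decide(source) == Action::Reject)
        return nullptr;
    std::unique_ptr<Node> copy(new Node());
    copy->name = source.name;
    for (const auto& child : source.children) {
        assert(child != nullptr);  // this model never stores empty children
        if (auto kept = filterTree(*child))
            copy->children.push_back(std::move(kept));
    }
    return copy;
}

// Count the nodes in a (possibly empty) tree.
inline std::size_t countNodes(const Node* node) {
    if (node == nullptr)
        return 0;
    std::size_t total = 1;
    for (const auto& child : node->children)
        total += countNodes(child.get());
    return total;
}

// Create a childless node with the given name.
inline std::unique_ptr<Node> makeNode(const std::string& name) {
    std::unique_ptr<Node> node(new Node());
    node->name = name;
    return node;
}

// Sample tree: ROOT -> (HEADER, DATA -> (ROW, ROW), FOOTER).
inline std::unique_ptr<Node> makeSampleTree() {
    auto root = makeNode("ROOT");
    root->children.push_back(makeNode("HEADER"));
    auto data = makeNode("DATA");
    data->children.push_back(makeNode("ROW"));
    data->children.push_back(makeNode("ROW"));
    root->children.push_back(std::move(data));
    root->children.push_back(makeNode("FOOTER"));
    return root;
}
```

Applied to the tree from `makeSampleTree()`, `filterTree` keeps ROOT, HEADER and FOOTER and drops DATA along with both ROW children. Whether the parser filter in Xerces-C++ 3.0.1 behaves this way is exactly the open question in the thread; the sketch only illustrates the semantics being asked about.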