xml-general mailing list archives

From Neil Graham <ne...@ca.ibm.com>
Subject Re: XML performance problems with xerces c++
Date Tue, 25 May 2004 15:21:32 GMT

Hi Nath,

You really don't want to use this list for such questions; better to use
the Xerces-C-specific list here [*].

But here are some thoughts:  I don't understand what you mean when you
write "It seems the larger the XML file, the longer it takes to parse
individual nodes."  When Xerces returns a DOM document to you, it has
already parsed the entire document; it doesn't go off and parse more of it
as you move down the list of children of the root element.  And, if all you
want is information from the children of the root element, you may well
wish to use SAX; the DOM is inherently both processor- and
memory-intensive.
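
To make the SAX suggestion concrete, here is a minimal, self-contained sketch of the event-driven idea: the scanner reports start-element, character, and end-element events to a handler, and the handler keeps only the data it cares about, so no tree is ever built. Everything in it is an assumption made for illustration: `EventHandler`, `scan`, `WordCollector`, `collectWords`, and the `<word>` element name are hypothetical, the toy scanner understands only bare tags and text (no attributes, entities, comments, or CDATA), and none of it is the Xerces-C++ API. With Xerces-C++ the same shape would be expressed through its SAX interfaces instead.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical callback interface, loosely modelled on the SAX idea of
// start-element / characters / end-element events. NOT the Xerces-C++ API.
struct EventHandler {
    virtual ~EventHandler() = default;
    virtual void startElement(const std::string& name) = 0;
    virtual void characters(const std::string& text) = 0;
    virtual void endElement(const std::string& name) = 0;
};

// Toy scanner: recognises only <name>, </name> and plain text.
// It fires events as it goes and never stores the document.
void scan(const std::string& xml, EventHandler& handler) {
    std::size_t pos = 0;
    while (pos < xml.size()) {
        if (xml[pos] == '<') {
            std::size_t close = xml.find('>', pos);
            if (close == std::string::npos) break;  // malformed input: stop
            std::string tag = xml.substr(pos + 1, close - pos - 1);
            if (!tag.empty() && tag[0] == '/') {
                handler.endElement(tag.substr(1));
            } else {
                handler.startElement(tag);
            }
            pos = close + 1;
        } else {
            std::size_t next = xml.find('<', pos);
            if (next == std::string::npos) next = xml.size();
            handler.characters(xml.substr(pos, next - pos));
            pos = next;
        }
    }
}

// Collects the text of every <word> element (assumed element name).
struct WordCollector : EventHandler {
    std::vector<std::string> words;
    bool inWord = false;
    std::string current;

    void startElement(const std::string& name) override {
        if (name == "word") { inWord = true; current.clear(); }
    }
    void characters(const std::string& text) override {
        if (inWord) current += text;  // text may arrive in several chunks
    }
    void endElement(const std::string& name) override {
        if (name == "word") { words.push_back(current); inWord = false; }
    }
};

// Convenience wrapper: scan the input once and return the collected words.
std::vector<std::string> collectWords(const std::string& xml) {
    WordCollector collector;
    scan(xml, collector);
    return collector.words;
}
```

Calling `collectWords` on a string holding the dictionary markup returns the word list after a single pass. Memory use is driven by the data the handler chooses to keep rather than by the size of the whole document, which is the trade-off against the DOM described above.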

Cheers,
Neil

[*]:  http://xml.apache.org/mail.html#xerces-c-dev
Neil Graham
XML Parser Development
IBM Toronto Lab
Phone:  905-413-3519, T/L 969-3519
E-mail:  neilg@ca.ibm.com




                                                                                         
                                             
From:     "Nath" <nath_meyer@hotmail.com>
To:       <general@xml.apache.org>
cc:
Date:     05/24/2004 10:56 PM
Subject:  XML performance problems with xerces c++
Please respond to general



I converted over a dictionary of words and definitions into XML files (one
file per letter of the alphabet), each weighing around 1-5 megs (I chose XML
over a DB for important reasons). I'm trying to parse these files and it's
taking an incredible amount of time to do it. When parsing small files
(letters X, Y, and Z - a total of 815 words or 151 KB) the parser can do so
in less than 2 seconds. When parsing the letter A file (40,000 some words or
1.58 megs), it takes 5 seconds just to parse 20 words. It seems the larger
the XML file, the longer it takes to parse individual nodes. Can anyone
suggest why this is happening and how I can fix it? I've used xerces c++
2.4.0 and recently upgraded to xerces 2.5.0.



I'm just following the standard XML start-up and DOM parsing procedure:

- Initialize platform utils
- Don't validate files
- parse and assign DOM document
- go through each child node and collect data



I have a 1600MHz processor, so handling a few meg files should be fairly
quick.

---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@xml.apache.org
For additional commands, e-mail: general-help@xml.apache.org




