xml-general mailing list archives

From Gopal Sharma <Gopal.Sha...@Sun.COM>
Subject RE: [Xerces2] Measuring performance and optimization
Date Mon, 06 May 2002 14:03:03 GMT

  Pankaj,
  
  Thanks for the link and suggestion.
  
  - Gopal


Pankaj Kumar wrote:-
> 
> Hi,
> 
> A few months ago I wrote a program to measure Java XML parsing
> performance. Maybe it could be of some use here. You can find details at
> http://www.pankaj-k.net/xpb4j/
> 
> I am not aware of Xerces internals, so whatever I say here may not make much
> sense, but one area where I feel that optimization at the parser level can
> improve performance in server-based applications is reuse of the same String
> objects across parse runs. Let me elaborate: a server program that accepts an
> XML document with every request sees instances of documents from a small set
> of schemas. These documents use the same element names, attributes and
> namespace URIs. If the same immutable String objects can be reused for these,
> there could be significant savings in allocations and deallocations.
> 
> The problem is slightly complicated, as the identification of repeating
> Strings must happen at a much lower level, before a String object is created
> for a lookup. What could do the job is perhaps some smart lookup during
> lexical analysis.
> 
> Regards,
> Pankaj Kumar
> Web Services Architect
> HP Middleware
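A minimal sketch of the lookup Pankaj describes, assuming the scanner holds names in a char[] buffer: the table hashes the raw characters and returns an existing String when one matches, so a new String is allocated only the first time a name is seen. Keeping one table alive across parses gives the cross-document reuse. The class and method names are hypothetical; Xerces2 has its own class for this role (org.apache.xerces.util.SymbolTable), and the code below is not its implementation.

```java
import java.util.Arrays;

/**
 * Hypothetical symbol table: maps a run of characters in the scanner's
 * buffer to one canonical String, allocating only on the first occurrence.
 */
final class CharSymbolTable {
    private static final class Entry {
        final char[] chars;   // copy of the symbol's characters, used for comparison
        final String symbol;  // the canonical, shared String instance
        final Entry next;     // next entry in the same hash bucket

        Entry(char[] chars, String symbol, Entry next) {
            this.chars = chars;
            this.symbol = symbol;
            this.next = next;
        }
    }

    private final Entry[] buckets;

    CharSymbolTable(int bucketCount) {
        buckets = new Entry[bucketCount];
    }

    /** Returns the canonical String for buffer[offset, offset + length). */
    String addSymbol(char[] buffer, int offset, int length) {
        // Hash the raw characters; no String is created for the lookup itself.
        int hash = 0;
        for (int i = offset; i < offset + length; i++) {
            hash = 31 * hash + buffer[i];
        }
        int index = (hash & 0x7FFFFFFF) % buckets.length;

        for (Entry e = buckets[index]; e != null; e = e.next) {
            if (Arrays.equals(e.chars, 0, e.chars.length, buffer, offset, offset + length)) {
                return e.symbol; // seen before: reuse the existing String
            }
        }

        // First occurrence: allocate once and remember it for later parses.
        String symbol = new String(buffer, offset, length);
        buckets[index] = new Entry(symbol.toCharArray(), symbol, buckets[index]);
        return symbol;
    }
}
```

As written the table is not thread-safe and never evicts entries, which suits the small, fixed vocabulary of a few schemas; a server sharing one table across request threads would need synchronization or one table per thread.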
> 
> -----Original Message-----
> From: Gopal Sharma
> To: general@xml.apache.org; xerces-j-user@xml.apache.org
> Sent: 5/5/02 7:18 AM
> Subject: [Xerces2] Measuring performance and optimization
> 
> 
>  FYI
>  
>  Hi,
>  
>  I have forwarded this mail to _YOU_ (general and xerces-j-user) since
>  you might be using *Xerces 2* in one way or another and could provide
>  some data/details/suggestions/comments that would help us in this effort.
>  
>  Thanks in advance for your valuable suggestion(s) and comment(s).
>  
>  - Gopal 
> 
> ------------- Begin Forwarded Message -------------
> Date: Fri, 3 May 2002 21:03:00 +0000 (Asia/Calcutta)
> From: Rahul Srivastava <Rahul.Srivastava@Sun.COM>
> Subject: [xerces2] Measuring performance and optimization
> To: xerces-j-dev@xml.apache.org
> 
> Hi folks,
> 
> There has long been talk about improving the performance of Xerces2. Some
> benchmarking has been done earlier, for instance by Dennis Sosnoski (see
> http://www.sosnoski.com/opensrc/xmlbench/index.html). These results are
> important for knowing how fast or slow Xerces is compared to other parsers.
> But we need to identify areas of improvement in Xerces itself. We need to
> measure the time taken by each individual component in the pipeline and
> figure out how much time each component consumes for various events; then
> we can concentrate on improving performance in those areas. So, here is
> what we plan to do:
> 
> + SAX parsing
>   - time taken
> + DOM parsing
>   - DOM construction time
>   - DOM traversal time
>   - memory consumed
>   - with the deferred-dom feature set to true and to false for all of the above
> + DTD validation
>   - single parse: time taken
>   - multiple parses with the same parser instance: time taken from the
>     second parse onwards
> + Schema validation
>   - single parse: time taken
>   - multiple parses with the same parser instance: time taken from the
>     second parse onwards
> + optimising the pipeline
>   - calculate pipeline/component initialization time
>   - calculate the time each component in the pipeline takes to propagate
>     an event
>   - use configurations to set up an optimised pipeline for various cases,
>     such as no validation, DTD validation only, etc., and calculate the
>     time taken
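A rough timing harness for the SAX and DOM items above, written against the standard JAXP interfaces so it measures whichever parser JAXP is configured to load. It reuses one parser instance across runs so the first parse can be compared with later ones. The class name ParseTimer is hypothetical, and memory measurement and the deferred-dom switch are left out, since both depend on parser-specific setup.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.helpers.DefaultHandler;

final class ParseTimer {
    /** Parses the same bytes `runs` times with one reused SAXParser; returns nanoseconds per run. */
    static long[] timeSax(byte[] xml, int runs) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        DefaultHandler handler = new DefaultHandler(); // no-op callbacks: measures parsing only
        long[] nanos = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            parser.parse(new ByteArrayInputStream(xml), handler);
            nanos[i] = System.nanoTime() - start;
            parser.reset(); // back to the initial configuration before the next run
        }
        return nanos;
    }

    /** One DOM parse: returns {construction nanos, traversal nanos, node count}. */
    static long[] timeDom(DocumentBuilder builder, byte[] xml) throws Exception {
        long t0 = System.nanoTime();
        Document doc = builder.parse(new ByteArrayInputStream(xml));
        long t1 = System.nanoTime();
        long nodes = countNodes(doc); // full depth-first walk of the tree
        long t2 = System.nanoTime();
        return new long[] {t1 - t0, t2 - t1, nodes};
    }

    /** Counts a node and its descendants; attributes are not children, so they are not counted. */
    static long countNodes(Node node) {
        long count = 1;
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            count += countNodes(child);
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = "<root><a x='1'>text</a><b/></root>".getBytes(StandardCharsets.UTF_8);
        long[] sax = timeSax(xml, 5);
        System.out.println("SAX first run: " + sax[0] + " ns, last run: " + sax[sax.length - 1] + " ns");
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        long[] dom = timeDom(builder, xml);
        System.out.println("DOM build: " + dom[0] + " ns, traversal: " + dom[1] + " ns, nodes: " + dom[2]);
    }
}
```

The first-run figure mixes parser initialization with JVM class loading and JIT compilation, so real measurements would need far more runs, larger documents and some warm-up before the "second parse onwards" numbers mean much.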
> 
> Apart from this, should we also take the existing grammar caching framework
> into account when evaluating the performance of the parser?
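One way to answer this is to time validation with and without grammar reuse. The sketch below uses the standard JAXP javax.xml.validation API as a stand-in for Xerces2's own grammar caching (its XMLGrammarPool interface): it compares recompiling the schema for every document against compiling it once and reusing the resulting Schema. The class name and the one-element schema are hypothetical test inputs.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

final class GrammarReuse {
    // Hypothetical one-element schema used only as a test input.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='qty' type='xs:positiveInteger'/>"
      + "</xs:schema>";

    static Schema compile(SchemaFactory factory) throws Exception {
        return factory.newSchema(new StreamSource(new StringReader(XSD)));
    }

    /** No caching: the grammar is compiled again for every document. Returns total nanos. */
    static long validateRecompiling(String[] docs) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        long start = System.nanoTime();
        for (String doc : docs) {
            compile(factory).newValidator().validate(new StreamSource(new StringReader(doc)));
        }
        return System.nanoTime() - start;
    }

    /** Caching: the grammar is compiled once and one Validator is reused. Returns total nanos. */
    static long validateCached(String[] docs) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        long start = System.nanoTime();
        Validator validator = compile(factory).newValidator();
        for (String doc : docs) {
            validator.validate(new StreamSource(new StringReader(doc)));
            validator.reset(); // original configuration; still bound to the same compiled Schema
        }
        return System.nanoTime() - start;
    }
}
```

With only a handful of small documents the difference is mostly noise; the comparison becomes meaningful as the number of documents per schema grows, which is the server scenario raised earlier in this thread.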
> 
> We have classified the inputs to be used for this testing as follows:
> 
> + instance docs used
>   - tag-centric (many tags with small content, say 10-50 bytes)
>       Type      # of tags
>     -------------------
>     * small     5-50
>     * medium    50-500
>     * large     >500
>     
>   - content-centric (few tags, say 5-10, with large content)
>       Type      Content between a pair of tags
>     ------------------------------------------
>     * small     <500 KB
>     * medium    500-5000 KB
>     * large     >5000 KB
> 
> We could also use the nesting depth of the tags as a criterion for the above cases.
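A small generator could produce inputs for the categories above. This is a sketch with hypothetical names; it assumes ASCII content (so one character is one byte in UTF-8), does not count the root element toward the tag counts, and takes 1 KB as 1024 bytes.

```java
/** Hypothetical generator for the tag-centric, content-centric and depth cases. */
final class TestDocs {
    /** `elements` sibling elements under one root, each holding `contentBytes` of ASCII text. */
    static String flat(int elements, int contentBytes) {
        String text = "x".repeat(contentBytes);
        StringBuilder sb = new StringBuilder("<doc>");
        for (int i = 0; i < elements; i++) {
            sb.append("<item>").append(text).append("</item>");
        }
        return sb.append("</doc>").toString();
    }

    /** `depth` nested elements (depth >= 1 for well-formed XML) with text at the innermost level. */
    static String nested(int depth, int contentBytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            sb.append("<n>");
        }
        sb.append("x".repeat(contentBytes));
        for (int i = 0; i < depth; i++) {
            sb.append("</n>");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String tagCentricMedium = flat(200, 30);          // 50-500 tags, 10-50 bytes each
        String contentCentricSmall = flat(5, 100 * 1024); // 5-10 tags, under 500 KB each
        String deep = nested(100, 20);                     // depth as an extra criterion
        System.out.println(tagCentricMedium.length() + " " + contentCentricSmall.length() + " " + deep.length());
    }
}
```

Synthetic documents like these isolate one variable at a time; they complement, rather than replace, the real-world instance documents asked about below. String.repeat needs Java 11 or later.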
> 
> Realistically, there could be an enormous number of combinations, and
> different figures in the above table, that reflect the real-world instance
> docs in use. I would like to know the community's view here. Is this data
> enough to evaluate the performance of the parser? Is there any publicly
> available data that can be used for performance evaluation?
> 
> + DTDs used
>   - should use different types of entities
>   
> + XML Schemas used
>   - should use most of the elements and datatypes
>   
> Will it really help in any way?
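As a starting point for the DTD inputs, the fixture below combines several kinds of entity in one internal subset: an internal general entity, a parameter entity whose replacement text is itself a declaration, a predefined entity and character references. The entity names and values are hypothetical; external and unparsed entities are left out to keep the sketch self-contained. For this fixture a SAX parse should report the expanded text `Example Corp hello & AB`.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class EntityFixture {
    static final String DOC =
        "<?xml version='1.0'?>\n"
      + "<!DOCTYPE doc [\n"
      + "  <!ENTITY company 'Example Corp'>\n"                 // internal general entity
      + "  <!ENTITY % decls \"<!ENTITY greeting 'hello'>\">\n" // parameter entity containing a declaration
      + "  %decls;\n"                                           // PE reference between declarations
      + "]>\n"
      + "<doc>&company; &greeting; &amp; &#65;&#x42;</doc>";  // general, predefined and character references

    /** Parses with SAX and returns all character data, with entities expanded by the parser. */
    static String parseText(String xml) throws Exception {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length); // may be called several times per text node
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parseText(DOC));
    }
}
```

Parameter-entity references are placed between declarations because, in the internal subset, XML 1.0 does not allow them inside a markup declaration.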
> 
> Any comments or suggestions appreciated.
> 
> Thanks,
> Rahul.
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
> For additional commands, e-mail: xerces-j-dev-help@xml.apache.org
> 
> 
> ------------- End Forwarded Message -------------
> 
> 
> 
> ---------------------------------------------------------------------
> In case of troubles, e-mail:     webmaster@xml.apache.org
> To unsubscribe, e-mail:          general-unsubscribe@xml.apache.org
> For additional commands, e-mail: general-help@xml.apache.org

