xml-general mailing list archives

From Aleksander Slominski <as...@cs.indiana.edu>
Subject Re: [Xerces2] Measuring performance and optimization
Date Mon, 06 May 2002 05:36:19 GMT
hi,

most recent parsers (including Xerces) already do something like that,
basically by looking up an interned String object in a cache keyed on the
input characters. i have tested its influence on performance, and typically
it makes about a 3-10% difference, depending on how much of the input
consists of tags with repeated element or attribute names. you can see such
results for XPP3: look for the "Mostly Tags" and "Mostly Text" documents
and the XPP3 parser configurations "MXP1 beta1 w/NS" and
"MXP1 beta1 w/NS no-string-caching" at:
http://www.extreme.indiana.edu/~aslom/xpp_sax2bench/results.html
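
to make the idea concrete, here is a minimal sketch of such a cache. it is
not the actual Xerces or XPP3 code; the class name NameCache, the fixed
bucket count, and the misses() counter are assumptions made up for
illustration. the point is that the lookup hashes and compares the raw
characters in the parser buffer, so a new String is only created the first
time a name is seen:

```java
public class NameCache {
    // One cached name: its characters plus the shared String instance.
    private static final class Entry {
        final char[] chars;
        final String name;
        final Entry next;

        Entry(char[] chars, String name, Entry next) {
            this.chars = chars;
            this.name = name;
            this.next = next;
        }
    }

    // Fixed-size bucket array (hypothetical size; a real cache would resize).
    private final Entry[] buckets = new Entry[256];
    private int misses = 0;

    // Hash the characters directly, without building a String first.
    private static int hash(char[] buf, int off, int len) {
        int h = 0;
        for (int i = 0; i < len; i++) {
            h = 31 * h + buf[off + i];
        }
        return h;
    }

    private static boolean matches(char[] cached, char[] buf, int off, int len) {
        if (cached.length != len) {
            return false;
        }
        for (int i = 0; i < len; i++) {
            if (cached[i] != buf[off + i]) {
                return false;
            }
        }
        return true;
    }

    // Return the shared String for buf[off .. off+len), creating it on first use.
    public String lookup(char[] buf, int off, int len) {
        int idx = (hash(buf, off, len) & 0x7fffffff) % buckets.length;
        for (Entry e = buckets[idx]; e != null; e = e.next) {
            if (matches(e.chars, buf, off, len)) {
                return e.name; // cache hit: no allocation
            }
        }
        // Cache miss: this is the only place a String is created.
        String name = new String(buf, off, len).intern();
        buckets[idx] = new Entry(name.toCharArray(), name, buckets[idx]);
        misses++;
        return name;
    }

    // Number of lookups that had to create a new String.
    public int misses() {
        return misses;
    }

    public static void main(String[] args) {
        NameCache cache = new NameCache();
        char[] input = "<item><item>".toCharArray();
        String first = cache.lookup(input, 1, 4);
        String second = cache.lookup(input, 7, 4);
        System.out.println(first == second);
        System.out.println(cache.misses());
    }
}
```

with a cache like this, the second and later occurrences of an element or
attribute name cost a hash and a character comparison but no allocation,
which is why the gain depends on how tag-heavy the input is.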

thanks,

alek

"KUMAR,PANKAJ (HP-Cupertino,ex1)" wrote:

> Hi,
>
> A few months ago I wrote a program to measure Java XML parsing
> performance. Maybe it could be of some use here. You can find details at
> http://www.pankaj-k.net/xpb4j/
>
> I am not aware of Xerces internals, so whatever I say here may not make much
> sense, but one area where I feel that optimization at the parser level can
> improve performance in server-based applications is the use of the same
> String objects across parse runs. Let me elaborate -- a server program that
> accepts XML documents with every request comes across instances of documents
> from a small set of schemas. These documents use the same element names,
> attributes and namespace URIs. If the same immutable String objects can be
> used for these, then there could be significant savings in allocations and
> deallocations.
>
> The problem is slightly complicated, as the identification of repeating
> Strings must happen at a much lower level, before a String object is created
> for a lookup. What could do the job is perhaps some smart lookup during
> lexical analysis.
>
> Regards,
> Pankaj Kumar
> Web Services Architect
> HP Middleware
>
> -----Original Message-----
> From: Gopal Sharma
> To: general@xml.apache.org; xerces-j-user@xml.apache.org
> Sent: 5/5/02 7:18 AM
> Subject: [Xerces2] Measuring performance and optimization
>
>  FYI
>
>  Hi,
>
>  I have forwarded this mail to _YOU_ ( general and xerces-j-user ) in view
>  that you might be using *Xerces 2* in one way or other and could provide
>  some data/details/suggestions/comments which would help us in this effort.
>
>  Thanks in advance for your valuable suggestion(s) and comment(s).
>
>  - Gopal
>
> ------------- Begin Forwarded Message -------------
> Date: Fri, 3 May 2002 21:03:00 +0000 (Asia/Calcutta)
> From: Rahul Srivastava <Rahul.Srivastava@Sun.COM>
> Subject: [xerces2] Measuring performance and optimization
> To: xerces-j-dev@xml.apache.org
>
> Hi folks,
>
> There has long been talk about improving the performance of Xerces2. Some
> benchmarking has been done earlier, for instance by Dennis Sosnoski; see
> http://www.sosnoski.com/opensrc/xmlbench/index.html . These results are
> important for knowing how fast or slow Xerces is compared to other parsers.
> But we need to identify areas of improvement in Xerces. We need to measure
> the time taken by each individual component in the pipeline and figure out
> which component consumes how much time for various events; then we can
> concentrate on improving performance in those areas. So, here is what we
> plan to do:
>
> + sax parsing
>   - time taken
> + dom parsing
>   - dom construction time
>   - dom traversal time
>   - memory consumed
>   - considering the feature deferred-dom as true/false for all of above
> + DTD validation
>   - one time parse, time taken
>   - multiple times parse using same instance, time taken for second parse
>     onwards
> + Schema validation
>   - one time parse, time taken
>   - multiple times parse using same instance, time taken for second parse
>     onwards
> + optimising the pipeline
>   - calculate pipeline/component initialization time.
>   - calculating the time each component in the pipeline takes to propagate
>     the event.
>   - Using configurations to set up an optimised pipeline for various cases
>     such as novalidation, DTD validation only, etc. and calculate the
>     time taken.
>
> Apart from this, should we consider the existing grammar caching framework
> to evaluate the performance of the parser?
>
> We have classified the inputs to be used for this testing as follows:
>
> + instance docs used
>   - tag centric (more tags and small content say 10-50 bytes)
>       Type      Tags#
>     -------------------
>     * small     5-50
>     * medium    50-500
>     * large     >500
>
>   - content centric (fewer tags say 5-10 and huge content)
>       Type      content between a pair of tags
>     -------------------------------------
>     * small     500 kb
>     * medium    500-5000 kb
>     * large     >5000 kb
>
> We can also have depth of the tags as a criterion for the above cases.
>
> Actually speaking, there can be enormous combinations and different figures
> in the above table that reflect the real-world instance docs used. I would
> like to know the view of the community here. Is this data enough to evaluate
> the performance of the parser? Is there any data which is publicly available
> and can be used for performance evaluation?
>
> + DTD's used
>   - should use different types of entities
>
> + XMLSchema's used
>   - should use most of the elements and datatypes
>
> Will it really help in any way?
>
> Any comments or suggestions appreciated.
>
> Thanks,
> Rahul.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
> For additional commands, e-mail: xerces-j-dev-help@xml.apache.org
>
> ------------- End Forwarded Message -------------
>
> ---------------------------------------------------------------------
> In case of troubles, e-mail:     webmaster@xml.apache.org
> To unsubscribe, e-mail:          general-unsubscribe@xml.apache.org
> For additional commands, e-mail: general-help@xml.apache.org

