lucene-dev mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: indexing_slowdown_with_latest_lucene_udpate
Date Mon, 10 Aug 2009 15:10:56 GMT
I have already started preparing a patch... Let's open an issue! You could try
it out with your corpus and post the numbers.
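The per-class caching idea from the quoted discussion (an IdentityHashMap<Class,Boolean> guarded by synchronization) can be sketched roughly as follows. This is not the actual patch. The Base/Overrides/Inherits classes and the next() method are hypothetical stand-ins, and the assumption is that the expensive check is a reflective "does this subclass override the method?" lookup. After the first instance of a class, later instances pay only an uncontended synchronized block plus an identity-map lookup instead of the reflection call.

```java
import java.lang.reflect.Method;
import java.util.IdentityHashMap;
import java.util.Map;

public class OverrideCheckCache {

    // Hypothetical base class; subclasses may or may not override next().
    public static class Base {
        public boolean next() { return false; }
    }

    public static class Overrides extends Base {
        @Override public boolean next() { return true; }
    }

    public static class Inherits extends Base {}

    // One entry per concrete class. IdentityHashMap compares keys with ==,
    // which is sufficient for Class objects.
    private static final Map<Class<?>, Boolean> CACHE = new IdentityHashMap<>();

    // Counts how often the reflective lookup really runs (illustration only).
    static int reflectiveLookups = 0;

    static boolean overridesNext(Class<? extends Base> clazz) {
        // A single lock around the map; when uncontended this is typically cheap
        // compared with the reflective lookup it replaces.
        synchronized (CACHE) {
            Boolean cached = CACHE.get(clazz);
            if (cached == null) {
                cached = computeOverridesNext(clazz);
                CACHE.put(clazz, cached);
            }
            return cached;
        }
    }

    // The expensive part: ask reflection where next() is declared.
    private static boolean computeOverridesNext(Class<?> clazz) {
        reflectiveLookups++;
        try {
            Method m = clazz.getMethod("next");
            return m.getDeclaringClass() != Base.class;
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            overridesNext(Overrides.class);
            overridesNext(Inherits.class);
        }
        System.out.println(overridesNext(Overrides.class)); // true
        System.out.println(overridesNext(Inherits.class));  // false
        System.out.println(reflectiveLookups);              // 2
    }
}
```

One design caveat: a static map holds strong references to the Class objects, which can keep their class loaders from being unloaded in long-running containers. Whether the lock or the reflection dominates in practice is exactly what benchmarking the patch against a real corpus would show.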

There are some additional slowdowns with the new API if you do not reuse
TokenStreams, because setting up the Attribute maps adds a small extra cost
for every new instance.
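To illustrate why reuse avoids that setup cost, here is a minimal sketch that hands out one cached stream per thread and resets it for each new document, so the one-time setup runs only once per thread. WhitespaceStream, its string-keyed attribute map, and tokenize() are hypothetical classes for illustration; they only mirror the shape of the new API (incrementToken()) and are not Lucene code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReuseSketch {

    // Hypothetical stand-in for a TokenStream whose constructor does one-time
    // setup work (here: building a small attribute map).
    static final class WhitespaceStream {
        static int constructed = 0;
        private final Map<String, StringBuilder> attributes = new HashMap<>();
        private String[] tokens = new String[0];
        private int pos = 0;

        WhitespaceStream() {
            constructed++;
            attributes.put("term", new StringBuilder()); // setup paid once per instance
        }

        // Point the existing instance at new input instead of building a new one.
        void reset(String text) {
            String trimmed = text.trim();
            tokens = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
            pos = 0;
        }

        // Advance to the next token; returns false when the input is exhausted.
        boolean incrementToken() {
            if (pos >= tokens.length) return false;
            StringBuilder term = attributes.get("term");
            term.setLength(0);
            term.append(tokens[pos++]);
            return true;
        }

        String term() { return attributes.get("term").toString(); }
    }

    // One cached stream per thread, since a stream instance is not thread-safe.
    private static final ThreadLocal<WhitespaceStream> CACHED =
            ThreadLocal.withInitial(WhitespaceStream::new);

    static List<String> tokenize(String text) {
        WhitespaceStream ts = CACHED.get();
        ts.reset(text);
        List<String> out = new ArrayList<>();
        while (ts.incrementToken()) out.add(ts.term());
        return out;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 10_000; i++) tokenize("one term");
        System.out.println(tokenize("hello reusable world")); // [hello, reusable, world]
        System.out.println(WhitespaceStream.constructed);     // 1
    }
}
```

Without the ThreadLocal cache, each call would construct a fresh stream and repeat the setup, which is the per-document overhead described above.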

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Mark Miller [mailto:markrmiller@gmail.com]
> Sent: Monday, August 10, 2009 5:08 PM
> To: java-dev@lucene.apache.org
> Subject: Re: indexing_slowdown_with_latest_lucene_udpate
> 
> My bet is that it would still be much faster: uncontended synchronization
> is generally very fast, while the reflective check method call is extremely
> slow.
> 
> - Mark
> 
> Uwe Schindler wrote:
> > The question is whether that would get better if the reflection calls
> > were done only once per class, using an IdentityHashMap<Class,Boolean>.
> > The other reflection code in AttributeSource already uses a static cache
> > for this kind of thing (e.g. the Attribute -> AttributeImpl mappings in
> > AttributeSource.DefaultAttributeFactory.getClassForInterface()).
> >
> > I could run some tests on that and supply a patch. I had considered it
> > but threw the idea away, as it needs some synchronization on the cache
> > Map, which may also outweigh the gain.
> >
> >
> >
> >> -----Original Message-----
> >> From: Mark Miller [mailto:markrmiller@gmail.com]
> >> Sent: Monday, August 10, 2009 4:48 PM
> >> To: java-dev@lucene.apache.org
> >> Subject: Re: indexing_slowdown_with_latest_lucene_udpate
> >>
> >> Robert Muir wrote:
> >>
> >>> This is real and not just for very short docs.
> >>>
> >> Yes, you still pay the cost for longer docs, but it matters less the
> >> longer the docs are, since it plays a smaller role. Load a ton of
> >> one-term docs and it might be 50-60% slower; add a bunch of articles and
> >> it might be closer to 15-20% (I don't know the numbers, but the longer I
> >> made the docs, the smaller the % slowdown, obviously). Still a good hit,
> >> but a short-doc test magnifies the problem.
> >>
> >> It affects things no matter what, but when you don't do much tokenizing
> >> or normalizing, the cost of the reflection/TokenStream init dominates.
> >>
> >> - Mark
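The reasoning that the relative slowdown shrinks as documents get longer can be made concrete with a toy cost model. This is an assumption for illustration, not a measurement: indexing one document is taken to cost fixedCost + perTokenCost * tokens, and the new API adds a constant setupCost per document. The parameter values below are arbitrary units chosen only to show the trend; they are not meant to reproduce the percentages quoted above.

```java
public class OverheadModel {

    // Toy model (assumption): relative slowdown = per-document setup cost
    // divided by the baseline cost of indexing that document.
    static double relativeSlowdown(double setupCost, double fixedCost,
                                   double perTokenCost, int tokens) {
        double baseline = fixedCost + perTokenCost * tokens;
        return setupCost / baseline;
    }

    public static void main(String[] args) {
        // Arbitrary illustrative units; only the downward trend matters.
        double setup = 1.0, fixed = 1.0, perToken = 0.1;
        for (int tokens : new int[] {1, 10, 100, 1000}) {
            System.out.printf("%5d tokens -> %.1f%% slower%n",
                    tokens, 100 * relativeSlowdown(setup, fixed, perToken, tokens));
        }
    }
}
```

Because the setup term is constant while the baseline grows with the token count, a corpus of one-term documents is close to a worst case for this kind of overhead.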
> >>
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-dev-help@lucene.apache.org
> >>
> >
> >
> >
> >
> >
> 
> 
> --
> - Mark
> 
> http://www.lucidimagination.com
> 
> 
> 
> 




