lucene-dev mailing list archives

From Ben van Klinken <bvanklin...@gmail.com>
Subject Re: Re: Optimizing indexes with mulitiple processors?
Date Fri, 10 Jun 2005 08:42:31 GMT
This raises an interesting point, and it's an issue that I think I
dealt with in CLucene. I modified the way the CLucene TokenStream
works, with some large performance increases. I changed the TokenStream
interface as follows:

from Token next(); to
boolean next(Token t);

Then the document writer can use one Token object over and over again,
removing the need to create and destroy tokens, which increased
performance dramatically. Maybe Java Lucene could consider doing the
same thing? Or maybe the performance increase only applies to
C++, you'd have to try :)
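
For illustration, here is a minimal Java sketch of that reuse pattern. The
names ReusableToken, WhitespaceTokenStream and TokenReuseDemo are made up for
this example and are not the actual Lucene or CLucene classes; the tokenizer
just splits on whitespace. It only shows the calling convention and says
nothing about how large the gain would be in Java.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mutable token (assumption: not the real Lucene/CLucene Token).
final class ReusableToken {
    final StringBuilder text = new StringBuilder();
    int start;
    int end;
}

// Hypothetical whitespace tokenizer using the "boolean next(Token t)" style.
final class WhitespaceTokenStream {
    private final String input;
    private int pos = 0;

    WhitespaceTokenStream(String input) {
        this.input = input;
    }

    // Fills t with the next token and returns true,
    // or returns false once the input is exhausted.
    boolean next(ReusableToken t) {
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) {
            pos++;
        }
        if (pos >= input.length()) {
            return false;
        }
        int begin = pos;
        while (pos < input.length() && !Character.isWhitespace(input.charAt(pos))) {
            pos++;
        }
        t.text.setLength(0);              // reuse the existing buffer
        t.text.append(input, begin, pos);
        t.start = begin;
        t.end = pos;
        return true;
    }
}

final class TokenReuseDemo {
    // Consumes the whole stream with a single token instance.
    static List<String> collect(String s) {
        WhitespaceTokenStream ts = new WhitespaceTokenStream(s);
        ReusableToken tok = new ReusableToken();
        List<String> out = new ArrayList<>();
        while (ts.next(tok)) {
            // Copy the text out: tok is overwritten by the next call.
            out.add(tok.text.toString());
        }
        return out;
    }
}
```

The catch with this style is that the consumer has to copy anything it
wants to keep, since the same token object is overwritten on every call.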

just an idea ;)

ben

On 6/10/05, Chris Collins <chris_j_collins@yahoo.com> wrote:
> Forwarding to the dev list as I don't know if this is useful data....tell me to
> shut up if it isn't.
> 
> Chris
> Note: forwarded message attached.
> 
> 
> 
> 
> 
> ---------- Forwarded message ----------
> From: Chris Collins <chris_j_collins@yahoo.com>
> To: java-user@lucene.apache.org, Bill Au <bill.w.au@gmail.com>
> Date: Thu, 9 Jun 2005 21:58:14 -0700 (PDT)
> Subject: Re: Optimizing indexes with mulitiple processors?
> To follow up: I was surprised by the results of an experiment indexing 4k
> documents to local disk (Dell PE with onboard RAID with 256MB cache). I got the
> following data from my profile:
> 
> 70 % time was spent in inverting the document
> 30 % in merge
> 
> OK, that part isn't surprising.  However, only about 1% of the 30% spent in merge
> was in the OS.flush call (not very IO bound at all with this controller).
> And almost all of the invert was in the StandardAnalyzer, pegged in the
> javacc-generated code.  The profile was based upon duration and not CPU. The
> profiler was JProbe.  I was using a lower case analyzer, and this was a slightly
> hacked lucene-1.4.3 source line in which I swapped out some of the synchronized
> data structures (Hashtable -> HashMap, Vector -> ArrayList).
> 
> <<ChRiS>>
> 
> --- Chris Collins <chris_j_collins@yahoo.com> wrote:
> 
> > I found with a fast RAID controller that I can easily be CPU bound; some of
> > the IO is related to latency.  You can hide the latency by having overlapping
> > IO (you get that with multiple indexers going on at the same time).
> >
> > I think there could possibly be more horsepower to get out of the inverter
> > and merge aspects of the indexing.  I am profiling this with JProbe at the
> > moment.
> >
> > If you're using high latency disks (such as a filer) during merge you may
> > want to consider increasing the size of the buffers to reduce the number of
> > RPCs to the filer....however my previous attempts to change this failed.
> >
> > C
> >
> > --- Bill Au <bill.w.au@gmail.com> wrote:
> >
> > > Optimize is disk I/O bound.  So I am not sure what multiple CPUs will
> > > buy you.
> > >
> > > Bill
> > >
> > > On 6/9/05, Kevin Burton <burton@rojo.com> wrote:
> > > > Is it possible to get Lucene to do an index optimize on multiple
> > > > processors?
> > > >
> > > > It's a single-threaded algorithm currently, right?
> > > >
> > > > It's a shame since I have a quad machine but I'm only using 1/4th of the
> > > > capacity.  That's a heck of a performance hit.
> > > >
> > > > Kevin
> > > >
> > > > --
> > > >
> > > >
> > > > Use Rojo (RSS/Atom aggregator)! - visit http://rojo.com.
> > > > See irc.freenode.net #rojo if you want to chat.
> > > >
> > > > Rojo is Hiring! - http://www.rojonetworks.com/JobsAtRojo.html
> > > >
> > > >    Kevin A. Burton, Location - San Francisco, CA
> > > >       AIM/YIM - sfburtonator,  Web - http://peerfear.org/
> > > > GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
> > > >
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > > >
> > > >
> > >
> > >
> > >
> >
> >
> >
> >
> >
> >
> 
> 
> 
> 
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
> 
>


