lucene-java-user mailing list archives

From Chris Collins <chris_j_coll...@yahoo.com>
Subject Re: Optimizing indexes with multiple processors?
Date Fri, 10 Jun 2005 04:58:14 GMT
To follow up: I ran an experiment indexing 4k documents to local disk (Dell PE
with onboard RAID with 256MB cache), and I was surprised by the data I got
from my profile:

70% of the time was spent inverting the document
30% in merge

OK, that part isn't surprising.  However, only about 1% of the 30% spent in
merge was in the OS.flush call (so not very I/O bound at all with this
controller).  Almost all of the invert time was in the StandardAnalyzer, pegged
in the JavaCC-generated code.  The profile was based on duration (elapsed
time), not CPU time, and the profiler was JProbe.  I was using a lower case
analyzer, and this was a slightly hacked lucene-1.4.3 source tree in which I had
swapped out some of the synchronized data structures (Hashtable -> HashMap,
Vector -> ArrayList).
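
For readers unfamiliar with that kind of swap, here is a minimal sketch of what it looks like. The class and method names (CollectionSwapSketch, collectTerms, countTerms) are hypothetical and are not taken from the Lucene source. The sketch only shows that HashMap and ArrayList offer the same Map/List operations as Hashtable and Vector without per-call locking, which is safe only if a single thread touches each structure.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class CollectionSwapSketch {
    // Hypothetical helper: splits text on whitespace and lower-cases each token.
    // A synchronized version would have returned a Vector; ArrayList avoids the lock.
    static List<String> collectTerms(String text) {
        List<String> out = new ArrayList<>();
        for (String token : text.split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return out;
    }

    // Hypothetical helper: counts how often each term occurs.
    // A synchronized version would have used a Hashtable; HashMap avoids the lock.
    static Map<String, Integer> countTerms(List<String> terms) {
        Map<String, Integer> freq = new HashMap<>();
        for (String term : terms) {
            freq.merge(term, 1, Integer::sum);
        }
        return freq;
    }

    public static void main(String[] args) {
        List<String> terms = collectTerms("The quick fox and the lazy dog");
        Map<String, Integer> freq = countTerms(terms);
        System.out.println(terms.size() + " tokens, 'the' occurs " + freq.get("the") + " times");
    }
}
```

Whether the swap pays off depends on how hot those structures are in the profile; it should not be applied to anything shared between threads.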

<<ChRiS>>
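
The suggestion in the earlier message quoted below, hiding I/O latency by running several indexers at once, can be sketched roughly as follows. This is not Lucene code: BatchIndexer, partition and indexConcurrently are hypothetical names, and the lambda in main only simulates indexing. The assumption is that each worker would own its own writer and its own partial index, with the partial indexes merged afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class OverlappingIndexersSketch {
    // Hypothetical stand-in for "index this batch into its own partial index".
    interface BatchIndexer {
        int index(List<String> batch) throws Exception; // returns documents indexed
    }

    // Distributes documents round-robin into one batch per worker.
    static List<List<String>> partition(List<String> docs, int workers) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            batches.add(new ArrayList<>());
        }
        for (int i = 0; i < docs.size(); i++) {
            batches.get(i % workers).add(docs.get(i));
        }
        return batches;
    }

    // Runs one indexer task per batch on a fixed pool so that one worker's
    // I/O wait can overlap with another worker's CPU work.
    static int indexConcurrently(List<String> docs, int workers, BatchIndexer indexer) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (List<String> batch : partition(docs, workers)) {
                Callable<Integer> task = () -> indexer.index(batch);
                results.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get(); // blocks until that worker finishes
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while indexing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("an indexer task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 4000; i++) {
            docs.add("doc-" + i);
        }
        // Simulated indexer: a real one would write its batch to its own directory.
        int total = indexConcurrently(docs, 4, batch -> batch.size());
        System.out.println("indexed " + total + " documents");
    }
}
```

The worker count of 4 is only an illustration matching the quad machine mentioned in the thread; the right number depends on how much of the time is spent waiting on the disk.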

--- Chris Collins <chris_j_collins@yahoo.com> wrote:

> I found that with a fast RAID controller I can easily be CPU bound; some of
> the I/O cost is related to latency.  You can hide the latency by having
> overlapping I/O (you get that with multiple indexers running at the same time).
> 
> I think there is possibly more horsepower to be had from the inverter and
> merge aspects of the indexing.  I am profiling this with JProbe at the moment.
> 
> If you're using high-latency disks (such as a filer) during merge, you may want
> to consider increasing the size of the buffers to reduce the number of RPCs to
> the filer.  However, my previous attempts to change this failed.
> 
> C 
> 
> --- Bill Au <bill.w.au@gmail.com> wrote:
> 
> > Optimize is disk I/O bound.  So I am not sure what multiple CPUs will buy
> > you.
> > 
> > Bill
> > 
> > On 6/9/05, Kevin Burton <burton@rojo.com> wrote:
> > > Is it possible to get Lucene to do an index optimize on multiple
> > > processors?
> > > 
> > > > It's a single-threaded algorithm currently, right?
> > > > 
> > > > It's a shame, since I have a quad machine but I'm only using 1/4th of the
> > > > capacity.  That's a heck of a performance hit.
> > > 
> > > Kevin
> > > 
> > > --
> > > 
> > > 
> > > Use Rojo (RSS/Atom aggregator)! - visit http://rojo.com.
> > > See irc.freenode.net #rojo if you want to chat.
> > > 
> > > Rojo is Hiring! - http://www.rojonetworks.com/JobsAtRojo.html
> > > 
> > >    Kevin A. Burton, Location - San Francisco, CA
> > >       AIM/YIM - sfburtonator,  Web - http://peerfear.org/
> > > GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
> > > 
> > > 
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > > 
> > >
> > 


