lucene-dev mailing list archives

From Chris Collins <chris_j_coll...@yahoo.com>
Subject Re: Re: Optimizing indexes with mulitiple processors?
Date Thu, 23 Jun 2005 19:27:38 GMT
Possible, but in the profile I ran, the time was basically spent in the state
machine logic, not in allocating new tokens.
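
For readers following the suggestion quoted below (changing Token next() to
boolean next(Token t)), here is a minimal Java sketch of what a reuse-style
stream and its consumer loop could look like. The Token, WhitespaceStream, and
collectTerms names are hypothetical and exist only for illustration; they are
not the Lucene 1.4.3 or CLucene classes, and the whitespace splitting stands in
for a real analyzer.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenReuseSketch {

    /** Hypothetical mutable token; not Lucene's or CLucene's Token class. */
    static final class Token {
        String text;
        int start;
        int end;

        void set(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    /** Hypothetical whitespace tokenizer exposing the reuse-style next(Token). */
    static final class WhitespaceStream {
        private final String input;
        private int pos = 0;

        WhitespaceStream(String input) {
            this.input = input;
        }

        /** Fills the caller's token and returns true, or returns false at end of input. */
        boolean next(Token t) {
            // Skip leading whitespace.
            while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) {
                pos++;
            }
            if (pos >= input.length()) {
                return false;
            }
            // Scan to the end of the current term.
            int start = pos;
            while (pos < input.length() && !Character.isWhitespace(input.charAt(pos))) {
                pos++;
            }
            t.set(input.substring(start, pos), start, pos);
            return true;
        }
    }

    /** Consumer loop: a single Token instance serves the whole stream. */
    public static List<String> collectTerms(String text) {
        WhitespaceStream stream = new WhitespaceStream(text);
        Token reusable = new Token();
        List<String> terms = new ArrayList<>();
        while (stream.next(reusable)) {
            terms.add(reusable.text);
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(collectTerms("optimizing indexes with multiple processors"));
    }
}
```

The sketch only removes the per-token Token allocation; the term String is
still allocated on every call. If the profile is dominated by the tokenizer's
state machine, as reported above, the gain from this change on the Java side
may be small and would need to be measured.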


C

--- Ben van Klinken <bvanklinken@gmail.com> wrote:

> This raises an interesting point, and it's an issue that I think I
> dealt with in CLucene. I modified the way the CLucene TokenStream
> works, with some large performance increases. I changed the TokenStream
> interface as follows:
> 
> from Token next(); to
> boolean next(Token t);
> 
> Then the document writer can use one Token object over and over again,
> removing the need to create and destroy tokens, which increased
> performance dramatically. Maybe Java Lucene could consider doing the
> same thing? Or maybe the performance increase only applies to
> C++; you'd have to try :)
> 
> just an idea ;)
> 
> ben
> 
> On 6/10/05, Chris Collins <chris_j_collins@yahoo.com> wrote:
> > Forwarding to the dev list as I don't know if this is useful data... tell
> > me to shut up if it isn't.
> > 
> > Chris
> > Note: forwarded message attached.
> > 
> > 
> > 
> > 
> > 
> > ---------- Forwarded message ----------
> > From: Chris Collins <chris_j_collins@yahoo.com>
> > To: java-user@lucene.apache.org, Bill Au <bill.w.au@gmail.com>
> > Date: Thu, 9 Jun 2005 21:58:14 -0700 (PDT)
> > Subject: Re: Optimizing indexes with mulitiple processors?
> > To follow up: I was surprised by the results of an experiment indexing 4k
> > documents to local disk (Dell PE with onboard RAID with 256MB cache). I got
> > the following data from my profile:
> > 
> > 70 % time was spent in inverting the document
> > 30 % in merge
> > 
> > OK, that part isn't surprising. However, only about 1% of the 30% spent in
> > merge was in the OS.flush call (not very IO bound at all with this
> > controller). And almost all of the invert time was in the StandardAnalyzer,
> > pegged in the JavaCC-generated code. The profile was based on elapsed time
> > rather than CPU time. The profiler was JProbe. I was using a lower case
> > analyzer, and this was a slightly hacked lucene-1.4.3 source code line in
> > which I swapped out some of the synchronized data structures
> > (Hashtable -> HashMap, Vector -> ArrayList).
> > 
> > <<ChRiS>>
> > 
> > --- Chris Collins <chris_j_collins@yahoo.com> wrote:
> > 
> > > I found that with a fast RAID controller I can easily be CPU bound. Some
> > > of the IO cost is related to latency. You can hide the latency by having
> > > overlapping IO (you get that with multiple indexers running at the same
> > > time).
> > >
> > > I think there could possibly be more horsepower to get out of the
> > > inverter and merge aspects of the indexing. I am JProbing this at the
> > > moment.
> > >
> > > If you're using high latency disks (such as a filer) during merge, you may
> > > want to consider increasing the size of the buffers to reduce the number
> > > of RPCs to the filer... however, my previous attempts to change this
> > > failed.
> > >
> > > C
> > >
> > > --- Bill Au <bill.w.au@gmail.com> wrote:
> > >
> > > > > Optimize is disk I/O bound.  So I am not sure what multiple CPUs will
> > > > > buy you.
> > > >
> > > > Bill
> > > >
> > > > On 6/9/05, Kevin Burton <burton@rojo.com> wrote:
> > > > > > Is it possible to get Lucene to do an index optimize on multiple
> > > > > > processors?
> > > > > >
> > > > > > It's a single-threaded algorithm currently, right?
> > > > > >
> > > > > > It's a shame, since I have a quad machine but I'm only using 1/4 of the
> > > > > > capacity.  That's a heck of a performance hit.
> > > > >
> > > > > Kevin
> > > > >
> > > > > --
> > > > >
> > > > >
> > > > > Use Rojo (RSS/Atom aggregator)! - visit http://rojo.com.
> > > > > See irc.freenode.net #rojo if you want to chat.
> > > > >
> > > > > Rojo is Hiring! - http://www.rojonetworks.com/JobsAtRojo.html
> > > > >
> > > > >    Kevin A. Burton, Location - San Francisco, CA
> > > > >       AIM/YIM - sfburtonator,  Web - http://peerfear.org/
> > > > > GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
> > > > >
> > > > >
> > > > > ---------------------------------------------------------------------
> > > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > > > >
> > > > >
> > > >
> > > >
> > > >
> > >
> > >
> > >
> > >
> > >
> > >
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-dev-help@lucene.apache.org
> > 
> >
> 
> 
> 





