lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Slowing down (rate-limiting/throttling) IndexWriter.optimize
Date Sat, 23 Aug 2008 13:42:46 GMT

I think IO throttling would be a useful built-in feature.

I imagine many people are actually uknowingly affected by this, when  
they make changes to their index on the same machine that also does  
simultaneous searching.  It's not only optimize() that will cause  
this, but also normal flushing of segments, addIndex*, expungeDeletes,  
normal segment merging, etc.

It'd be nice if the throttling could somehow be conditional on whether  
there is "contention", ie, searches are currently doing reading.

Really the OS should provide this facility to us, but it doesn't (at  
least not up through Java's APIs).  Linux does let you pick the IO  
Scheduler to use, and at least one of these IO Schedulers lets you  
prioritize whole processes wrt IO.  It's not an easy problem to solve!

Mike

Halsey, Stephen wrote:

> Hi,
>
> We are using lucene to index a large number of documents (millions)  
> and
> we currently optimize half the index in the background every 2 days,  
> to
> stop it becoming too fragmented.  This takes about an hour and we are
> finding during this time searches are slowed down dramatically on that
> machine.  This is not due to CPU as it is a dual CPU box, so I'm
> thinking it must be the large amounts of IO being used to optimize the
> index.
>
> I was wondering if anyone has any ideas for alleviating this problem?
>
> One option I've come up with is to slowly copy the index to a second
> second offline box, optimize there and then slowly copy the newly
> optimized index back onto the search box.  To slow down the IO so that
> bandwidth and IO are not maxed out I thought I could use something  
> like
> the linux Traffic Control (tc) program
> http://tldp.org/HOWTO/Traffic-Control-HOWTO/elements.html#e-shaping  
> (see
> also http://gentoo-wiki.com/HOWTO_Apache_2_bandwidth_limiting ) or  
> tar,
> nfs and http://www.ivarch.com/programs/quickref/pv.shtml and its
> rate-limit option to limit how quickly the index directory is copied  
> to
> and from the remote machine.  This option doesn't seem ideal as it  
> would
> involve other programs, servers and scripts.
>
> The other option is to do it all within the existing Java program, by
> rate-limiting/throttling the IO of the lucene Directory being used  
> to do
> the optimize.  I've done this in Lucene by extending the FSDirectory  
> and
> the FSIndexOutput classes and putting a small sleep in the
> FSIndexOutput.flushBuffer, and it seems to work OK.  I'm not that keen
> on copying and modifying lucene code though, because I'll have to  
> check
> and possibly modify it every time I upgrade lucene, so if there is a
> reasonable alternative I'd be interested in hearing anyone's ideas?   
> If
> people think IO throttling FSDirectory may be a good idea and useful  
> for
> them, I could develop it more and possibly contact lucene-dev to look
> into getting it added to the lucene trunk?
>
> Cheers
>
>
>
> steve
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message