lucene-java-user mailing list archives

From Simon McDuff <smcd...@hotmail.com>
Subject RE: Flushing Thread
Date Fri, 20 Jul 2012 00:29:28 GMT

Thank you Simon Willnauer!

With your explanation, we've decided to control the flushing by spawning another thread, so
the indexing thread stays available to ingest! :-) (Correct me if I'm wrong.)

We do so by checking the RAM size provided by Lucene! (Thank you!) By setting the automatic
flush at 1000 MB and our own threshold at 900 MB, we know that the automatic flush "should"
not happen.

I know you contribute a lot to the concurrency feature! This is great! I was very excited
to try it!
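
A minimal sketch of the watcher-thread idea described above, not the code actually used. To keep it runnable without Lucene on the classpath, the writer is hidden behind a hypothetical `RamTrackedWriter` interface; in Lucene 4.0 its two methods would presumably map to `IndexWriter.ramSizeInBytes()` and `IndexWriter.commit()`, with the automatic limit set through `IndexWriterConfig.setRAMBufferSizeMB(1000)`. The names `FlushWatcher`, `FakeWriter`, and `FlushWatcherDemo`, and the 10 ms polling interval, are assumptions made for illustration.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for the two writer operations the watcher needs. */
interface RamTrackedWriter {
    long ramBytesUsed(); // assumption: the RAM usage reported by the writer
    void flush();        // assumption: a commit (or opening a reader) on the writer
}

/** Polls RAM usage from its own thread and flushes once a soft limit is reached. */
final class FlushWatcher implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    FlushWatcher(RamTrackedWriter writer, long softLimitBytes, long pollMillis) {
        scheduler.scheduleWithFixedDelay(
                () -> checkOnce(writer, softLimitBytes),
                pollMillis, pollMillis, TimeUnit.MILLISECONDS);
    }

    /** One polling step; returns true if a flush was triggered. */
    static boolean checkOnce(RamTrackedWriter writer, long softLimitBytes) {
        if (writer.ramBytesUsed() >= softLimitBytes) {
            writer.flush(); // runs on the watcher thread, not the ingest thread
            return true;
        }
        return false;
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}

/** In-memory fake used only to exercise the watcher. */
final class FakeWriter implements RamTrackedWriter {
    final AtomicLong ram = new AtomicLong();
    final AtomicLong flushes = new AtomicLong();

    void addDocument(long bytes) { ram.addAndGet(bytes); }
    @Override public long ramBytesUsed() { return ram.get(); }
    @Override public void flush() { ram.set(0); flushes.incrementAndGet(); }
}

final class FlushWatcherDemo {
    public static void main(String[] args) throws InterruptedException {
        FakeWriter writer = new FakeWriter();
        long softLimit = 900L * 1024 * 1024; // 900 MB, below the 1000 MB automatic limit
        try (FlushWatcher ignored = new FlushWatcher(writer, softLimit, 10)) {
            for (int i = 0; i < 100; i++) {
                writer.addDocument(10L * 1024 * 1024); // ingest thread keeps adding
                Thread.sleep(1);
            }
            Thread.sleep(50); // give the watcher a final pass
        }
        System.out.println("flushes triggered by watcher: " + writer.flushes.get());
    }
}
```

The gap between the soft limit and the automatic limit is what lets the watcher act first: if documents arrive faster than the polling interval can absorb, the writer can still reach its own limit and flush on an ingest thread, which is why the message says the automatic flush "should" not happen.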
We tried the following approaches:
Option 1 - 6 threads referring to the same IndexWriter
Option 2 - 6 threads, each with its own IndexWriter, merging the indexes at the end

Unfortunately, we found that option 2 scales better. I'm not sure why option 1 didn't scale.
Is it possible that synchronization between threads is too costly? I don't have an answer,
but it was definitely slower.
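
The shape of option 2 can be sketched in plain Java as follows. This is only an analogy under stated assumptions: each worker fills a private list that stands in for its own IndexWriter, and a single-threaded step at the end stands in for merging the indexes (in Lucene that step would presumably be `IndexWriter.addIndexes`). The class `PrivateIndexes` and the document names are hypothetical, and the sketch does not explain the slowdown measured with option 1; it only shows that the workers share no state until the merge.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Toy model of "one private index per thread, merged at the end". */
final class PrivateIndexes {

    /** Each worker fills its own list; the lists are merged once, on the calling thread. */
    static List<String> indexThenMerge(int threads, int docsPerThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> parts = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int id = t;
                parts.add(pool.submit(() -> {
                    // Private to this worker: no locking needed while "indexing".
                    List<String> local = new ArrayList<>(docsPerThread);
                    for (int d = 0; d < docsPerThread; d++) {
                        local.add("doc-" + id + "-" + d);
                    }
                    return local;
                }));
            }
            // Merge step: runs after the workers finish, in submission order.
            List<String> merged = new ArrayList<>(threads * docsPerThread);
            for (Future<List<String>> part : parts) {
                merged.addAll(part.get());
            }
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while indexing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("worker failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling `PrivateIndexes.indexThenMerge(6, 1000)` mirrors the six-thread setup: the result holds 6000 entries, and contention can only arise in the final merge.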

With option 2, we are able to insert between 800,000 and 900,000 documents/sec (we've modified
Lucene to remove some bottlenecks). The threads do NOT only index; they do other work before
adding documents.

Did you look at the Disruptor pattern (by LMAX)? It helped us a lot to achieve great performance
in a multithreaded environment!

Thank you
Simon M.




> Date: Thu, 19 Jul 2012 21:52:19 +0200
> Subject: Re: Flushing Thread
> From: simon.willnauer@gmail.com
> To: java-user@lucene.apache.org
> 
> hey,
> 
> On Thu, Jul 19, 2012 at 7:41 PM, Simon McDuff <smcduff@hotmail.com> wrote:
> >
> > Thank you for your answer!
> >
> > I read all your blogs! It is always interesting!
> 
> for details see:
> 
> http://www.searchworkings.org/blog/-/blogs/gimme-all-resources-you-have-i-can-use-them!/
> 
> and
> 
> http://www.searchworkings.org/blog/-/blogs/lucene-indexing-gains-concurrency/
> >
> > My understanding is probably incorrect ...
> > I observed that if only one thread calls addDocument, it will not spawn
> > another thread for flushing; it uses the main thread.
> 
> every indexing thread can hit a flush. if you only have one thread you
> will not make progress adding docs while flushing.
> IW will not create new threads for flushing.
> > In this case, my main thread is locked. Correct?
> >
> > The concurrent flushing will ONLY work when I have many threads adding documents?
> > (In that case I will need to put a ring buffer in front.)
> 
> that is basically correct. You can frequently call commit, or pull a
> reader from the IW, in a different thread before your RAM buffer fills
> up so that flushing happens in a different thread. That could work
> pretty well if you don't have many deletes to be applied. (if you have
> many deletes then pull a reader without applying deletes.)
> 
> simon
> >
> > Do I understand correctly ? Did I miss something ?
> >
> > Simon
> >
> >> From: lucene@mikemccandless.com
> >> Date: Thu, 19 Jul 2012 13:02:42 -0400
> >> Subject: Re: Flushing Thread
> >> To: java-user@lucene.apache.org
> >>
> >> This has already been fixed on Lucene 4.0 (we now have fully
> >> concurrent flushing), eg see:
> >>
> >>   http://blog.mikemccandless.com/2011/05/265-indexing-speedup-with-lucenes.html
> >>
> >> Mike McCandless
> >>
> >> http://blog.mikemccandless.com
> >>
> >> On Thu, Jul 19, 2012 at 12:54 PM, Simon McDuff <smcduff@hotmail.com> wrote:
> >> >
> >> > I see some behavior at the moment when I'm flushing and would like to know
> >> > if I can change that.
> >> >
> >> >  One main thread is inserting; when it flushes, it blocks.
> >> >  Instead of blocking, could it spawn another thread to do the flush?
> >> >
> >> > Basically, I would like to have one main thread adding documents to my index;
> >> > if a flush needs to occur, another thread should be spawned, but it should
> >> > never lock the main thread. Is it possible?
> >> >
> >> > Is the only solution to have many threads indexing the data?
> >> > In that case, is it true to say ONLY one of them will be busy while the
> >> > other is flushing? (I do understand that if my flushing is taking too much
> >> > time, they will both flush... :-))
> >> >
> >> > Thank you!
> >> >
> >> > Simon
> >> >
> >> >
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >
> 