lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Dmitry Serebrennikov <>
Subject Re: multithreading in SegmentsReader
Date Thu, 11 Oct 2001 00:38:55 GMT

Doug Cutting wrote:

>Your analysis looks good to me.
>I think it would be simpler, if a bit less optimized, to just make
>SegmentsReader.numDocs() and SegmentsReader.delete() synchronized methods.
>Does that sound like a reasonable fix to you?
Yes, that sounds fine. Delete can definetely just be a synchronized 
method. And so can the numDocs unless it is called a lot. Is it? If it 
is, we may want to leave the upfront check in there before it is 

>Thanks for spotting this.
No problem :). I'm learning a lot by taking the code apart and figuring 
out what goes where. It is very impressive! :) A lot of optimizations in 
terms of pre-computing numerical stuff, keeping the memory foot print 
low. The use of PriorityQueues everywhere. Speaking of which, I was just 
going through the SegmentsTermEnum and noticed that in the next() method 
it does a queue.pop() followed by queue.put(), instead of the 
queue.adjustTop() which the comment says is at least two times faster. I 
guess the adjustTop was added after this code was written, but maybe I'm 
missing something or this queue is small enough that this doesn't 
matter. I wasn't going to change it or bring it up, but since we are 
talking ...  Here's the code:

  public final boolean next() throws IOException {
    SegmentMergeInfo top = (SegmentMergeInfo);
    if (top == null) {
      term = null;
      return false;

    term = top.term;
    docFreq = 0;

    while (top != null && term.compareTo(top.term) == 0) {
      docFreq += top.termEnum.docFreq();      // increment freq
      if (
    queue.put(top);                  // restore queue
    top.close();                  // done with a segment
      top = (SegmentMergeInfo);
    return true;


>As for closing, your analysis also sounds correct: if an index is closed
>from one thread while another thread is reading it, then the other thread
>may experience errors.  I think the a good solution here is simply to not
>close an IndexReader when there may be other threads that may be using it.
>FSDirectory closes files with finalize methods, so all files will be closed
>when the IndexReader is garbage collected.  The primary client of the
>close() method is the index merging code, which requires prompt closing of
>files to keep the number of file descriptors low, and knows that other
>threads cannot be accessing them.
>So I think changing the documentation and examples will suffice here.  The
>documentation for IndexReader.close() method should note that it should only
>be called if no other threads are accessing the index, and the search
>examples should not bother to call close, letting the garbage collector
>close files.  Is that acceptable?
Sure. Too bad, though!
I liked your implementation of the Lock.With and I was hoping to see how 
you would solve this synchronization problem where you have readers and 
writers instead of just a single lock. Is there a good solution to this 
problem you could point me to?

View raw message