lucene-dev mailing list archives

From Doug Cutting <DCutt...@grandcentral.com>
Subject RE: multithreading in SegmentsReader
Date Thu, 11 Oct 2001 16:18:29 GMT
> From: Dmitry Serebrennikov [mailto:dmitrys@earthlink.net]
>
> Yes, that sounds fine. Delete can definitely just be a synchronized
> method. And so can the numDocs unless it is called a lot. Is it? If it
> is, we may want to leave the upfront check in there before it is
> synchronized.

It should not be called in inner loops.  Synchronized methods don't add much
overhead.  The main problem is that slow ones can become a bottleneck for
multi-threading.  This method is only slow the first time it is called, when
it actually needs to lock out other threads, so I don't think making it
synchronized is a problem.
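
To illustrate the point about a synchronized method that is only slow on its first call, here is a minimal sketch. The class CachedCountReader, its BitSet of deletions, and the -1 sentinel are all hypothetical names chosen for illustration; this is not the actual SegmentsReader code.

```java
import java.util.BitSet;

// Hypothetical stand-in for a reader that caches its live-document count.
// Not Lucene's SegmentsReader; names and fields are assumptions.
class CachedCountReader {
    private final BitSet deleted;   // assumed: one bit per deleted document
    private final int maxDoc;       // total number of document slots
    private int numDocs = -1;       // -1 means "not yet computed"

    CachedCountReader(int maxDoc) {
        this.maxDoc = maxDoc;
        this.deleted = new BitSet(maxDoc);
    }

    // Does real work only when the cache is empty; every other call
    // holds the lock just long enough to return the cached value.
    public synchronized int numDocs() {
        if (numDocs == -1) {
            numDocs = maxDoc - deleted.cardinality();
        }
        return numDocs;
    }

    // Deleting invalidates the cache under the same lock, so a
    // concurrent numDocs() can never observe a stale count.
    public synchronized void delete(int doc) {
        deleted.set(doc);
        numDocs = -1;
    }
}
```

Because both methods take the same lock, no separate unsynchronized "upfront check" is needed for correctness; whether one is worth adding depends only on how often the method is called.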

> >Thanks for spotting this.
> >
> No problem :). I'm learning a lot by taking the code apart 
> and figuring out what goes where.

I think that you are giving Lucene one of the closest readings that it has
had!

> I was just going through the SegmentsTermEnum and noticed that in the
> next() method it does a queue.pop() followed by queue.put(), instead of
> the queue.adjustTop() which the comment says is at least two times
> faster. I guess the adjustTop was added after this code was written,

I think you're right.  I added adjustTop() when writing the phrase scorers,
then realized that I couldn't use it there, and never looked for other
places to use it.

With this optimization that loop becomes:
    
    while (top != null && term.compareTo(top.term) == 0) {
      docFreq += top.termEnum.docFreq();          // increment freq
      if (top.next())
        queue.adjustTop();                        // restore queue
      else {
        queue.pop();
        top.close();                              // done with a segment
      }
      top = (SegmentMergeInfo)queue.top();
    }

I tested it and it seems to work, but I couldn't notice any speed
difference.  I will wait until after the 1.2 final release before submitting
this change.
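
For readers unfamiliar with the trick, the sketch below shows why adjusting the top can be cheaper: pop() followed by put() sifts down once and then sifts up once, while adjustTop() does a single sift-down of the changed top element. Cursor, CursorQueue, and MergeDemo are hypothetical classes written for this illustration and are not Lucene's PriorityQueue or SegmentMergeInfo; the merge loop only mirrors the shape of the loop above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical cursor over one sorted int array (stands in for a segment).
class Cursor {
    final int[] values;
    int pos = 0;
    Cursor(int[] values) { this.values = values; }
    int current() { return values[pos]; }
    boolean next() { return ++pos < values.length; }
}

// Minimal 1-based binary min-heap ordered by Cursor.current().
class CursorQueue {
    private final Cursor[] heap;
    private int size = 0;

    CursorQueue(int maxSize) { heap = new Cursor[maxSize + 1]; }

    void put(Cursor c) { heap[++size] = c; upHeap(); }

    Cursor top() { return size > 0 ? heap[1] : null; }

    Cursor pop() {
        if (size == 0) return null;
        Cursor result = heap[1];
        heap[1] = heap[size];       // move last element to the root
        heap[size--] = null;
        downHeap();
        return result;
    }

    // Call after the top element's value has changed: one sift-down only.
    void adjustTop() { downHeap(); }

    private void upHeap() {
        int i = size;
        Cursor node = heap[i];
        int j = i >>> 1;
        while (j > 0 && node.current() < heap[j].current()) {
            heap[i] = heap[j];      // shift parent down
            i = j;
            j = j >>> 1;
        }
        heap[i] = node;
    }

    private void downHeap() {
        if (size == 0) return;
        int i = 1;
        Cursor node = heap[i];
        int j = i << 1;             // left child
        int k = j + 1;              // right child
        if (k <= size && heap[k].current() < heap[j].current()) j = k;
        while (j <= size && heap[j].current() < node.current()) {
            heap[i] = heap[j];      // shift smaller child up
            i = j;
            j = i << 1;
            k = j + 1;
            if (k <= size && heap[k].current() < heap[j].current()) j = k;
        }
        heap[i] = node;
    }
}

class MergeDemo {
    // Merges sorted arrays using adjustTop() instead of pop() + put().
    static List<Integer> merge(int[][] sorted) {
        CursorQueue queue = new CursorQueue(sorted.length);
        for (int[] s : sorted) {
            if (s.length > 0) queue.put(new Cursor(s));
        }
        List<Integer> out = new ArrayList<>();
        Cursor top = queue.top();
        while (top != null) {
            out.add(top.current());
            if (top.next()) {
                queue.adjustTop();  // cursor advanced in place: restore order
            } else {
                queue.pop();        // cursor exhausted: remove it
            }
            top = queue.top();
        }
        return out;
    }
}
```

With only a handful of segments in the queue, the heap is shallow, which is consistent with not being able to measure a difference.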

This could also potentially be used in SegmentMerger, but the changes would
be a lot more complex.  SegmentMerger could in fact be written in terms of
SegmentsReader, but it would probably slow index merging down, since the
TermInfo would be read twice for each term, once by SegmentsTermEnum and
once by SegmentsTermDocs.  My inclination is to leave SegmentMerger alone.
The priority queue access is probably not significant there anyway.

Doug
