lucene-dev mailing list archives

From: Chuck Williams
Subject: Re: How to proceed with Bug 31841 - MultiSearcher problems with Similarity.docFreq() ?
Date: Thu, 31 Mar 2005 19:26:37 GMT

Wolf Siberski writes (3/31/2005 1:54 AM):

> As some time has passed now since I submitted the Multisearcher
> patch, and no objections have been raised, I would like to ask
> to commit it now. I have put substantial effort into it, and my
> concern is that conflicts with newer patches will emerge if
> the commit is delayed further.

I don't get a vote, but if I did it would be:

I filed the original bug, prepared the first weak attempt at a patch, 
and participated in the design discussion that led to Wolf fixing the 
problem.  I haven't tried running this patch, but did just read it, and 
it looks solid.

As we discussed in the design, this seems the best first solution as it 
is fairly easy to assert its correctness.  Going forward, I would like 
to see some performance measurements and expect we'll want to introduce 
some optimizations, the most important of which is to cache the 
cumulative docFreqs for a large number of terms in a scope much larger 
than a single query.  This is more difficult as it would require some 
type of coordination with the indexing processes on the remote nodes 
(although for a large index in most real cases there would not be any 
need to keep these completely synchronized, as the instantaneous 
changes in docFreqs are not very important to the relevance ranking; 
some kind of periodic synchronization approach, analogous to optimizing 
indexes, would be quite sufficient).
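
To make the caching idea concrete, here is a rough sketch of a docFreq 
cache that outlives a single query and is simply discarded at a fixed 
interval.  Everything in it is an assumption for illustration: the class 
name DocFreqCache, the loader function (standing in for whatever sums 
docFreq across the remote nodes), and the injected clock are all 
hypothetical and are not part of Wolf's patch or of Lucene.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;
import java.util.function.ToIntFunction;

// Hypothetical sketch: caches cumulative docFreqs across many queries and
// drops the whole cache once it is older than maxAgeMillis.
class DocFreqCache {
    private final Map<String, Integer> cache = new HashMap<>();
    // Assumed helper: computes the cumulative docFreq of a term over all nodes.
    private final ToIntFunction<String> loader;
    private final long maxAgeMillis;
    // Injected clock so the refresh policy can be exercised without sleeping.
    private final LongSupplier clock;
    private long lastSync;

    DocFreqCache(ToIntFunction<String> loader, long maxAgeMillis, LongSupplier clock) {
        this.loader = loader;
        this.maxAgeMillis = maxAgeMillis;
        this.clock = clock;
        this.lastSync = clock.getAsLong();
    }

    synchronized int docFreq(String term) {
        long now = clock.getAsLong();
        // Periodic synchronization: stale values are tolerated until the
        // interval elapses, then everything is reloaded on demand.
        if (now - lastSync >= maxAgeMillis) {
            cache.clear();
            lastSync = now;
        }
        Integer cached = cache.get(term);
        if (cached == null) {
            cached = loader.applyAsInt(term);
            cache.put(term, cached);
        }
        return cached;
    }
}
```

The trade-off is the one described above: between refreshes the cached 
counts may lag the indexes slightly, which should matter little for 
ranking on a large index.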

A more minor point is that allocations could be reduced in 
MultiSearcher.prepareWeight (e.g., a simple one is to use a single 
HashMap rather than a HashSet and a HashMap for computing the cumulative 
docFreqs for all query terms).  But again, I think Wolf did the right 
thing in creating the easily-validated correct implementation as the 
first step.
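
As an illustration of the single-map idea only, the sketch below 
de-duplicates the query terms and accumulates their docFreqs in one 
HashMap.  It is not the code from the patch: DocFreqSource is a 
hypothetical stand-in for a sub-searcher, terms are plain Strings 
rather than Lucene Term objects, and the batch docFreqs call is an 
assumed shape.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one sub-searcher; returns one docFreq per term,
// in the same order as the input array.
interface DocFreqSource {
    int[] docFreqs(String[] terms);
}

class CumulativeDocFreq {
    // Sums docFreqs over all sources using a single map, which serves both
    // to de-duplicate the query terms and to hold the running totals.
    static Map<String, Integer> aggregate(List<String> queryTerms,
                                          List<DocFreqSource> sources) {
        Map<String, Integer> totals = new HashMap<>();
        for (String term : queryTerms) {
            totals.put(term, 0); // duplicates collapse onto one key
        }
        String[] unique = totals.keySet().toArray(new String[0]);
        for (DocFreqSource source : sources) {
            int[] freqs = source.docFreqs(unique); // one batch call per source
            for (int i = 0; i < unique.length; i++) {
                totals.merge(unique[i], freqs[i], Integer::sum);
            }
        }
        return totals;
    }
}
```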

I'm sorry to have taken so long to review this.  I hope to use it within 
the next 3 weeks on a scalability benchmark and will report back the 
results.  Please do commit it as Wolf requests so that it gets synced up 
with other activities.  E.g., the changes to BooleanQuery will need to 
be integrated with Paul's work assuming that gets committed as well.


