lucene-java-user mailing list archives

From Doug Cutting
Subject Re: Lucene features
Date Thu, 11 Sep 2003 18:05:53 GMT
Leo Galambos wrote:
> Example: I use this notation: inverted_list_term:{list of W values, "-" 
> denotes W=0, for 12 documents in a collection}
> A:{23[16]------27}
> B:{--[38]--------}
> C:{18[2-]45239812}
> If your first query is B, the subset of documents (denoted by brackets, 
> namely the 3rd and 4th docs) is selected. If your second query is "A 
> C", then you cannot use global IDFs, because in the subset the IDF 
> factors are different. Globally, A is the better discriminator, but in the 
> subset, C is better. This is then reflected in the hit list you 
> generate, and I guess the quality will also be affected by this.
> The example shows that you would rather export the subset to an 
> auxiliary index (RAMDirectory?) and then use this structure instead of 
> the original index. Obviously, it will solve the issue of speed you 
> mentioned.
> Unfortunately, I am not sure whether you can export the inverted lists when 
> you read them. In egothor, I would use a listener in the Rider class; in 
> Lucene, I would have to rewrite some classes, and that could be a real 
> problem. Maybe there is a solution I do not see...

I have some extensions to Lucene that I've not yet committed which make 
it possible to easily define synthetic IndexReaders (not currently 
supported).  So you could do things that way, once I check these in. 
But is this really better than just ANDing the clauses together?  It 
would take some big experiments to know, but my guess is that it doesn't 
make much difference to compute a "local" IDF for such things.
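
For anyone following the thread, the quoted example can be checked with a few lines of Python. This is only a rough sketch: it uses the textbook IDF, log(N / df), rather than Lucene's actual Similarity formula, and the names LISTS, doc_ids and idf are made up for illustration. The strings are the three inverted lists from the quoted message with the brackets removed.

```python
import math

# Inverted lists from the quoted example: one weight W per document,
# "-" meaning W = 0. The brackets of the original notation are dropped.
LISTS = {
    "A": "2316------27",
    "B": "--38--------",
    "C": "182-45239812",
}


def doc_ids(term, subset=None):
    """Return the 0-based positions of documents where the term has W > 0."""
    ids = {i for i, w in enumerate(LISTS[term]) if w != "-"}
    return ids if subset is None else ids & subset


def idf(term, subset=None):
    """Textbook IDF, log(N / df), over the whole collection or over a subset.

    Assumption: this is NOT Lucene's formula, just the simplest variant.
    """
    n = len(LISTS[term]) if subset is None else len(subset)
    df = len(doc_ids(term, subset))
    return math.log(n / df) if df else float("inf")


# Documents selected by the first query "B" (the 3rd and 4th documents).
subset = doc_ids("B")

global_idf = {t: idf(t) for t in ("A", "C")}
local_idf = {t: idf(t, subset) for t in ("A", "C")}

print("global:", global_idf)
print("local: ", local_idf)
```

Under these assumptions A has the higher IDF over all 12 documents, while C has the higher IDF inside the two-document subset, which is the reversal described in the quoted text. Whether that reversal changes ranking quality in practice is the open question.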

