lucene-java-user mailing list archives

From Renaud Delbru
Subject Re: Flex & Docs/AndPositionsEnum
Date Wed, 10 Feb 2010 12:45:03 GMT
Hi Michael,

On 09/02/10 20:47, Michael McCandless wrote:
> But, then, it's very convenient when you need it and don't care about
> performance.  EG in Renaud's usage, a test case that is trying to
> assert that all indexed docs look right, why should you be forced to
> operate per segment?  He shouldn't have to bother with the details of
> which field/term/doc was indexed into which segment.
> Or, I guess we could argue that this test really should create a
> TermQuery and walk the matching docs... instead of using the low level
> flex enum APIs.  Because searching impl already knows how to step
> through the segments.
In fact, I do care about performance, but I was using
IndexReader.termPositionsEnum to mimic the implementation of the
different query scorers (e.g., TermScorer).
I have already reimplemented many of the original Lucene scorers to use
my particular index structure. From what I have seen, the main low-level
scorers (e.g., TermScorer, PhraseScorer) use the DocsEnum
interface, and not a segment-level enum. As I understand it, these
scorers do not know whether they are using a segment-level enum or a
Multi*Enum. So, is there a loss of performance in this case? Or am I
missing something?

I'll try to clarify my usage of the Flex API; maybe it will highlight
certain aspects for you.
In the ideal world, what I would like to do is the following:
1) write my own codec,
2) register my codec in the IndexWriter, and tell it to use this codec
for one or more fields (similar to the PerFieldCodecWrapper),
3) write query operators that are compatible with my codec,
4) at search time, use these query operators with the fields that use my
codec.

If, by mistake, I use query operators that are not compatible
with a field (and its related codec), an exception should be thrown
telling me that I cannot use these query operators with this field.
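
The compatibility check described above could look roughly like the following plain-Java sketch. Everything in it is an assumption made for illustration: FieldCodecRegistry, codecFor and checkCompatible are hypothetical names and not part of the Lucene or flex API, and codecs are identified by plain strings.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these types or methods exist in Lucene.
final class FieldCodecRegistry {
    // Maps a field name to the name of the codec used to index it.
    private final Map<String, String> codecByField = new HashMap<>();
    private final String defaultCodec;

    FieldCodecRegistry(String defaultCodec) {
        this.defaultCodec = defaultCodec;
    }

    // Records that the given field is written with the given codec.
    void register(String field, String codecName) {
        codecByField.put(field, codecName);
    }

    // Fields without an explicit registration fall back to the default codec.
    String codecFor(String field) {
        return codecByField.getOrDefault(field, defaultCodec);
    }

    // Fails fast when a query operator built for one codec is used on a
    // field that was indexed with a different codec.
    void checkCompatible(String field, String requiredCodec) {
        String actual = codecFor(field);
        if (!actual.equals(requiredCodec)) {
            throw new UnsupportedOperationException(
                "Field '" + field + "' uses codec '" + actual
                + "', but this query operator requires '" + requiredCodec + "'");
        }
    }
}
```

A query operator would call checkCompatible once, when it is created for a field, so the error surfaces before any postings are read.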

So, in my current use case, I don't think it is necessary to be aware of
the fact that I am manipulating multiple segments or only one segment.
I think this should be hidden.

But what you were suggesting is to create my own "MultiReader" that is
optimised for my codec. Is that right? A MultiReader that just iterates
over the subreaders, checks whether they are using my codec (and therefore
the associated fields), and uses them to iterate over my own postings?
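
One way to picture such a reader is the following self-contained sketch. It does not use Lucene's classes: SegmentStub and CodecAwareMultiReader are hypothetical stand-ins, postings are reduced to arrays of segment-local document IDs, and the codec is again identified by a plain string.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a per-segment reader; not a Lucene class.
final class SegmentStub {
    final String codecName; // codec used to write this segment's postings
    final int maxDoc;       // number of documents in the segment
    final int[] docs;       // segment-local doc IDs matching some term

    SegmentStub(String codecName, int maxDoc, int... docs) {
        this.codecName = codecName;
        this.maxDoc = maxDoc;
        this.docs = docs;
    }
}

// Walks the segments in order and exposes only the postings of segments
// written with the expected codec, remapped to index-wide doc IDs.
final class CodecAwareMultiReader {
    private final List<SegmentStub> segments;
    private final String codecName;

    CodecAwareMultiReader(List<SegmentStub> segments, String codecName) {
        this.segments = segments;
        this.codecName = codecName;
    }

    List<Integer> docs() {
        List<Integer> out = new ArrayList<>();
        int docBase = 0; // offset of the current segment in the global doc ID space
        for (SegmentStub segment : segments) {
            if (segment.codecName.equals(codecName)) {
                for (int localDoc : segment.docs) {
                    out.add(docBase + localDoc);
                }
            }
            // Skipped segments still occupy their slice of the doc ID space.
            docBase += segment.maxDoc;
        }
        return out;
    }
}
```

The doc base advances for every segment, including the skipped ones, so the returned IDs stay consistent with the IDs of the whole index.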
Renaud Delbru
