lucene-java-user mailing list archives

From Renaud Delbru <renaud.del...@deri.org>
Subject Re: Flex & Docs/AndPositionsEnum
Date Wed, 10 Feb 2010 12:58:20 GMT
On 10/02/10 09:47, Uwe Schindler wrote:
> Positions as attributes would be good. For positions we need a new Attribute (not PositionIncrement),
> but e.g. for offsets and payloads we can use the standard attributes from the analysis, which
> is really cool. This would also make it possible to add all custom attributes from the analysis
> phase to the posting list and make them visible in the TermDocs enum. In my opinion, there
> should be no DocsEnum, DocsAndPositionsEnum and so on enums, just one class, which only differs
> in provided attributes. So if you want the payloads, ask for a standard DocsEnum and pass
> the requested attribute classes as parameter:
> 	IndexReader.termDocsEnum(Bits skipDocs, String field, BytesRef term, Class<? extends Attribute>... atts)
>
> If somebody wants offsets and payloads:
> 	reader.termDocsEnum(skipDocs, "field", term, OffsetAttribute.class, PayloadAttribute.class);
>    
I rather like this idea. This interface for iterating over the postings
looks more flexible, and in my opinion it will be easy to use with
any "home-brewed" codec.
Read optimisations based on what the user needs, such as the current
split between termDocsEnum and termPositionsEnum (where the first reads
only the freq file and the second also reads the prox file), would be
done under the hood by the respective PostingReader. Given the set of
Attribute classes it receives, the PostingReader knows what it needs to
read and what it can skip. This also simplifies the interface for the
user, who no longer has to take care of choosing the right enum.
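To make the point concrete, here is a minimal sketch of how a reader could pick the files to open from the requested attribute classes. It is not Lucene code: the marker interfaces, the PostingsFile enum and the filesToRead method are all hypothetical names for illustration, and it assumes that every requested per-position attribute (positions, offsets, payloads) lives in the prox file.

```java
import java.util.EnumSet;
import java.util.Set;

public class AttributePostingsSketch {
    // Hypothetical marker types; they only stand in for the analysis attributes.
    interface Attribute {}
    interface PositionAttribute extends Attribute {}
    interface OffsetAttribute extends Attribute {}
    interface PayloadAttribute extends Attribute {}

    // The two postings files mentioned above.
    enum PostingsFile { FREQ, PROX }

    // Hypothetical planning step of a postings reader: docs and freqs are
    // always read; the prox file is opened only if some attribute is requested
    // (assumption: every attribute here is stored per position).
    @SafeVarargs
    static Set<PostingsFile> filesToRead(Class<? extends Attribute>... atts) {
        Set<PostingsFile> files = EnumSet.of(PostingsFile.FREQ);
        if (atts.length > 0) {
            files.add(PostingsFile.PROX);
        }
        return files;
    }

    public static void main(String[] args) {
        // Equivalent of today's termDocsEnum: no attributes requested.
        System.out.println(filesToRead());
        // Equivalent of asking for offsets and payloads.
        System.out.println(filesToRead(OffsetAttribute.class, PayloadAttribute.class));
    }
}
```

The caller only states which attributes it wants; the choice between a freq-only read and a freq-plus-prox read stays inside the reader.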
> I am not sure if this is very good in Lucene as it would break lots of apps. E.g. simple
> autocompletes use a PrefixTerm(s)Enums, but must use the top-level reader or they have to
> emulate merging of all TermsEnums themselves. A second problem (currently) is rewrites (e.g.
> Fuzzy) to BooleanQuery for MTQs. They operate on the top level reader.
>
> So I propose "simple" and not so performant Enums for MultiReaders. In my opinion, it
> would also be possible without ProxyAttributes, if we simply copy them around. It’s a performance
> problem, but if somebody needs speed, segment-level enums should be used (and search does
> this by the way).
>    
Could you provide pointers to the search code that uses the segment-level
enums?
As I explained in my last answer to Michael, the TermScorer uses the
DocsEnum interface, and therefore does not know whether it is working
with a segment-level enum or a Multi*Enum. Which search components (or
query operators) in Lucene use segment-level enums?

Cheers
-- 
Renaud Delbru
