lucene-dev mailing list archives

From Michael Busch <busch...@gmail.com>
Subject Re: Payloads
Date Fri, 22 Dec 2006 23:32:37 GMT
Nicolas Lalevée wrote:
>
> I have just looked at it. It looks great :)
>   
Thanks! :-)

> But I still don't understand why a new entry in the fieldinfo is needed.
>   

The entry is not strictly *needed*, but I use it for 
backwards compatibility and as an optimization for fields that don't 
have any tokens with payloads. For fields with payloads, the 
PositionDelta is shifted by one bit, so for certain values the 
VInt needs an extra byte. I have an index with about 500k web 
documents and measured that about 8% of all PositionDelta values would 
need one extra byte if PositionDelta were shifted. For my index that 
means roughly 4% growth of the total index size. By using a fieldbit, 
payloads can be disabled for a field, and the shifting of 
PositionDelta can therefore be avoided. Furthermore, if the payload 
fieldbit is not enabled, the index format does not change at all.
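
To illustrate the extra-byte effect, here is a minimal sketch in plain 
Java (not the actual patch code). It assumes the usual VInt layout of 7 
data bits per byte, and it assumes that the bit freed by the shift is 
used as a "payload follows" flag; the class and method names are made up 
for this example.

```java
public class VIntShiftDemo {
    // Bytes needed to store a non-negative int as a VInt (7 data bits per byte).
    static int vIntLength(int value) {
        int bytes = 1;
        while ((value & ~0x7F) != 0) {
            value >>>= 7;
            bytes++;
        }
        return bytes;
    }

    // Assumption: the delta is shifted left by one bit and the low bit
    // flags whether a payload follows. Hypothetical helper, for illustration only.
    static int encodeWithFlag(int positionDelta, boolean hasPayload) {
        return (positionDelta << 1) | (hasPayload ? 1 : 0);
    }

    public static void main(String[] args) {
        int[] deltas = {1, 63, 64, 127, 128, 8191, 8192};
        for (int d : deltas) {
            int plain = vIntLength(d);
            int shifted = vIntLength(encodeWithFlag(d, false));
            System.out.println(d + ": " + plain + " -> " + shifted + " byte(s)");
        }
    }
}
```

Under these assumptions, only deltas in ranges such as 64..127 or 
8192..16383 cross a 7-bit boundary when shifted and cost one more byte; 
all other values keep their size.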

> The same exists for TermVector. And code like this fails for no obvious 
> reason:
>
> Document doc = new Document();
> doc.add(new Field("f1", "v1", Store.YES, Index.TOKENIZED, 
> TermVector.WITH_POSITIONS_OFFSETS));
> doc.add(new Field("f1", "v2", Store.YES, Index.TOKENIZED, TermVector.NO));
>
> RAMDirectory ram = new RAMDirectory();
> IndexWriter writer = new IndexWriter(ram, new StandardAnalyzer(), true);
> writer.addDocument(doc);
> writer.close();
>
> Knowing a little bit about how Lucene works, I have an idea why this fails, but 
> can we avoid this?
>
> Nicolas
>   
In the payload case there is no problem like this one. There is no new 
Field option that can be used to set the fieldbit explicitly. The bit is 
set automatically for a field as soon as the first Token of that field 
that carries a payload is encountered.
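
As a rough sketch of that behaviour (hypothetical names throughout; this 
is not Lucene's FieldInfo class or the patch itself), the bit can be 
modelled as a sticky per-field flag that is switched on by the first 
token that carries a payload:

```java
import java.util.HashMap;
import java.util.Map;

public class PayloadBitDemo {
    // Hypothetical stand-in for a per-field info entry; not Lucene's actual class.
    static final class FieldEntry {
        final String name;
        boolean storePayloads; // the "fieldbit"; starts out disabled
        FieldEntry(String name) { this.name = name; }
    }

    private final Map<String, FieldEntry> fields = new HashMap<>();

    // Called once per token; payload == null means the token carries no payload.
    void onToken(String field, byte[] payload) {
        FieldEntry entry = fields.computeIfAbsent(field, FieldEntry::new);
        if (payload != null) {
            entry.storePayloads = true; // sticky: never cleared by later tokens
        }
    }

    boolean storesPayloads(String field) {
        FieldEntry entry = fields.get(field);
        return entry != null && entry.storePayloads;
    }
}
```

Because nothing is declared up front, two Field instances with the same 
name cannot disagree about the setting, which is how the TermVector 
conflict above is avoided in this model.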

Michael

