lucene-java-user mailing list archives

From Andrzej Bialecki ...@getopt.org>
Subject Re: Lucene 4.0 .FDT
Date Thu, 19 Jul 2012 14:44:28 GMT
On 19/07/2012 14:26, Simon McDuff wrote:
>
> I'm using Lucene 4.0.
>
> I'm inserting around 300 000 documents / seconds.
>
> We do not have any store fields. But we noticed that .fdt get populated even so.
>
> .fdx contains useless informations.
> .fdt contains only zero....useless...
>
> Is there a way to minimize the impact ?

This happens because the Lucene40StoredFieldsWriter (part of the 
Lucene40 Codec) uses a simplistic layout for the data: for every 
document it writes a long (8 bytes) to the .fdx file to mark the 
position of that document's fields data in the .fdt file, and a vint 
(at least one byte) to the .fdt file to record the number of fields, 
followed by the actual stored fields' data. A document with no stored 
fields therefore still costs 8 bytes in .fdx and one byte (a field 
count of zero) in .fdt.
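
To put a number on that overhead, here is a rough back-of-the-envelope 
sketch. It is not Lucene code: the class and method names are made up 
for illustration, it only mirrors the usual 7-bits-per-byte vint 
layout, and it ignores file headers.

```java
// Hypothetical helper, not part of Lucene: estimates the per-document
// overhead of the layout described above when no fields are stored.
final class StoredFieldsOverhead {

    /** Bytes a non-negative int occupies as a vint (7 data bits per byte). */
    static int vIntLength(int value) {
        int bytes = 1;
        while ((value & ~0x7F) != 0) {
            value >>>= 7;
            bytes++;
        }
        return bytes;
    }

    /** One long in .fdx plus a field count of 0 in .fdt. */
    static long bytesPerEmptyDoc() {
        return Long.BYTES + vIntLength(0);
    }

    public static void main(String[] args) {
        long docsPerSecond = 300_000L; // rate quoted in the original question
        long perDoc = bytesPerEmptyDoc();
        System.out.println("bytes per empty doc: " + perDoc);
        System.out.println("bytes per second:    " + perDoc * docsPerSecond);
    }
}
```

Under these assumptions that is 9 bytes per document, or about 2.7 MB 
per second at the quoted indexing rate.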

We could modify this format to be less verbose for documents without 
stored fields, e.g. use block-delta encoding of the .fdx file and avoid 
writing anything to the .fdt file if there are no stored fields. The 
question is whether the space savings would be worth the complication?
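
To give a feel for the potential savings, here is a minimal sketch of 
the block-delta idea. It is only an illustration, not an actual or 
proposed Lucene format: the class name, the block size of 1024 and the 
exact packing are all assumptions.

```java
// Hypothetical sketch, not Lucene code: estimates the size of one block
// of .fdx pointers if only the first pointer were stored in full and
// the rest as fixed-width deltas.
final class FdxDeltaSketch {

    /** Bits needed per delta in one block; assumes non-decreasing pointers. */
    static int bitsPerDelta(long[] pointers) {
        long maxDelta = 0;
        for (int i = 1; i < pointers.length; i++) {
            maxDelta = Math.max(maxDelta, pointers[i] - pointers[i - 1]);
        }
        // numberOfLeadingZeros(0) is 64, so all-zero deltas need 0 bits.
        return 64 - Long.numberOfLeadingZeros(maxDelta);
    }

    /** Approximate encoded size: one 8-byte base pointer plus packed deltas. */
    static long encodedBytes(long[] pointers) {
        long deltaBits = (long) bitsPerDelta(pointers) * (pointers.length - 1);
        return Long.BYTES + (deltaBits + 7) / 8;
    }

    public static void main(String[] args) {
        int blockSize = 1024; // assumed block size
        long[] noFdtWrites = new long[blockSize]; // pointer never advances
        long[] oneBytePerDoc = new long[blockSize];
        for (int i = 0; i < blockSize; i++) {
            oneBytePerDoc[i] = i; // today's layout: 1 byte of .fdt per empty doc
        }
        System.out.println("raw longs:       " + (long) blockSize * Long.BYTES);
        System.out.println("delta, 1 B/doc:  " + encodedBytes(oneBytePerDoc));
        System.out.println("delta, no .fdt:  " + encodedBytes(noFdtWrites));
    }
}
```

In this sketch, a block of documents that write nothing to .fdt has 
all-zero deltas and shrinks to little more than its base pointer, which 
is where most of the saving would come from.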

-- 
Best regards,
Andrzej Bialecki
http://www.sigram.com, blog http://www.sigram.com/blog
  ___.,___,___,___,_._. __________________<><____________________
[___||.__|__/|__||\/|: Information Retrieval, System Integration
___|||__||..\|..||..|: Contact: info at sigram dot com

