lucene-java-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: sizes of non-fdt files affected by compression settings
Date Wed, 04 Nov 2015 21:53:45 GMT

: This setting can only affect the size of the fdt (and fdx) files. I suspect
: you saw differences in the size of other files because it caused Lucene to
: run different merges (because segments had different sizes), and the
: compression that we use for postings/terms worked better, but it could have
: been the other way as well.

You can check the number of documents in each segment to verify Adrien's 
comments.
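
A minimal sketch of one way to do that check, assuming Lucene 5.x on the 
classpath. The class name SegmentDocCounts and the helper describeSegments 
are made up for illustration; the Lucene calls (DirectoryReader.open, 
leaves(), maxDoc(), numDocs()) are the ordinary reader APIs. It prints one 
line per segment of the latest commit so the two indexes can be compared 
side by side.

```java
import java.io.IOException;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class SegmentDocCounts {

    /** Returns one descriptive line per segment (leaf) of the latest commit in dir. */
    static List<String> describeSegments(Directory dir) throws IOException {
        List<String> lines = new ArrayList<>();
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            // Each leaf of a DirectoryReader corresponds to one segment.
            for (LeafReaderContext ctx : reader.leaves()) {
                lines.add("segment ord=" + ctx.ord
                        + " maxDoc=" + ctx.reader().maxDoc()     // docs including deleted ones
                        + " numDocs=" + ctx.reader().numDocs()); // live docs only
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to an existing index directory.
        try (Directory dir = FSDirectory.open(Paths.get(args[0]))) {
            for (String line : describeSegments(dir)) {
                System.out.println(line);
            }
        }
    }
}
```

Run it once against the BEST_SPEED index and once against the 
BEST_COMPRESSION index. If the segment counts or per-segment doc counts 
differ, the two indexes went through different merges.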

If you want to do a true "apples to apples" comparison of just the impact 
of stored field compression, choose something like NoMergePolicy or 
LogDocMergePolicy for your test to ensure that the number of documents per 
segment is not affected by the size (in bytes) of any of the files in 
those segments.
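
A rough sketch of such a test setup, assuming the Lucene 5.x APIs 
(Lucene50Codec with a Lucene50StoredFieldsFormat.Mode argument, and 
NoMergePolicy.INSTANCE). The class name CompressionTestConfig, the helper 
configFor, and the choice of StandardAnalyzer are illustrative, not 
something from this thread.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.codecs.lucene50.Lucene50Codec;
import org.apache.lucene.codecs.lucene50.Lucene50StoredFieldsFormat.Mode;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.NoMergePolicy;

public class CompressionTestConfig {

    /** Builds a writer config that differs between runs only in the stored fields mode. */
    static IndexWriterConfig configFor(Mode mode) {
        IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer());
        // Select BEST_SPEED or BEST_COMPRESSION for stored fields.
        iwc.setCodec(new Lucene50Codec(mode));
        // Never merge, so segment boundaries cannot depend on file sizes.
        iwc.setMergePolicy(NoMergePolicy.INSTANCE);
        return iwc;
    }
}
```

Build one index with configFor(Mode.BEST_SPEED) and another with 
configFor(Mode.BEST_COMPRESSION), feeding the same documents in the same 
order from a single thread, then compare the file sizes per extension.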


: > Hello,
: >
: > I'm experimenting with Lucene 5.2.1 and I see something I cannot find an
: > easy explanation for in the API docs.
: > When I pick BEST_COMPRESSION rather than BEST_SPEED mode for
: > StoredFieldsFormat, almost all files become smaller. I expected only
: > .fdt files to be smaller, but for some reason the following file types
: > also shrink very significantly:
: > .fdx, .doc, .pos. The term dictionary (.tim) also gets smaller, though not
: > as significantly. Weirdly enough, .tip becomes a little bigger with the
: > best compression setting.
: > The index contained about 10M small (~300 bytes each) text docs.
: >
: > I guess I could go through the code myself to understand this, but maybe
: > someone can shed some light on this.
: >
: > Thanks!
: >
: > Anton
: >
: 

-Hoss
http://www.lucidworks.com/
