lucene-java-user mailing list archives

From Anton Zenkov <azen...@crimsonhexagon.com>
Subject sizes of non-fdt files affected by compression settings
Date Thu, 01 Oct 2015 20:10:44 GMT
Hello,

I'm experimenting with Lucene 5.2.1 and I'm seeing something I can't find
an easy explanation for in the API docs.
When I switch the StoredFieldsFormat mode from BEST_SPEED to
BEST_COMPRESSION, almost all index files get smaller. I expected only the
.fdt files to shrink, but for some reason the following file types also
shrink very significantly: .fdx, .doc, .pos. The term dictionary (.tim)
also gets smaller, though not as significantly. Oddly enough, .tip becomes
a little bigger with the BEST_COMPRESSION setting.
The index contained about 10M small text docs (~300 bytes each).
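In case it helps anyone reproduce the comparison, here is a rough sketch of
how the per-extension sizes can be tallied. It assumes two index directories
built from the same docs, one per mode (in 5.2.1, as far as I can tell, the
mode is picked by passing
new Lucene50Codec(Lucene50StoredFieldsFormat.Mode.BEST_COMPRESSION) to
IndexWriterConfig.setCodec), and segments that are not packed into compound
.cfs files, since those would hide the per-format sizes. The class name and
the two-argument command line are just my own illustration; it only uses
java.nio, no Lucene APIs.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical helper: sums the on-disk size of every file in an index
 * directory, grouped by extension (.fdt, .fdx, .doc, .pos, .tim, .tip, ...),
 * so two indexes built with different stored-fields modes can be compared.
 */
class IndexSizeByExtension {

    /** Returns extension -> total bytes for the regular files directly inside dir. */
    static Map<String, Long> sizesByExtension(Path dir) throws IOException {
        Map<String, Long> totals = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue; // skip sub-directories and other special entries
                }
                String name = file.getFileName().toString();
                int dot = name.lastIndexOf('.');
                // Files without an extension (e.g. segments_N) are grouped under "(none)".
                String ext = dot >= 0 ? name.substring(dot) : "(none)";
                totals.merge(ext, Files.size(file), Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: IndexSizeByExtension <speedIndexDir> <compressionIndexDir>");
            System.exit(1);
        }
        Map<String, Long> speed = sizesByExtension(Paths.get(args[0]));
        Map<String, Long> compression = sizesByExtension(Paths.get(args[1]));

        // Union of the extensions seen in either index, in sorted order.
        TreeMap<String, Long> all = new TreeMap<>(speed);
        all.putAll(compression);
        System.out.printf("%-8s %15s %17s%n", "ext", "BEST_SPEED", "BEST_COMPRESSION");
        for (String ext : all.keySet()) {
            System.out.printf("%-8s %15d %17d%n", ext,
                    speed.getOrDefault(ext, 0L), compression.getOrDefault(ext, 0L));
        }
    }
}
```

I run it after both IndexWriters are closed, e.g.
java IndexSizeByExtension /path/to/speed-index /path/to/compression-index
(the paths are placeholders).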

I guess I could go through the code myself to understand this, but maybe
someone can shed some light on it.

Thanks!

Anton
