lucene-java-user mailing list archives

From Adrien Grand <jpou...@gmail.com>
Subject Re: sizes of non-fdt files affected by compression settings
Date Sun, 01 Nov 2015 22:35:12 GMT
Hi Anton,

This setting can only affect the size of the .fdt (and .fdx) files. I
suspect you saw differences in the size of the other files because the
change in stored-fields size gave segments different sizes, which caused
Lucene to run different merges, and the compression that we use for
postings/terms happened to work better on the resulting segments. It could
just as well have gone the other way.
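
To check which files actually differ between the two runs, one option is to
total the on-disk size per file extension for each index directory and put
the numbers side by side. Below is a minimal sketch that uses only the JDK,
not Lucene itself; the class and method names are made up for illustration,
and it assumes each index lives in a flat directory of plain files.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper, not part of Lucene: sums file sizes per extension.
public class IndexSizeByExtension {

    /** Returns total bytes per extension (e.g. ".fdt") for regular files in dir. */
    static Map<String, Long> sizesByExtension(Path dir) throws IOException {
        Map<String, Long> totals = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue; // skip subdirectories and other non-regular entries
                }
                String name = file.getFileName().toString();
                int dot = name.lastIndexOf('.');
                // Files without a dot (such as segments_N) are grouped together.
                String ext = dot >= 0 ? name.substring(dot) : "(none)";
                totals.merge(ext, Files.size(file), Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: IndexSizeByExtension <indexDirA> <indexDirB>");
            return;
        }
        Map<String, Long> a = sizesByExtension(Paths.get(args[0]));
        Map<String, Long> b = sizesByExtension(Paths.get(args[1]));
        Set<String> extensions = new TreeSet<>(a.keySet());
        extensions.addAll(b.keySet());
        // One row per extension: bytes in the first index, bytes in the second.
        for (String ext : extensions) {
            System.out.printf("%-8s %14d %14d%n",
                    ext, a.getOrDefault(ext, 0L), b.getOrDefault(ext, 0L));
        }
    }
}
```

Run it with the BEST_SPEED index directory as the first argument and the
BEST_COMPRESSION one as the second. Since the two indexes may have gone
through different merges, the segment counts can differ as well, so the
per-extension totals are only a rough comparison.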

On Thu, Oct 1, 2015 at 22:10, Anton Zenkov <azenkov@crimsonhexagon.com>
wrote:

> Hello,
>
> I'm experimenting with Lucene 5.2.1 and I see something I cannot find an
> easy explanation for in the API docs.
> When I pick BEST_COMPRESSION rather than BEST_SPEED mode for
> StoredFieldsFormat, almost all files become smaller. I expected only the
> .fdt files to be smaller, but for some reason the following file types
> also shrink very significantly: .fdx, .doc, .pos. The term dictionary
> (.tim) also gets smaller, though not as significantly. Weirdly enough,
> .tip becomes a little bigger with the BEST_COMPRESSION setting.
> The index contained about 10M small (~300 bytes each) text docs.
>
> I guess I could go through the code myself to understand this, but maybe
> someone can shed some light on it.
>
> Thanks!
>
> Anton
>
