lucene-dev mailing list archives

From "McCallie,David" <DMCCAL...@CERNER.COM>
Subject RE: stored field compression
Date Fri, 14 May 2004 17:00:48 GMT
A few months ago, I ran a quick-and-dirty experiment with Lucene 1.3, using
the compression utilities to compress stored text fields. I no longer have
the data available, but as I recall, it was not clear that compression was
always a benefit. In particular, if the text fields are short (like the
title of a paper), the overhead of the compression's embedded dictionary can
make the compressed string longer than the uncompressed one. Additionally,
the CPU overhead was non-trivial compared to the already-fast Lucene
searches. There are probably better compression algorithms than the ZIP
approach, but that's the only one I tried. If one were to use an
"expensive" method like ZIP, it might make sense to have some sort of
threshold length before the compression kicks in; the "isCompressed" flag
would only take effect if that threshold were exceeded. Alternatively, the
user could be responsible for setting the isCompressed flag based on the
field's length.
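The threshold idea could be sketched as follows. This is only an illustration, not Lucene code: the MIN_LENGTH value and the helper name are made up, and java.util.zip.Deflater stands in for "the ZIP approach". Short values are stored verbatim, and compression is kept only when it actually shrinks the bytes.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

public class ThresholdCompressor {
    // Assumed threshold; below it, dictionary overhead tends to dominate.
    static final int MIN_LENGTH = 200;

    // Compress the value only when it is long enough to likely pay off.
    static byte[] maybeCompress(String value) throws Exception {
        byte[] raw = value.getBytes("UTF-8");
        if (raw.length < MIN_LENGTH) {
            return raw;  // short field (e.g. a title): store uncompressed
        }
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        byte[] compressed = out.toByteArray();
        // Even past the threshold, keep the original if compression lost.
        return compressed.length < raw.length ? compressed : raw;
    }

    public static void main(String[] args) throws Exception {
        String title = "A short paper title";
        String body = "lorem ipsum dolor sit amet ".repeat(100);
        // Short value stays raw; long repetitive value shrinks.
        System.out.println(maybeCompress(title).length == title.getBytes("UTF-8").length);
        System.out.println(maybeCompress(body).length < body.getBytes("UTF-8").length);
    }
}
```

A caller could set the isCompressed flag from whether maybeCompress returned fewer bytes than the input, which folds both the threshold check and the "did it actually help" check into one place.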

David McCallie

-----Original Message-----
From: Doug Cutting [] 
Sent: Friday, May 14, 2004 11:23 AM
To: Lucene Developers List
Subject: Re: stored field compression

Doug Cutting wrote:
> A more elaborate approach would be to lazily decompress fields when 
> values are accessed.

Another big advantage of this approach (as reminded by Peter Cipollone)
is that it will make indexing faster, as decompression will be avoided
when merging.
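The lazy-decompression idea might look roughly like this sketch (not Lucene's actual API; the class and method names are invented). The field holds its compressed bytes and inflates them only on first access, so a merge that copies rawBytes() through never pays the inflate/deflate round trip.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class LazyCompressedField {
    private final byte[] compressed;
    private final int originalLength;
    private String value;  // decoded lazily on first stringValue() call

    LazyCompressedField(byte[] compressed, int originalLength) {
        this.compressed = compressed;
        this.originalLength = originalLength;
    }

    // What a segment merge would copy verbatim: no decompression needed.
    byte[] rawBytes() { return compressed; }

    // Decompress only when the value is actually asked for.
    String stringValue() throws Exception {
        if (value == null) {
            Inflater inflater = new Inflater();
            inflater.setInput(compressed);
            byte[] out = new byte[originalLength];
            inflater.inflate(out);
            inflater.end();
            value = new String(out, "UTF-8");
        }
        return value;
    }

    // Helper for the demo below.
    static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        String original = "some stored field text ".repeat(50);
        byte[] raw = original.getBytes("UTF-8");
        LazyCompressedField field =
            new LazyCompressedField(deflate(raw), raw.length);
        System.out.println(field.rawBytes().length < raw.length);   // compressed
        System.out.println(field.stringValue().equals(original));   // round-trips
    }
}
```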


To unsubscribe, e-mail:
For additional commands, e-mail:

