lucene-dev mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: stored field compression
Date Fri, 14 May 2004 17:46:43 GMT
I'd figured that, for the reasons you mention, folks would only use 
compression for long fields, like document contents, not for short 
fields, like URLs and titles.  For fields that tend to be short, the 
time/space tradeoffs are not worthwhile.
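
To make the short-field tradeoff concrete, here is a minimal sketch of
the threshold idea raised in the quoted message, written against
java.util.zip. The class name ThresholdFieldCodec, the one-byte flag
layout, and the 256-byte cutoff are assumptions for illustration only;
none of them come from Lucene or from this thread.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper, not a Lucene class.
final class ThresholdFieldCodec {
    // Assumed cutoff; the thread does not propose a specific value.
    static final int MIN_COMPRESS_BYTES = 256;

    /** Encodes a value; the first byte is 1 if the payload is deflated, 0 if raw. */
    static byte[] encode(String value) {
        byte[] raw = value.getBytes(StandardCharsets.UTF_8);
        if (raw.length >= MIN_COMPRESS_BYTES) {
            byte[] packed = deflate(raw);
            // Keep the compressed form only if it actually saves space.
            if (packed.length < raw.length) {
                return withFlag((byte) 1, packed);
            }
        }
        return withFlag((byte) 0, raw);
    }

    /** Reverses encode(), inflating only when the flag byte says so. */
    static String decode(byte[] stored) {
        byte[] payload = Arrays.copyOfRange(stored, 1, stored.length);
        if (stored[0] == 1) {
            payload = inflate(payload);
        }
        return new String(payload, StandardCharsets.UTF_8);
    }

    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    static byte[] inflate(byte[] input) {
        Inflater inflater = new Inflater();
        inflater.setInput(input);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("truncated or unsupported stream");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt compressed field", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    private static byte[] withFlag(byte flag, byte[] payload) {
        byte[] stored = new byte[payload.length + 1];
        stored[0] = flag;
        System.arraycopy(payload, 0, stored, 1, payload.length);
        return stored;
    }
}
```

With this layout a short title is stored raw at a cost of one flag
byte, while a long, repetitive body is deflated. The extra size check
after deflate() also covers long values that happen not to compress.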

Doug

McCallie,David wrote:
> A few months ago, I did a quick-and-dirty experiment using the
> java.util.zip compression utilities to compress stored text fields in
> Lucene 1.3. Unfortunately, I no longer have the data, but as I recall,
> it was not clear that compression was always a benefit.  In
> particular, if the text fields are short (like the title of a paper),
> the overhead of the compression's embedded dictionary can make the
> compressed string longer than the uncompressed string.  Additionally,
> the CPU overhead was non-trivial compared to the already fast Lucene
> searches.  There are probably better compression algorithms than the
> ZIP approach, but that's the only one I tried.  If one were to use an
> "expensive" method like ZIP, then it might make sense to have some sort
> of threshold length before the compression kicks in?  The "isCompressed"
> flag might take effect only if that threshold were exceeded?
> Alternatively, the user could be responsible for setting the
> isCompressed flag based on the field's length.
> 
> David McCallie
> 
> 
> 
> -----Original Message-----
> From: Doug Cutting [mailto:cutting@apache.org] 
> Sent: Friday, May 14, 2004 11:23 AM
> To: Lucene Developers List
> Subject: Re: stored field compression
> 
> Doug Cutting wrote:
> 
>>A more elaborate approach would be to lazily decompress fields when 
>>values are accessed.
> 
> 
> Another big advantage of this approach (as reminded by Peter Cipollone)
> is that it will make indexing faster, as decompression will be avoided
> when merging.
> 
> Doug
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
> 

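The lazy decompression idea quoted above can be sketched roughly as
follows. This is only an illustration under stated assumptions: the
class LazyCompressedField and its method names are hypothetical and do
not correspond to any Lucene API. The point it shows is that a merge
can copy the compressed bytes as they are, and inflation happens only
when a caller asks for the string value.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical holder for a compressed stored field, not a Lucene class.
final class LazyCompressedField {
    private final byte[] compressed;
    private String value; // filled in on first access, then cached

    LazyCompressedField(byte[] compressed) {
        this.compressed = compressed;
    }

    /** Compresses text once, at indexing time. */
    static LazyCompressedField compress(String text) {
        Deflater deflater = new Deflater();
        deflater.setInput(text.getBytes(StandardCharsets.UTF_8));
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return new LazyCompressedField(out.toByteArray());
    }

    /** What a merge would copy: the stored bytes, with no inflation. */
    byte[] compressedBytes() {
        return compressed.clone();
    }

    boolean isInflated() {
        return value != null;
    }

    /** Inflates on first call only; later calls return the cached string. */
    synchronized String stringValue() {
        if (value == null) {
            Inflater inflater = new Inflater();
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            try {
                while (!inflater.finished()) {
                    int n = inflater.inflate(buf);
                    if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                        throw new IllegalStateException("truncated or unsupported stream");
                    }
                    out.write(buf, 0, n);
                }
            } catch (DataFormatException e) {
                throw new IllegalStateException("corrupt compressed field", e);
            } finally {
                inflater.end();
            }
            value = new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
        return value;
    }
}
```

A merge in this sketch would call compressedBytes() and write the
result straight into the new segment, so stringValue() is never
invoked and no CPU time is spent inflating.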