lucene-dev mailing list archives

From David Spencer <dave-lucene-...@tropo.com>
Subject Re: Binary fields and data compression
Date Mon, 30 Aug 2004 22:32:58 GMT
Robert Engels wrote:

> The data size savings are almost certainly not worth the probable 20-40%
> increase in CPU usage in most cases, no?
> 
> I think it should be optional: something enabled by those who have extremely
> large indices and want to save some space (which hardly seems necessary these
> days), and left off by those who want to maximize performance.

You don't know until you benchmark it, but I thought the heuristic
nowadays was that CPUs are fast and disk I/O is slow (and yes, disk
space is 'infinite' :) ), so I would guess that despite the CPU cost
of compression, you'd save time overall because there is less disk I/O.
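Since the answer depends on the data, a rough measurement is the natural first step. The sketch below is a hypothetical micro-benchmark (the class `CompressionTradeoff` and its methods are illustrative names, not Lucene API). It uses only `java.util.zip.Deflater` to report how many bytes a stored value shrinks by and roughly how much CPU time each deflate costs. It has no JIT warm-up control and does not measure the disk I/O saved, so treat its numbers as a first indication only.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

/**
 * Hypothetical micro-benchmark sketch (not part of Lucene): how much space
 * does Deflater save on one stored value, and how much CPU time does it cost?
 * No JIT warm-up control or statistics; timing is only a rough indication.
 */
public final class CompressionTradeoff {

    /** Outcome of compressing one payload several times. */
    public static final class Result {
        public final int originalBytes;
        public final int compressedBytes;
        public final long nanosPerRun;

        Result(int originalBytes, int compressedBytes, long nanosPerRun) {
            this.originalBytes = originalBytes;
            this.compressedBytes = compressedBytes;
            this.nanosPerRun = nanosPerRun;
        }

        /** Fraction of the original size saved, e.g. 0.4 for a 40% saving. */
        public double savedFraction() {
            return 1.0 - (double) compressedBytes / originalBytes;
        }
    }

    /** Deflates the payload {@code runs} times at the given level and times it. */
    public static Result measure(byte[] payload, int level, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        byte[] scratch = new byte[4096]; // compressed output is only counted, not kept
        int compressedBytes = 0;
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            Deflater deflater = new Deflater(level);
            deflater.setInput(payload);
            deflater.finish();
            int total = 0;
            while (!deflater.finished()) {
                total += deflater.deflate(scratch);
            }
            deflater.end(); // release native zlib resources
            compressedBytes = total;
        }
        long elapsed = System.nanoTime() - start;
        return new Result(payload.length, compressedBytes, elapsed / runs);
    }

    public static void main(String[] args) {
        // Illustrative payload only: about 2KB of repetitive English text.
        StringBuilder sb = new StringBuilder();
        while (sb.length() < 2048) {
            sb.append("Lucene stores field values so they can be returned with hits. ");
        }
        byte[] payload = sb.toString().getBytes(StandardCharsets.UTF_8);
        Result r = measure(payload, Deflater.DEFAULT_COMPRESSION, 1000);
        System.out.printf("%d -> %d bytes (%.0f%% saved), %d ns per deflate%n",
                r.originalBytes, r.compressedBytes, 100 * r.savedFraction(), r.nanosPerRun);
    }
}
```

Repetitive synthetic text compresses far better than real documents, so feeding in a sample of one's own stored fields gives a more honest picture of the space and CPU trade-off being debated here.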


> 
> 
> -----Original Message-----
> From: Bernhard Messer [mailto:Bernhard.Messer@intrafind.de]
> Sent: Monday, August 30, 2004 4:41 PM
> To: lucene-dev@jakarta.apache.org
> Subject: Binary fields and data compression
> 
> 
> hi developers,
> 
> A few months ago, there was a very interesting discussion about field
> compression and the possibility of storing binary field values within a
> Lucene document. On this topic, Drew Farris came up with a
> patch to add the necessary functionality. I ran all the necessary tests
> on his implementation and didn't find a single problem. So the original
> implementation from Drew could now be enhanced to compress the binary
> field data (and maybe even text fields, if they are stored only) before
> writing it to disk. I made some simple statistical measurements using the
> java.util.zip package for data compression. With compression enabled, we
> could save about 40% of the space when compressing plain text files
> between 1KB and 4KB in size. If there is still some interest, we could
> first try to update the patch, because it's outdated due to several
> changes within the Fields class. After finishing that, compression could
> be added to the updated version of the patch.
> 
> Sounds good to me; what do you think?
> 
> best regards
> Bernhard
> 
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
> 
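Bernhard's proposal (deflate a stored field value with java.util.zip before writing it, inflate it on read) combined with Robert's request that compression stay optional could look roughly like the sketch below. It is an assumption-laden illustration, not the actual patch: the class `StoredFieldCompression`, its one-byte header (0 = raw, 1 = deflated) and the per-call `compress` flag are all hypothetical. The header lets a reader tell the two forms apart, and values that do not shrink (very short fields, for instance) are stored raw.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/**
 * Hypothetical helper, not Lucene API: optionally deflates a stored field
 * value before it is written and restores it on read. A one-byte header
 * records how the payload was stored.
 */
public final class StoredFieldCompression {
    private static final byte RAW = 0;
    private static final byte DEFLATED = 1;

    private StoredFieldCompression() {}

    /** Compresses only when asked to and only when it actually saves space. */
    public static byte[] encode(byte[] value, boolean compress) {
        byte[] payload = value;
        byte header = RAW;
        if (compress) {
            byte[] deflated = deflate(value);
            if (deflated.length < value.length) {
                payload = deflated;
                header = DEFLATED;
            }
        }
        byte[] stored = new byte[payload.length + 1];
        stored[0] = header;
        System.arraycopy(payload, 0, stored, 1, payload.length);
        return stored;
    }

    /** Reverses encode(): reads the header byte and inflates if needed. */
    public static byte[] decode(byte[] stored) {
        if (stored.length == 0) {
            throw new IllegalArgumentException("missing header byte");
        }
        byte[] payload = Arrays.copyOfRange(stored, 1, stored.length);
        switch (stored[0]) {
            case RAW:
                return payload;
            case DEFLATED:
                return inflate(payload);
            default:
                throw new IllegalArgumentException("unknown header byte: " + stored[0]);
        }
    }

    private static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib resources
        }
    }

    private static byte[] inflate(byte[] input) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(input);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or corrupt deflate stream");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt deflate stream", e);
        } finally {
            inflater.end();
        }
    }
}
```

Falling back to raw storage costs one byte per value but guarantees compression never makes a stored field larger than it would be uncompressed plus that header.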
