lucene-dev mailing list archives

From "Robert Engels" <reng...@ix.netcom.com>
Subject RE: Binary fields and data compression
Date Mon, 30 Aug 2004 21:50:05 GMT
The savings in data size are almost certainly not worth the probable 20-40%
increase in CPU usage in most cases, no?

I think it should be optional: on for those who have extremely large indices
and want to save some space (which seems unnecessary these days), and off for
those who want to maximize performance.


-----Original Message-----
From: Bernhard Messer [mailto:Bernhard.Messer@intrafind.de]
Sent: Monday, August 30, 2004 4:41 PM
To: lucene-dev@jakarta.apache.org
Subject: Binary fields and data compression


hi developers,

a few months ago, there was a very interesting discussion about field
compression and the possibility of storing binary field values within a
Lucene document. On this topic, Drew Farris came up with a patch to add
the necessary functionality. I ran all the necessary tests on his
implementation and didn't find a single problem. So the original
implementation from Drew could now be enhanced to compress the binary
field data (maybe even the text fields, if they are stored only) before
writing to disk. I made some simple statistical measurements using the
java.util.zip package for data compression. With it enabled, we could
save about 40% of the data when compressing plain text files ranging in
size from 1KB to 4KB. If there is still some interest, we could first try
to update the patch, because it's outdated due to several changes within
the Fields class. After finishing that, compression could be added to the
updated version of the patch.
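
A minimal sketch of the kind of measurement described above, using only
java.util.zip. The class name FieldCompressionSketch and the helpers
compress and decompress are hypothetical and are not part of Drew's patch
or of Lucene. The sample text is synthetic and repetitive, so the saving it
reports will differ from the roughly 40% seen on real plain text files.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class FieldCompressionSketch {

    /** Compresses raw field bytes with DEFLATE at the given level (0-9). */
    public static byte[] compress(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream(input.length);
            byte[] buf = new byte[1024];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib resources
        }
    }

    /** Restores the original bytes from the output of compress(). */
    public static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream(compressed.length * 2);
            byte[] buf = new byte[1024];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                // No progress and nothing left to read: the stream is truncated
                // or needs a preset dictionary, which this sketch does not support.
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("truncated or unsupported stream");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt compressed data", e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        // Build a synthetic plain text sample of about 2KB (assumption: the
        // real measurements used actual text files between 1KB and 4KB).
        StringBuilder sb = new StringBuilder();
        while (sb.length() < 2048) {
            sb.append("Stored field values can be compressed before writing. ");
        }
        byte[] original = sb.toString().getBytes(StandardCharsets.UTF_8);

        byte[] packed = compress(original, Deflater.BEST_COMPRESSION);
        byte[] restored = decompress(packed);

        double saved = 100.0 * (original.length - packed.length) / original.length;
        System.out.printf("original=%d bytes, compressed=%d bytes, saved=%.1f%%%n",
                original.length, packed.length, saved);
        System.out.println("round trip ok: " + Arrays.equals(original, restored));
    }
}
```

Passing Deflater.BEST_SPEED instead of Deflater.BEST_COMPRESSION trades
some of the space saving for less CPU time, which is one way to weigh size
against indexing and retrieval cost.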

sounds good to me, what do you think?

best regards
Bernhard




---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org

