lucene-dev mailing list archives

From Bernhard Messer <Bernhard.Mes...@intrafind.de>
Subject Re: Binary fields and data compression
Date Tue, 31 Aug 2004 08:05:34 GMT
Otis,

that's exactly what I have in mind. In the first step, compression
should be optional and available on binary fields only. It should be
"off" by default and must be enabled by the user. I would also check
the size of the byte array passed in: even if compression is enabled,
it doesn't make sense to compress a dataset that is too small. Because
compression adds some overhead, we would end up with a compressed size
that is bigger than the original size.
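
A minimal sketch of that size check, using java.util.zip. The class name, the one-byte flag layout, and the 64-byte cut-off are assumptions made for illustration only; none of them come from Drew's patch or from Lucene, and a sensible threshold would have to come from measurements.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical helper for illustration; not part of Lucene or the patch. */
class OptionalCompressor {
    // Assumed cut-off below which compression is not even attempted.
    static final int MIN_COMPRESS_SIZE = 64;
    static final byte RAW = 0;
    static final byte DEFLATED = 1;

    /** Returns a one-byte flag followed by the raw or the deflated payload. */
    static byte[] encode(byte[] data, boolean compressionEnabled) {
        if (compressionEnabled && data.length >= MIN_COMPRESS_SIZE) {
            byte[] packed = deflate(data);
            // Keep the compressed form only if it is actually smaller.
            if (packed.length < data.length) {
                return withFlag(DEFLATED, packed);
            }
        }
        return withFlag(RAW, data);
    }

    /** Reverses encode(): strips the flag and inflates if necessary. */
    static byte[] decode(byte[] stored) {
        byte[] payload = Arrays.copyOfRange(stored, 1, stored.length);
        if (stored[0] == RAW) {
            return payload;
        }
        Inflater inflater = new Inflater();
        inflater.setInput(payload);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported input");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt compressed data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    private static byte[] deflate(byte[] data) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    private static byte[] withFlag(byte flag, byte[] payload) {
        byte[] result = new byte[payload.length + 1];
        result[0] = flag;
        System.arraycopy(payload, 0, result, 1, payload.length);
        return result;
    }
}
```

With this layout a reader can tell from the first byte whether the value was compressed, so small or incompressible values cost one extra byte rather than the full deflate overhead.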

Once the implementation is ready, we could run several tests to see how
compression affects the overall performance.

Bernhard


Otis Gospodnetic wrote:

>Bernhard,
>
>Sounds good to me.
>I would, however, also be interested in the performance impact of
>text-field compression.  While adapting Drew's patch, it may be nice to
>make the compression mechanism pluggable.
>
>Otis
>
>--- Bernhard Messer <Bernhard.Messer@intrafind.de> wrote:
>
>  
>
>>Hi developers,
>>
>>a few months ago, there was a very interesting discussion about field
>>compression and the possibility of storing binary field values within
>>a Lucene document. On this topic, Drew Farris came up with a patch
>>that adds the necessary functionality. I ran all the necessary tests
>>on his implementation and didn't find a single problem. So Drew's
>>original implementation could now be enhanced to compress the binary
>>field data (maybe even the text fields, if they are stored only)
>>before writing to disk. I made some simple statistical measurements
>>using the java.util.zip package for data compression. With it
>>enabled, we could save about 40% of the data size when compressing
>>plain text files between 1KB and 4KB in size. If there is still some
>>interest, we could first try to update the patch, because it's
>>outdated due to several changes within the Fields class. After that,
>>compression could be added to the updated version of the patch.
>>
>>Sounds good to me, what do you think?
>>
>>best regards
>>Bernhard
>>
>>
>>
>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
>>For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
>>

