lucene-java-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Lucene Compression
Date Wed, 02 Apr 2008 12:09:37 GMT
It's generally considered best practice to compress things first in
your app and then add them as a binary field. That being said, I
don't see why that would blow up on its own. Have you tried
compressing it outside of Lucene to see what happens? If you can
reproduce it as a test case for Lucene, that would be great.
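
One way to try this outside of Lucene is sketched below, using only java.util.zip. The class name CompressCheck and the shortened sample record are made up for illustration; the byte[] returned by compress() is what the app would store as a binary field, and decompress() is what it would call after reading the field back.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper, not part of Lucene.
final class CompressCheck {

    // Deflate the input at the highest compression level.
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream bos = new ByteArrayOutputStream(input.length + 16);
            byte[] buf = new byte[1024];
            while (!deflater.finished()) {
                int count = deflater.deflate(buf);
                bos.write(buf, 0, count);
            }
            return bos.toByteArray();
        } finally {
            deflater.end(); // release native resources even on failure
        }
    }

    // Inflate data produced by compress().
    static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            while (!inflater.finished()) {
                int count = inflater.inflate(buf);
                if (count == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported input");
                }
                bos.write(buf, 0, count);
            }
            return bos.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid deflate data", e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        // Shortened sample in the style of the record from the original question.
        byte[] raw = "II0264.D05|00022745|ABCDE|03/01/2008 00:23:12|00035|9840836588|"
                .getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(raw);
        System.out.println("raw bytes:        " + raw.length);
        System.out.println("compressed bytes: " + packed.length);
        System.out.println("round trip ok:    " + Arrays.equals(raw, decompress(packed)));
    }
}
```

Running this against a real record shows whether compression pays off for that data before any index is built.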

From FieldsWriter, Lucene's compression code looks like:
private final byte[] compress (byte[] input) {

       // Create the compressor with highest level of compression
       Deflater compressor = new Deflater();
       compressor.setLevel(Deflater.BEST_COMPRESSION);

       // Give the compressor the data to compress
       compressor.setInput(input);
       compressor.finish();

       /*
        * Create an expandable byte array to hold the compressed data.
        * You cannot use an array that's the same size as the original
        * because there is no guarantee that the compressed data will be
        * smaller than the uncompressed data.
        */
       ByteArrayOutputStream bos = new ByteArrayOutputStream(input.length);

       // Compress the data
       byte[] buf = new byte[1024];
       while (!compressor.finished()) {
         int count = compressor.deflate(buf);
         bos.write(buf, 0, count);
       }

       compressor.end();

       // Get the compressed data
       return bos.toByteArray();
     }


There is an interesting comment in that code about how the compressed  
data won't necessarily be smaller, so maybe you have entered the  
compression twilight zone.
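
The growth is easy to see with very short inputs, because the zlib format that Deflater emits by default adds a header and a checksum on top of the deflated data. A small sketch, again plain java.util.zip (DeflateOverhead and deflatedSize are illustrative names, not Lucene code):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.Deflater;

// Hypothetical demo class, not part of Lucene.
final class DeflateOverhead {

    // Returns how many bytes Deflater produces for the given input.
    static int deflatedSize(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setInput(input);
            deflater.finish();
            byte[] buf = new byte[1024];
            int total = 0;
            while (!deflater.finished()) {
                total += deflater.deflate(buf);
            }
            return total;
        } finally {
            deflater.end();
        }
    }

    public static void main(String[] args) {
        // A tiny value: the fixed format overhead outweighs any savings.
        byte[] tiny = "abc".getBytes(StandardCharsets.UTF_8);

        // A long, highly repetitive value: deflate does well here.
        byte[] repetitive = new byte[10000];
        Arrays.fill(repetitive, (byte) 'a');

        System.out.println("tiny:       " + tiny.length + " -> " + deflatedSize(tiny));
        System.out.println("repetitive: " + repetitive.length + " -> " + deflatedSize(repetitive));
    }
}
```

Short fields with little repetition sit closer to the first case than to the second, so it is worth measuring before turning compression on.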

HTH
-Grant


On Apr 2, 2008, at 12:51 AM, Sebastin wrote:

>
> Hi All,
>       is there any possibility to create compression store for the
> following types of string in lucene index store?
>
>
> String str = "II0264.D05|00022745|ABCDE|03/01/2008 00:23:12|00035|
> 9840836588| 129382152520| 04F4243B600408|04F4243B600408|
> |11919898456123|354943011025810L| "CPTBS2I"| "ABCD3E"|11| 
> 1234510003243219I|"
>
>
> I tried to store these fields with Field.Store.COMPRESS, but the result
> exceeds the original size of the file.
>
>
> -- 
> View this message in context: http://www.nabble.com/Lucene-Compression-tp16442112p16442112.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>

--------------------------
Grant Ingersoll
http://www.lucenebootcamp.com
Next Training: April 7, 2008 at ApacheCon Europe in Amsterdam

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ





