lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From eks dev <eks...@yahoo.co.uk>
Subject Re: Lucene Compression
Date Wed, 02 Apr 2008 14:10:08 GMT
the example you have sent is too small for the type of compression implemented in lucene. The
problem is that you have to store decoding symbol table , header ...* for each* document you
compress.  
The best you can do for this would be to use some compressor with static decoding table (some
entropy codec, huffman or so) and maybe some static dictionary compressor. As Grant said,
the best way to do it would be to compress your field before storing it into lucene

----- Original Message ----
> From: Grant Ingersoll <gsingers@apache.org>
> To: java-user@lucene.apache.org
> Sent: Wednesday, 2 April, 2008 2:09:37 PM
> Subject: Re: Lucene Compression
> 
> It's generally considered best practice to compress things first in  
> your app and then add them as a binary field.   That being said, I  
> don't see why that would blow up on it's own.  Have you tried  
> compressing it outside of Lucene to see what happens?  If you can  
> reproduce it as a test case for Lucene, that would be great.
> 
>  From FieldsWriter, Lucene's compression code looks like:
> private final byte[] compress (byte[] input) {
> 
>        // Create the compressor with highest level of compression
>        Deflater compressor = new Deflater();
>        compressor.setLevel(Deflater.BEST_COMPRESSION);
> 
>        // Give the compressor the data to compress
>        compressor.setInput(input);
>        compressor.finish();
> 
>        /*
>         * Create an expandable byte array to hold the compressed data.
>         * You cannot use an array that's the same size as the orginal  
> because
>         * there is no guarantee that the compressed data will be  
> smaller than
>         * the uncompressed data.
>         */
>        ByteArrayOutputStream bos = new  
> ByteArrayOutputStream(input.length);
> 
>        // Compress the data
>        byte[] buf = new byte[1024];
>        while (!compressor.finished()) {
>          int count = compressor.deflate(buf);
>          bos.write(buf, 0, count);
>        }
> 
>        compressor.end();
> 
>        // Get the compressed data
>        return bos.toByteArray();
>      }
> 
> 
> There is an interesting comment in that code about how the compressed  
> data won't necessarily be smaller, so maybe you have entered the  
> compression twilight zone.
> 
> HTH
> -Grant
> 
> 
> On Apr 2, 2008, at 12:51 AM, Sebastin wrote:
> 
> >
> > Hi All,
> >       is there any possibility to create compression store for the
> > following types of string in lucene index store?
> >
> >
> > String str = "II0264.D05|00022745|ABCDE|03/01/2008 00:23:12|00035|
> > 9840836588| 129382152520| 04F4243B600408|04F4243B600408|
> > |11919898456123|354943011025810L| "CPTBS2I"| "ABCD3E"|11| 
> > 1234510003243219I|"
> >
> >
> > I try to store these fields as Field.Store.COMPRESSION  but it  
> > exceeds the
> > original size of the file?
> >
> >
> > -- 
> > View this message in context: 
> http://www.nabble.com/Lucene-Compression-tp16442112p16442112.html
> > Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> 
> --------------------------
> Grant Ingersoll
> http://www.lucenebootcamp.com
> Next Training: April 7, 2008 at ApacheCon Europe in Amsterdam
> 
> Lucene Helpful Hints:
> http://wiki.apache..org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
> 
> 
> 
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 




      __________________________________________________________
Sent from Yahoo! Mail.
A Smarter Inbox http://uk.docs.yahoo.com/nowyoucan.html


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message