lucene-dev mailing list archives

From robert engels <>
Subject Re: [jira] Commented: (LUCENE-648) Allow changing of ZIP compression level for compressed fields
Date Fri, 11 Aug 2006 05:45:53 GMT
I don't understand why the compressed fields are not just handled
externally in the Document class: just add uncompress/compress
methods. This way all Lucene needs to understand is binary fields,
and you don't have any of these problems during merging or initial
indexing.
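A minimal sketch of Robert's suggestion, assuming the application deflates a field value itself with java.util.zip and gives Lucene only the resulting bytes as an ordinary binary stored field. `FieldCompressor` and its methods are hypothetical names for illustration; they are not part of Lucene's API.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical application-side helper (not a Lucene class): the caller
// compresses a value before storing it as a plain binary field and
// decompresses it after reading it back, so Lucene only sees opaque bytes.
public final class FieldCompressor {
    private FieldCompressor() {}

    // Deflates the input at the given zlib level (-1 for the default, or 0..9).
    public static byte[] compress(byte[] input, int level) {
        Deflater deflater = new Deflater();
        try {
            deflater.setLevel(level); // throws IllegalArgumentException for out-of-range levels
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib memory
        }
    }

    // Inflates bytes produced by compress(); throws on corrupt or truncated input.
    public static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated compressed data");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt compressed data", e);
        } finally {
            inflater.end();
        }
    }

    public static byte[] compressString(String value, int level) {
        return compress(value.getBytes(StandardCharsets.UTF_8), level);
    }

    public static String decompressString(byte[] compressed) {
        return new String(decompress(compressed), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            sb.append("some long stored field text ");
        }
        String body = sb.toString();
        // The application, not Lucene, picks the speed/size trade-off.
        byte[] stored = compressString(body, Deflater.BEST_SPEED);
        System.out.println(body.length() + " chars stored as " + stored.length + " bytes");
        System.out.println("round trip ok: " + body.equals(decompressString(stored)));
    }
}
```

Because the level is an argument chosen by the caller, this also settles where to configure it, and a merge would only ever move opaque bytes.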

On Aug 11, 2006, at 12:18 AM, Michael Busch (JIRA) wrote:

>     [ page=comments#action_12427421 ]
> Michael Busch commented on LUCENE-648:
> --------------------------------------
> I think the compression level is only one part of the performance  
> problem. Another drawback of the current implementation is how  
> compressed fields are being merged: the FieldsReader uncompresses  
> the fields, the SegmentMerger concatenates them and the  
> FieldsWriter compresses the data again. The uncompress/compress  
> steps are completely unnecessary and result in a large overhead.  
> Before a document is written to disk, the data of its fields is
> even compressed twice: first when the DocumentWriter writes the
> single-document segment to the RAMDirectory, and again when the
> SegmentMerger merges the segments inside the RAMDirectory to write
> the merged segment to disk.
> Please check out JIRA issue LUCENE-629 ( 
> browse/LUCENE-629), where I recently posted a patch that fixes this  
> problem and increases the indexing speed significantly. I also  
> included some performance test results which quantify the  
> improvement. Mike, it would be great if you could also try out the  
> patched version for your tests with the compression level.
>> Allow changing of ZIP compression level for compressed fields
>> -------------------------------------------------------------
>>                 Key: LUCENE-648
>>                 URL:
>>             Project: Lucene - Java
>>          Issue Type: Improvement
>>          Components: Index
>>    Affects Versions: 2.0.0, 1.9, 2.0.1, 2.1
>>            Reporter: Michael McCandless
>>            Priority: Minor
>> In response to this thread:
>> I think we should allow changing the compression level used in the  
>> call to in  Right now  
>> it's hardwired to "best":
>>       compressor.setLevel(Deflater.BEST_COMPRESSION);
>> Unfortunately, this can apparently cause the zip library to take a  
>> very long time (10 minutes for 4.5 MB in the above thread) and so  
>> people may want to change this setting.
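A rough way to see the trade-off is to deflate the same input at several levels and time each run. This is an illustrative micro-benchmark with made-up sample data, not a reproduction of the 10-minute case from the thread; single-run timings are noisy, and `LevelComparison` is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Illustrative micro-benchmark: the same input deflated at several zlib levels.
public class LevelComparison {
    static byte[] compress(byte[] input, int level) {
        Deflater d = new Deflater();
        try {
            d.setLevel(level); // the knob that is currently hardwired to BEST_COMPRESSION
            d.setInput(input);
            d.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            while (!d.finished()) {
                int n = d.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            d.end();
        }
    }

    // Repetitive sample text standing in for a large stored field.
    static byte[] sampleData() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 20000; i++) {
            sb.append("stored field value ").append(i % 97).append(' ');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] data = sampleData();
        int[] levels = {Deflater.BEST_SPEED, Deflater.DEFAULT_COMPRESSION, Deflater.BEST_COMPRESSION};
        for (int level : levels) {
            long start = System.nanoTime();
            byte[] out = compress(data, level);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.printf("level %2d: %d -> %d bytes in %d us%n", level, data.length, out.length, micros);
        }
    }
}
```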
>> One approach would be to read the default from a Java system
>> property, but it seems that recently (pre 2.0, I think) there was an
>> effort not to rely on Java system properties (many were removed).
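The system-property option could look like this, assuming a hypothetical property name (`org.example.lucene.compressionLevel`, which Lucene does not define) and falling back to the current BEST_COMPRESSION when the property is unset, unparsable or out of range.

```java
import java.util.zip.Deflater;

// Sketch of the "Java system property" option; the property name is hypothetical.
public class CompressionLevelFromProperty {
    static final String PROPERTY = "org.example.lucene.compressionLevel";

    // Deflater accepts -1 (DEFAULT_COMPRESSION) or 0..9.
    static boolean isValidLevel(int level) {
        return level == Deflater.DEFAULT_COMPRESSION
            || (level >= Deflater.NO_COMPRESSION && level <= Deflater.BEST_COMPRESSION);
    }

    // Integer.getInteger returns the default when the property is unset or unparsable;
    // out-of-range values also fall back to today's hardwired BEST_COMPRESSION.
    static int configuredLevel() {
        int level = Integer.getInteger(PROPERTY, Deflater.BEST_COMPRESSION);
        return isValidLevel(level) ? level : Deflater.BEST_COMPRESSION;
    }

    public static void main(String[] args) {
        // e.g. java -Dorg.example.lucene.compressionLevel=1 CompressionLevelFromProperty
        System.out.println("compression level: " + configuredLevel());
    }
}
```

Reading the property on each call lets it change at runtime, but it is exactly the kind of JVM-wide configuration the thread mentions moving away from.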
>> A second approach would be to add static methods (and a static class
>> attribute) to set the compression level globally?
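The static-setter option, sketched with hypothetical names (`GlobalCompressionLevel`, `setCompressionLevel`); the validation mirrors the range java.util.zip.Deflater accepts (-1 for the default, or 0 through 9).

```java
import java.util.zip.Deflater;

// Sketch of the "static methods + static attribute" option; names are hypothetical.
public final class GlobalCompressionLevel {
    // volatile so a change made on one thread is visible to indexing threads
    private static volatile int level = Deflater.BEST_COMPRESSION;

    private GlobalCompressionLevel() {}

    public static void setCompressionLevel(int newLevel) {
        if (newLevel != Deflater.DEFAULT_COMPRESSION
                && (newLevel < Deflater.NO_COMPRESSION || newLevel > Deflater.BEST_COMPRESSION)) {
            throw new IllegalArgumentException("invalid compression level: " + newLevel);
        }
        level = newLevel;
    }

    public static int getCompressionLevel() {
        return level;
    }

    // Instead of the hardwired setLevel(Deflater.BEST_COMPRESSION),
    // the writer would read the global setting when it creates a compressor.
    static Deflater newCompressor() {
        Deflater compressor = new Deflater();
        compressor.setLevel(level);
        return compressor;
    }

    public static void main(String[] args) {
        setCompressionLevel(Deflater.BEST_SPEED);
        Deflater compressor = newCompressor();
        try {
            System.out.println("global level now " + getCompressionLevel());
        } finally {
            compressor.end();
        }
    }
}
```

The catch is that the setting is shared by every writer in the JVM, so two indexes in one process cannot use different levels.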
>> A third approach would be in the document.Field class, e.g. a
>> setCompressLevel/getCompressLevel pair? But then every time a document
>> is created with this field you'd have to call setCompressLevel,
>> since Lucene doesn't have a global Field schema (like Solr).
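The per-Field option, using a hypothetical `CompressedField` class as a stand-in for document.Field (it is not Lucene's class). The factory method shows the drawback raised above: with no global schema, each place that builds the field has to remember to set the level.

```java
import java.util.zip.Deflater;

// Sketch of the per-Field option; CompressedField is hypothetical, not Lucene's Field.
public class PerFieldLevel {
    static final class CompressedField {
        private final String name;
        private final String value;
        private int compressLevel = Deflater.BEST_COMPRESSION; // today's hardwired default

        CompressedField(String name, String value) {
            this.name = name;
            this.value = value;
        }

        void setCompressLevel(int level) {
            if (level != Deflater.DEFAULT_COMPRESSION
                    && (level < Deflater.NO_COMPRESSION || level > Deflater.BEST_COMPRESSION)) {
                throw new IllegalArgumentException("invalid compression level: " + level);
            }
            this.compressLevel = level;
        }

        int getCompressLevel() { return compressLevel; }
        String name() { return name; }
        String value() { return value; }
    }

    // Every code path that builds this field must repeat the call;
    // forgetting it silently falls back to BEST_COMPRESSION.
    static CompressedField bodyField(String text) {
        CompressedField f = new CompressedField("body", text);
        f.setCompressLevel(Deflater.BEST_SPEED);
        return f;
    }

    public static void main(String[] args) {
        CompressedField viaFactory = bodyField("fast to index");
        CompressedField forgotten = new CompressedField("body", "slow to index");
        System.out.println(viaFactory.name() + " level " + viaFactory.getCompressLevel());
        System.out.println(forgotten.name() + " level " + forgotten.getCompressLevel());
    }
}
```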
>> Any other ideas, or preferences among these approaches?
> -- 
> This message is automatically generated by JIRA.
> -
> If you think it was sent incorrectly contact one of the  
> administrators: 
> Administrators.jspa
> -
> For more information on JIRA, see: 
> software/jira
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
