From: "Michael Busch (JIRA)"
To: java-dev@lucene.apache.org
Date: Thu, 10 Aug 2006 22:18:15 -0700 (PDT)
Subject: [jira] Commented: (LUCENE-648) Allow changing of ZIP compression level for compressed fields
Message-ID: <14516344.1155273495044.JavaMail.jira@brutus>
In-Reply-To: <7189075.1155222434115.JavaMail.jira@brutus>

    [ http://issues.apache.org/jira/browse/LUCENE-648?page=comments#action_12427421 ]

Michael Busch commented on LUCENE-648:
--------------------------------------

I think the compression
level is only one part of the performance problem. Another drawback of the current implementation is how compressed fields are merged: the FieldsReader uncompresses the fields, the SegmentMerger concatenates them, and the FieldsWriter compresses the data again. The uncompress/compress steps are completely unnecessary and add a large overhead.

In fact, before a document is written to disk, the data of its fields is compressed twice: first when the DocumentWriter writes the single-document segment to the RAMDirectory, and again when the SegmentMerger merges the segments inside the RAMDirectory to write the merged segment to disk.

Please check out Jira issue 629 (http://issues.apache.org/jira/browse/LUCENE-629), where I recently posted a patch that fixes this problem and increases indexing speed significantly. I also included some performance test results that quantify the improvement.

Mike, it would be great if you could also try out the patched version for your tests with the compression level.

> Allow changing of ZIP compression level for compressed fields
> -------------------------------------------------------------
>
>                 Key: LUCENE-648
>                 URL: http://issues.apache.org/jira/browse/LUCENE-648
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.0.0, 1.9, 2.0.1, 2.1
>            Reporter: Michael McCandless
>            Priority: Minor
>
> In response to this thread:
>   http://www.gossamer-threads.com/lists/lucene/java-user/38810
> I think we should allow changing the compression level used in the call to java.util.zip.Deflater in FieldsWriter.java. Right now it's hardwired to "best":
>   compressor.setLevel(Deflater.BEST_COMPRESSION);
> Unfortunately, this can apparently cause the zip library to take a very long time (10 minutes for 4.5 MB in the above thread), so people may want to change this setting.
> One approach would be to read the default from a Java system property, but it seems that recently (pre 2.0, I think) there was an effort to not rely on Java system properties (many were removed).
> A second approach would be to add static methods (and a static class attribute) to globally set the compression level.
> A third approach would be in the document.Field class, e.g. setCompressLevel/getCompressLevel. But then every time a document is created with this field you'd have to call setCompressLevel, since Lucene doesn't have a global Field schema (like Solr).
> Any other ideas / preferences for any of these approaches?

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org
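
For reference, the idea in the quoted issue of not hardwiring Deflater.BEST_COMPRESSION can be sketched as below. This is only a minimal illustration using java.util.zip directly, not the FieldsWriter code and not the LUCENE-629 patch; the class CompressionLevelSketch and its compress/uncompress helpers are hypothetical names invented for this example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch: none of these names exist in Lucene.
final class CompressionLevelSketch {

    // Compress with a caller-chosen level (for example Deflater.BEST_SPEED)
    // instead of a hardwired Deflater.BEST_COMPRESSION.
    public static byte[] compress(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream(input.length);
            byte[] buf = new byte[1024];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib resources
        }
    }

    // Inverse of compress(); the level is not needed to uncompress.
    public static byte[] uncompress(byte[] input) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(input);
            ByteArrayOutputStream out = new ByteArrayOutputStream(input.length * 2);
            byte[] buf = new byte[1024];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    // Avoid looping forever on truncated or dictionary-based input.
                    throw new IllegalArgumentException("incomplete compressed data");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt compressed data", e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("some stored field text ");
        }
        byte[] raw = sb.toString().getBytes(StandardCharsets.UTF_8);
        // Compare the compressed size at the fastest and the smallest setting.
        int[] levels = {Deflater.BEST_SPEED, Deflater.BEST_COMPRESSION};
        for (int level : levels) {
            byte[] packed = compress(raw, level);
            System.out.println("level " + level + ": " + raw.length
                    + " -> " + packed.length + " bytes");
        }
    }
}
```

Where the level would be configured (system property, static setter, or per-Field accessor) is exactly the open question in the issue; the sketch only shows that the level can be passed in as a parameter.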