Date: Thu, 10 Aug 2006 22:41:36 +1000
From: "Jason Polites"
Reply-To: jason.polites@synetek.com
To: java-user@lucene.apache.org
Subject: Field compression too slow

Hello all,

I am experiencing some performance problems indexing large(ish) amounts of text using the Field.Store.COMPRESS option when creating a Field in Lucene.

I have a sample document which has about 4.5MB of text to be stored as compressed data within the field, and the indexing of this document seems to take an inordinate amount of time (over 10 minutes!). When debugging I can see that it's stuck on the deflate() calls of the Deflater used by Lucene.

I noted that Lucene by default uses the Deflater.BEST_COMPRESSION compression level when encountering a compressed field. I'm not sure if it would help my particular situation, but is there any way to provide the option of specifying the compression level? The level used by Lucene (level 9) is the maximum possible compression level.

Ideally I would like to be able to alter the compression level on the basis of the field size. This way I can smooth out the compression times across the various document sizes. I am more interested in consistent time than I am in consistent compression.

Or... could there be some other reason my document takes this long to index (and holds up all other threads)?

Thanks.
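
For illustration, the size-based idea described above could be sketched outside Lucene with the standard java.util.zip classes. This is only a sketch under assumptions: the class name AdaptiveCompressor, the method names, and the size thresholds are all hypothetical, and nothing here is Lucene's own compression code or a Lucene API.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper (not part of Lucene): chooses a Deflater level from the input size.
final class AdaptiveCompressor {

    // Illustrative thresholds only; they would need tuning against real documents.
    static final int SMALL_LIMIT = 64 * 1024;      // up to 64 KB
    static final int MEDIUM_LIMIT = 1024 * 1024;   // up to 1 MB

    // Smaller inputs get stronger compression, larger inputs get faster compression.
    static int levelFor(int length) {
        if (length <= SMALL_LIMIT) return Deflater.BEST_COMPRESSION;      // level 9
        if (length <= MEDIUM_LIMIT) return Deflater.DEFAULT_COMPRESSION;  // zlib default
        return Deflater.BEST_SPEED;                                       // level 1
    }

    // Compresses the whole array in memory using the level chosen by levelFor().
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater(levelFor(input.length));
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream(Math.max(32, input.length / 4));
            byte[] buffer = new byte[8192];
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib resources
        }
    }

    // Reverses compress(); the level does not need to be known to inflate.
    static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream(Math.max(32, compressed.length * 4));
            byte[] buffer = new byte[8192];
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("Truncated or unsupported compressed data");
                }
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalStateException("Corrupt compressed data", e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        // Build a repetitive sample of a few hundred KB to exercise the middle tier.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 20000; i++) {
            sb.append("some sample field text ");
        }
        byte[] raw = sb.toString().getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(raw);
        System.out.println("level=" + levelFor(raw.length)
                + " raw=" + raw.length + " compressed=" + packed.length);
    }
}
```

Bytes compressed this way would have to be stored as binary data and inflated by the application on retrieval, instead of relying on the built-in compressed store option. Whether a lower level actually removes the ten-minute stall is not something this sketch can establish; it only shows how the level could be varied with size.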