From: "Andreas Guther"
To: java-user@lucene.apache.org
Date: Thu, 17 May 2007 06:43:30 -0700
Subject: Re: Field.Store.Compress - does it improve performance of document reads?

I am actually using the FieldSelector, and unless I did something wrong, it
did not give me any improvement in load performance, which was both
surprising and disappointing. The only difference I could see was when I
returned NO_LOAD for all fields, which, as I understand it, is the same as
skipping over the document.

Right now I am looking into fragmentation problems in my huge index files.
I am defragmenting the hard drive to see whether this brings any read
performance improvement.

I am also wondering whether the FieldCache, as discussed in
http://www.gossamer-threads.com/lists/lucene/general/28252, would help
improve the situation.

Andreas

On 5/17/07, Grant Ingersoll wrote:
>
> I haven't tried compression either. I know there was some talk a while
> ago about deprecating it, but that hasn't happened. The current
> implementation yields the highest level of compression.
> You might find better results by compressing in your application and
> storing the result as a binary field, which gives you more control over
> the CPU used. This is our current recommendation for dealing with
> compression.
>
> If you are not actually displaying that field, you should look into the
> FieldSelector API (via IndexReader). It allows you to load fields lazily
> or skip them altogether, and can yield pretty significant savings when it
> comes to loading documents. FieldSelector is available in 2.1.
>
> -Grant
>
> On May 17, 2007, at 4:01 AM, Paul Elschot wrote:
>
> > On Thursday 17 May 2007 08:10, Andreas Guther wrote:
> >> I am currently exploring how to solve performance problems I
> >> encounter with Lucene document reads.
> >>
> >> Amongst other fields, we have one field (default) that stores all
> >> searchable fields. This field can become quite large, since we are
> >> indexing documents and storing their content for display within
> >> results.
> >>
> >> I noticed that the read can be very expensive. I now wonder whether
> >> it would make sense to add this field to the index as
> >> Field.Store.COMPRESS. Can someone tell me if this would speed up the
> >> document read, or whether it is only useful for saving space?
> >
> > I have not tried the compression yet, but in my experience a good way
> > to reduce the costs of document reads from disk is to read them in
> > document number order whenever possible. In this way one saves on disk
> > head seeks. Compression should actually help reduce the costs of disk
> > head seeks even more.
> >
> > Regards,
> > Paul Elschot
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
>
> --------------------------
> Grant Ingersoll
> Center for Natural Language Processing
> http://www.cnlp.org/tech/lucene.asp
>
> Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ
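Grant's recommendation, compressing in the application and storing the result as a binary field, might look like the minimal JDK-only sketch below. It assumes zlib (java.util.zip) as the codec, and StoredFieldCodec is a hypothetical helper name; the Lucene side is not shown. In Lucene 2.x the compressed bytes would presumably go into a stored binary field (a Field built from a byte[] with Field.Store.YES), but check the exact constructor for your version. The level parameter is where the extra control over CPU comes in: Deflater.BEST_SPEED spends less CPU, Deflater.BEST_COMPRESSION produces smaller output.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper: compresses a large stored text field in the application,
// so the bytes can be stored as a binary field instead of via Field.Store.COMPRESS.
final class StoredFieldCodec {

    private StoredFieldCodec() {}

    // level: e.g. Deflater.BEST_SPEED (1) for less CPU, Deflater.BEST_COMPRESSION (9) for smaller output.
    static byte[] compress(String text, int level) {
        Deflater deflater = new Deflater(level);
        try {
            deflater.setInput(text.getBytes(StandardCharsets.UTF_8));
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // free the native zlib state right away
        }
    }

    // Inverse of compress(); intended to run only for hits whose text is actually displayed.
    static String decompress(byte[] data) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(data);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported compressed data");
                }
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt compressed data", e);
        } finally {
            inflater.end();
        }
    }
}
```

Decompressing only when a hit is actually displayed keeps the inflate cost out of code paths that never show the field.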
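Paul's suggestion, fetching stored documents in document number order, can be sketched as follows. This is an illustration under assumptions: the hits arrive as an array of doc IDs in ranking order, and the loader function stands in for a per-document fetch such as IndexReader.document(int). DocOrderLoader and loadInDocOrder are hypothetical names. The reads are issued in ascending doc-ID order, which on a spinning disk tends to mean fewer head seeks, and each result is put back into its ranking slot for display.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.IntFunction;

// Hypothetical helper: fetch documents in ascending doc-number order, return them in rank order.
final class DocOrderLoader {

    private DocOrderLoader() {}

    // rankedDocIds: hit doc IDs in ranking order; loader: stands in for a per-document fetch.
    static <T> List<T> loadInDocOrder(int[] rankedDocIds, IntFunction<T> loader) {
        // Positions 0..n-1 into the ranked array, sorted by the doc ID each one points at.
        Integer[] positions = new Integer[rankedDocIds.length];
        for (int i = 0; i < positions.length; i++) {
            positions[i] = i;
        }
        Arrays.sort(positions, (a, b) -> Integer.compare(rankedDocIds[a], rankedDocIds[b]));

        List<T> results = new ArrayList<>(Collections.<T>nCopies(rankedDocIds.length, null));
        for (int pos : positions) {
            // Reads go out in ascending doc-ID order, but each result lands at its rank slot.
            results.set(pos, loader.apply(rankedDocIds[pos]));
        }
        return results;
    }
}
```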