Date: Sun, 8 Apr 2007 16:52:31 -0400
From: "Erick Erickson"
To: java-user@lucene.apache.org
Subject: Re: Re[2]: Out of memory exception for big indexes

It *is* a bit confusing, since every search is sorted, kinda....

Practically, a sorted query is one where you call one of the search
methods (on, say, Searcher) with a Sort object, which sorts on one or
more of the fields in your index (which fields are used is specified by
the field name(s) or SortField object(s) the Sort is constructed with).

Searches that do NOT have a Sort object default to relevance ranking,
which is not nearly so memory-intensive. A relevance score is, after
all, one float or so....

The difference is that the values of the fields referenced in the Sort
object have to be read into memory and compared against each other, and
the aggregate may be quite large memory-wise.

Erick

On 4/8/07, Nilesh Bansal wrote:
>
> On 4/8/07, Artem wrote:
> > I must note that my patch only helps in lucene-OOM situations related to
> > _sorted_ queries. If this is your case then I think yes, it will help.
>
> Probably a newbie question, but can you please explain what sorted
> queries mean? Is simple keyword search a sorted query?
>
> > In my app the index is currently not so big, only 1 million docs. With the
> > patch applied, a sample query giving the first 30 of 120,000 sorted results
> > made memory consumption jump from 18M to 20M according to jconsole.
> >
> > NB> It seems that there are some issues with this patch and that was the
> > NB> reason it is not yet in the main source tree. Can someone please
> > NB> summarize what the downsides of using such an approach are? It would be
> > NB> really good if Lucene had it in the main source tree with a flag to turn
> > NB> this feature ON or OFF.
> >
> > First there's a performance cost (for second and further queries with the
> > same IndexSearcher). In the default implementation all the index values of
> > the sorted field are cached during the first sorted search - this takes
> > memory and time; but subsequent queries run fast if there is still some
> > memory left. My implementation doesn't cache field values but loads them
> > from the respective documents on the fly - so it's slower but takes less
> > memory. The query mentioned took about 3s (with rather small sorted field
> > values - about 20-100 chars).
> > There's a limitation also - my implementation requires the sorted field to
> > be "stored" in the index (Field.Store.YES in doc.add())
> >
> > NB> Bublic, can you tell me what exactly I need to do if I want to use
> > NB> this patch?
> >
> > You can include the StoredFieldSortFactory class source file in your
> > sources and then use
> > StoredFieldSortFactory.create(sortFieldName, sortDescending) to get
> > a Sort object for the sorted query.
> >
> > The StoredFieldSortFactory source file can be extracted from the
> > LUCENE-769 patch or from the sharehound sources:
> > http://sharehound.cvs.sourceforge.net/*checkout*/sharehound/jNetCrawler/src/java/org/apache/lucene/search/StoredFieldSortFactory.java
> >
> > Regards,
> > Artem
> >
> > NB> thanks
> > NB> Nilesh
> >
> > NB> On 4/6/07, Bublic Online wrote:
> > >> Hi Ivan, Chris and all!
> > >>
> > >> I'm that contributor of LUCENE-769 and I recommend it too :)
> > >> The OutOfMemory error was one of the main reasons for me to make it.
> > >>
> > >> Regards,
> > >> Artem Vasiliev
> > >>
> > >> On 4/6/07, Chris Hostetter wrote:
> > >> >
> > >> > : The problem I suspect is the sorting. As I understand, Lucene
> > >> > : builds internal caches for sorting and I suspect that this is the
> > >> > : root of your problem. You can test this by trying your problem
> > >> > : queries without sorting.
> > >> >
> > >> > if Sorting really is the cause of your problems, you may want to
> > >> > try out this patch...
> > >> >
> > >> > https://issues.apache.org/jira/browse/LUCENE-769
> > >> >
> > >> > ...it *may* be advantageous in situations where memory is your most
> > >> > constrained resource, and you are willing to sacrifice speed for
> > >> > sorting ... it looks promising to me, but there haven't been any
> > >> > convincing usecases/benchmarks of people finding it beneficial
> > >> > (other than the original contributor)
> > >> >
> > >> > if you do try it, please post your comments in the issue.
> > >> >
> > >> > -Hoss
> > >> >
> > >> > ---------------------------------------------------------------------
> > >> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > >> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> > --
> > Best regards,
> > Artem mailto:abublic@gmail.com
>
> --
> Nilesh Bansal.
> http://queens.db.toronto.edu/~nilesh/
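
For anyone reading this thread later: the trade-off discussed above (cache every value of the sort field once, versus load the stored value from each document on the fly) can be sketched in plain Java. This is only an illustration under stated assumptions. SortMemorySketch, STORED_TITLES, buildFieldCache, sortWithCache, loadStoredValue and sortOnTheFly are hypothetical names made up for this sketch, the "index" is just an array, and none of it is Lucene's or the LUCENE-769 patch's actual code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class SortMemorySketch {
    // Hypothetical stand-in for the stored sort field of five documents
    // (array position = document number). Not a Lucene structure.
    static final String[] STORED_TITLES = {"pear", "apple", "fig", "banana", "cherry"};

    // Counts how often the on-the-fly strategy reads a stored value.
    static int storedFieldReads = 0;

    // Strategy 1 (assumed to mirror the default behaviour described in the
    // thread): copy the sort value of EVERY document into memory once,
    // regardless of how many documents a query actually matches.
    static String[] buildFieldCache() {
        return Arrays.copyOf(STORED_TITLES, STORED_TITLES.length);
    }

    // Sorts the matching document numbers using the in-memory cache.
    static List<Integer> sortWithCache(List<Integer> hits, String[] cache) {
        List<Integer> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparing(doc -> cache[doc]));
        return sorted;
    }

    // Strategy 2 (assumed to mirror the stored-field approach): nothing is
    // cached; each comparison reads the value from the "document" again.
    static String loadStoredValue(int doc) {
        storedFieldReads++;
        return STORED_TITLES[doc];
    }

    // Sorts the matching document numbers, loading values on demand.
    static List<Integer> sortOnTheFly(List<Integer> hits) {
        List<Integer> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparing(SortMemorySketch::loadStoredValue));
        return sorted;
    }

    public static void main(String[] args) {
        List<Integer> hits = Arrays.asList(0, 2, 3); // documents matching a query

        String[] cache = buildFieldCache();
        System.out.println("cached entries held in memory: " + cache.length);
        System.out.println("sorted via cache:  " + sortWithCache(hits, cache));

        System.out.println("sorted on the fly: " + sortOnTheFly(hits));
        System.out.println("stored-field reads: " + storedFieldReads);
    }
}
```

Both strategies return the same order for the three hits; the difference is that the first holds one value per document in the whole index, while the second holds nothing between queries and pays with repeated reads on every sorted search.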