From: Ian Lea
Date: Wed, 25 Aug 2010 09:31:55 +0100
Subject: Re: Sorting a Lucene index
To: java-user@lucene.apache.org

1 billion i.e. 1,000,000,000? Either buy more RAM, lots more RAM, or
skip Lucene sorting and do your own sorting for the top n hits. You
might also want to look into sharding/distributing your index.

--
Ian.

On Wed, Aug 25, 2010 at 6:16 AM, Shelly_Singh wrote:
> I have 1 bln documents to sort. So, that would mean (8 bln bytes == 8 GB RAM).
> All I have is 8 GB on my machine, so I do not think this approach would work.
>
> Any other options?
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerickson@gmail.com]
> Sent: Thursday, August 19, 2010 7:18 PM
> To: java-user@lucene.apache.org
> Subject: Re: Sorting a Lucene index
>
> You haven't yet told us how many documents you're talking about here, so
> it's hard to have a good idea of what solutions are. That said, I'd just try
> sorting first.
> The sorting cache size will be something like (sizeof(int or long)) *
> (number of documents).
> Measure (remember to measure the response after query warmups, the first
> few will be slower because they fill up the cache) THEN fix iff there's a
> problem.
>
> And just forget the idea of inserting your documents in the correct order.
> You stated that the documents come in in random order.
> Document IDs are assigned internally to Lucene, and monotonically increasing.
> I sure don't see how you can reconcile those two things...
>
> But again, just try it with sorting on the numeric field and only fix things
> if you have a problem. Lots of work has been put into making Lucene fast, by
> very bright people. See if they've already solved your problem for you...
>
> Best
> Erick.
>
>
> On Thu, Aug 19, 2010 at 1:51 AM, Shelly_Singh wrote:
>
>> Hi Anshum,
>>
>> I require sorted results for all my queries and the field on which I need
>> sorting is fixed; so this led me to the idea of storing in sorted order to
>> avoid sorting cost with every query.
>>
>> Thanks and Regards,
>>
>> Shelly Singh
>> Center For KNowledge Driven Information Systems, Infosys
>> Email: shelly_singh@infosys.com
>> Phone: (M) 91 992 369 7200, (VoIP)2022978622
>>
>> -----Original Message-----
>> From: Anshum [mailto:anshumg@gmail.com]
>> Sent: Wednesday, August 18, 2010 5:21 PM
>> To: java-user@lucene.apache.org
>> Subject: Re: Sorting a Lucene index
>>
>> Hi Shelly,
>> The search results so returned are sorted either by relevance, index order,
>> stored field, or custom order.
>> As you are saying that you would not be able to maintain the index order,
>> you would have to do the sort at run time.
>> Sorting on a stored field is not costly and you may use it comfortably.
>> btw,
>> are you facing any issues in sort time or is it a presumption?
>>
>> --
>> Anshum Gupta
>> http://ai-cafe.blogspot.com
>>
>>
>> On Wed, Aug 18, 2010 at 5:12 PM, Shelly_Singh wrote:
>>
>> > Hi,
>> >
>> > I have a Lucene index that contains a numeric field along with certain
>> > other fields. The order of incoming documents is random and unpredictable.
>> > As a result, while creating an index, I end up adding docs in random order
>> > with respect to the numeric field value.
>> >
>> > For example, documents may be added in the following order:
>> > 12,y,d
>> > 100,o,p
>> > 1,x,y
>> > 23,u,i
>> > 31,v,m
>> > 22,b,m
>> > 109,k,l
>> >
>> > My requirement is that at search time, I want the documents in order of
>> > the numeric field.
>> > One option is to do a score/sort on the numeric field.
>> > But this may be a costly operation.
>> >
>> > Hence, I am trying to find if there is some way such that my stored index
>> > is sorted by itself.
>> >
>> > Please help.
>> >
>> > Thanks and Regards,
>> >
>> > Shelly Singh
>> > Center For KNowledge Driven Information Systems, Infosys
>> > Email: shelly_singh@infosys.com
>> > Phone: (M) 91 992 369 7200, (VoIP)2022978622
>> >

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
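
Two points from the thread can be sketched in plain Java: the cache-size
estimate of (sizeof(int or long)) * (number of documents), and the
suggestion to do your own sorting for the top n hits. This is only a
rough illustration under stated assumptions, not Lucene code. The names
TopNSketch, estimateSortCacheBytes and smallestN are hypothetical, and
the sketch assumes the numeric field value of each hit is already
available as a long; how those values are read from the index is not
shown. A bounded heap keeps memory proportional to n rather than to the
number of documents.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper class; nothing here calls into Lucene.
final class TopNSketch {

    // Rough per-document cache estimate from the thread:
    // one fixed-width value (4 bytes for int, 8 for long) per document.
    static long estimateSortCacheBytes(long numDocs, int bytesPerValue) {
        return numDocs * bytesPerValue;
    }

    // Returns the n smallest values in ascending order.
    // A max-heap of at most n entries holds the best candidates seen so far,
    // so memory use depends on n, not on how many hits are scanned.
    static List<Long> smallestN(Iterable<Long> hitValues, int n) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        PriorityQueue<Long> heap = new PriorityQueue<>(n, Collections.reverseOrder());
        for (long value : hitValues) {
            if (heap.size() < n) {
                heap.add(value);
            } else if (value < heap.peek()) {
                // New value beats the current worst candidate: replace it.
                heap.poll();
                heap.add(value);
            }
        }
        List<Long> result = new ArrayList<>(heap);
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        // 1 billion documents with an 8-byte value each.
        System.out.println(estimateSortCacheBytes(1_000_000_000L, 8)); // prints 8000000000

        // Numeric field values from the example documents in the thread.
        List<Long> values = Arrays.asList(12L, 100L, 1L, 23L, 31L, 22L, 109L);
        System.out.println(smallestN(values, 3)); // prints [1, 12, 22]
    }
}
```

With a 4-byte int per document the same estimate gives about 4 GB for
1 billion documents, which is still half of the 8 GB machine mentioned
above.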