Date: Wed, 7 Sep 2005 13:49:18 -0600
From: haipeng du
To: java-dev@lucene.apache.org
Subject: Re: limit return results
In-Reply-To: <66E70FEE-D273-4837-BB86-60DA5A24382D@ehatchersolutions.com>

That is just my concern, given the large number of documents I will have
(at least 4 million).

On 9/7/05, Erik Hatcher wrote:
>
> Have you seen out of memory problems? Or are you being preemptive in
> your concerns?
>
> Erik
>
> On Sep 7, 2005, at 11:58 AM, haipeng du wrote:
>
> > The reason I want to limit the returned results is that I do not want
> > to run into out-of-memory problems. I have indexed 3 million documents
> > with Lucene. Sometimes a search will return millions of fields back to
> > me. I just want to get the first 100, for example, to show them to the
> > user. Even if I use search(query,filter,topDocs), I believe it still
> > returns all results. So how can I limit what Lucene returns?
> >
> > On 9/7/05, M.Altheim wrote:
> >
> >> Erik Hatcher [mailto:erik@ehatchersolutions.com] wrote:
> >>
> >>> On Sep 6, 2005, at 10:47 PM, Murray Altheim wrote:
> >>>
> >>>> Erik Hatcher wrote:
> >>>>
> >>>>> Just access the first 100 Hits - simple as that.
> >>>>> Erik
> >>>>
> >>>> Erik,
> >>>>
> >>>> This question has come up before. For high traffic sites that
> >>>> can't afford to have the search engine accumulating thousands
> >>>> of hits, only to deliver 100, or perhaps just a few, the
> >>>> current approach *seems* like quite a lot of extra processing.
> >>>> Is there some way to have the engine simply stop generating
> >>>> the hit list after it reaches the specified threshold?
> >>>
> >>> The operative word here is "seems". Do you have any evidence that
> >>> doing a basic .search(Query) and only getting the first 100 results
> >>> is too slow?
> >>>
> >>> The HitCollector option that Otis mentioned is one alternative,
> >>> though I don't think it'll be much, if any, faster.
> >>
> >> Erik,
> >>
> >> Evidence, no. I'm looking at this from the perspective of the
> >> Open University, where we have over 200,000 students accessing
> >> and searching our online services. Anything that can minimize
> >> the impact on our processors is going to be most welcome, i.e.,
> >> we don't have cycles to waste. If the student is only expecting
> >> the first 10 results and the engine generates 1000, 990 of them
> >> are wasted.
> >>
> >> Murray
> >>
> >> .....................................................................
> >> Murray Altheim http://www.altheim.com/murray/
> >> Strategic & Service Development
> >> The Open University Library
> >> Milton Keynes, Bucks, MK7 6AA, UK
> >>
> >> Ils ont l'orteil de Bouc, & d'un Chevreil l'oreille,
> >> La corne d'un Chamois, & la face vermeille
> >> Comme un rouge Croissant: & dancent toute nuict
> >> Dedans un carrefour, ou pres d'une eau qui bruict.
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-dev-help@lucene.apache.org
> >
> > --
> > Haipeng Du
> > Software Engineer
> > Comphealth,
> > Salt Lake City

--
Haipeng Du
Software Engineer
Comphealth,
Salt Lake City