Date: Mon, 19 Nov 2007 18:14:05 -0300
From: "Haroldo Nascimento"
To: java-user@lucene.apache.org
Subject: Re: Time of processing hits.doc()

German,

How would that work? Do you have two indexes, one for the main (keyword)
search and another for location? Do you run two searches, first the main
search and then the location search with the filter applied?

What type of Filter do you use? I have the bitset for the main (keyword)
search, but I don't know how to get the bitset that defines the locations
of the ads matched by the main search.

Thanks

On Nov 19, 2007 1:19 PM, German Kondolf wrote:
> I have already defined a Lucene Filter for every "id" of "ubicacion".
> I just create the bitset for every value, and count it against the result.
>
> One possible optimization is to read the terms of the field you're
> trying to "group", that's the optimization we'll be working soon on
> our app.
>
> I never read all the results.
>
> On Nov 19, 2007 1:05 PM, Haroldo Nascimento wrote:
> > German,
> >
> > What I need is similar to your site
> > http://listados.deremate.com.ar/panaderia .
> > I have many search results but show only some of them (for example,
> > the first 10 on the first page). To build the location filter
> > options, however, I need to read all the search results. The
> > performance problem appears when I have 30,000 results.
> >
> > How do you get the "ubicacion" filter? Do you need to read all the results?
> >
> > thanks
> >
> > On Nov 19, 2007 12:05 PM, Grant Ingersoll wrote:
> > > I think, based on your previous question, that you just need to use
> > > the search() method that returns TopDocs, not the lower-level
> > > HitCollector method. From the TopDocs, you can then access the
> > > ScoreDoc, which will give you info about the doc and the score. See
> > > http://www.lucenebootcamp.com/LuceneBootCamp/training/src/test/java/com/lucenebootcamp/training/basic/TopDocsTest.java
> > > from my Lucene Boot Camp training class for a really simple example.
> > >
> > > -Grant
> > >
> > > On Nov 19, 2007, at 9:12 AM, Haroldo Nascimento wrote:
> > >
> > > > Mark,
> > > >
> > > > How can I get the Document information? I think it belongs in the
> > > > implementation of the abstract collect method. How can I get it?
> > > >
> > > > Below is the example from the Lucene javadoc.
> > > >
> > > > Searcher searcher = new IndexSearcher(indexReader);
> > > > final BitSet bits = new BitSet(indexReader.maxDoc());
> > > > searcher.search(query, new HitCollector() {
> > > >     public void collect(int doc, float score) {
> > > >       bits.set(doc);
> > > >     }
> > > >   });
> > > >
> > > > Thanks
> > > >
> > > >
> > > > On Nov 18, 2007 8:09 PM, Mark Miller wrote:
> > > >> Hey Haroldo.
> > > >>
> > > >> First thing you need to do is *stop* using Hits in your searches.
> > > >> Hits is optimized for some pretty specific use cases and you will get
> > > >> along much better by using a HitCollector.
> > > >>
> > > >> Hits has three main functions:
> > > >>
> > > >> It caches documents, normalizes scores, and stores ids associated with
> > > >> scores (a HitDoc). If you attempt to retrieve a HitDoc past the first
> > > >> 100 from Hits, a new search will be issued to grab double the required
> > > >> HitDocs needed to satisfy your HitDoc retrieval attempt. This will be
> > > >> repeated every time you ask for a HitDoc beyond the current cache (which
> > > >> began at 100). This means that if you need to get a HitDoc beyond 100,
> > > >> Hits is not a great choice for you. You will want to use the
> > > >> HitCollector instead...but remember that you are losing the normalized
> > > >> scores (simple to copy code if you still want it) and the document
> > > >> caching (I rarely want that anyway).
> > > >>
> > > >> An issue to watch out for: with Hits, you do not have to ask for how
> > > >> many docs to get back, but with a HitCollector solution you will need
> > > >> to. This is a minor dilemma if you want to go over all of the hits no
> > > >> matter what. You can pass a huge number to ensure you get everything,
> > > >> but you will be creating large data structures if you do this, as
> > > >> structure sizes may be initialized by the number you pass. Also, passing
> > > >> the maximum integer will cause an error (negative init size) as Lucene
> > > >> initializes a data structure to hold the hits as n+1.
> > > >>
> > > >> - Mark
> > > >>
> > > >>
> > > >> Haroldo Nascimento wrote:
> > > >>> I have a performance problem when I need to group the results of a
> > > >>> search.
> > > >>>
> > > >>> I have the code below:
> > > >>>
> > > >>> for (int i = 0; i < hits.length(); i++) {
> > > >>>     doc = hits.doc(i);
> > > >>>
> > > >>>     obj1 = doc.get(Constants.STATE_DESC_FIELD_LABEL);
> > > >>>     obj2 = doc.get(xxx);
> > > >>>     ...
> > > >>> }
> > > >>>
> > > >>> I work with a very large volume of data. The search itself runs in
> > > >>> 0.300 seconds, but when the hits object holds many results, the time
> > > >>> to fetch all the objects is very long. The hits.doc(i) command is
> > > >>> processed in 2 seconds.
> > > >>>
> > > >>> For example, when hits.length() equals 25,000 results, the
> > > >>> "post-search" time is 7 seconds.
> > > >>>
> > > >>> I fetch all the results because I need to group them (remove the
> > > >>> duplicate results).
> > > >>>
> > > >>> Is there any way in Lucene to group the results? I need something
> > > >>> like the "group by" command in SQL.
> > > >>>
> > > >>> Thanks.
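For the "group by" question above, one way to avoid calling hits.doc(i) for every hit is to work from document ids only. What follows is a minimal sketch in plain Java with no Lucene dependency. The BitSet stands in for the ids that a HitCollector like the javadoc example would set, and valueByDoc is a hypothetical array holding the grouping field's value for each document id. In Lucene of that era, FieldCache might be one source for such an array, but that is an assumption and not something this thread confirms. The class and method names are invented for illustration.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: groups matching doc ids by a field value without loading documents.
class GroupByField {

    /**
     * Returns, for each distinct field value, the first matching doc id that carries it.
     * Docs with no value (null) or with an id outside the array are skipped.
     */
    static Map<String, Integer> firstDocPerValue(BitSet matches, String[] valueByDoc) {
        Map<String, Integer> first = new LinkedHashMap<>();
        // Walk only the set bits, i.e. only the documents the query matched.
        for (int doc = matches.nextSetBit(0); doc >= 0; doc = matches.nextSetBit(doc + 1)) {
            if (doc >= valueByDoc.length) {
                break;
            }
            String value = valueByDoc[doc];
            if (value != null) {
                first.putIfAbsent(value, doc); // keep the first doc, drop later duplicates
            }
        }
        return first;
    }

    public static void main(String[] args) {
        // Assumed sample data: docs 0, 1, 3 and 4 matched the query.
        BitSet matches = new BitSet();
        matches.set(0);
        matches.set(1);
        matches.set(3);
        matches.set(4);
        String[] valueByDoc = {"SP", "RJ", "SP", "SP", null};
        System.out.println(firstDocPerValue(matches, valueByDoc)); // prints {SP=0, RJ=1}
    }
}
```

Only the surviving doc ids (one per group) would then need a real document load, instead of all 25,000 hits.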
> > >
> > > --------------------------
> > > Grant Ingersoll
> > > http://lucene.grantingersoll.com
> > >
> > > Lucene Helpful Hints:
> > > http://wiki.apache.org/lucene-java/BasicsOfPerformance
> > > http://wiki.apache.org/lucene-java/LuceneFAQ
> > >

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
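
German's approach quoted above (one bitset per "ubicacion" value, counted against the result) can be sketched in plain Java. This is only an illustration under stated assumptions, not his actual code: queryBits stands in for the bitset filled by a HitCollector for the main keyword search, and each entry of locationBits stands in for the bits a per-location Filter would produce. The location names, class name, and method names are hypothetical.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: counts main-search matches per location by intersecting bitsets.
class LocationCounts {

    /** Returns the number of query matches for each location, omitting locations with none. */
    static Map<String, Integer> countByLocation(BitSet queryBits, Map<String, BitSet> locationBits) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> entry : locationBits.entrySet()) {
            BitSet both = (BitSet) queryBits.clone(); // clone so the caller's bitset is untouched
            both.and(entry.getValue());               // docs matching the query AND the location
            int n = both.cardinality();
            if (n > 0) {
                counts.put(entry.getKey(), n);
            }
        }
        return counts;
    }

    /** Small convenience for building a bitset from doc ids. */
    static BitSet bits(int... docIds) {
        BitSet b = new BitSet();
        for (int id : docIds) {
            b.set(id);
        }
        return b;
    }

    public static void main(String[] args) {
        // Assumed sample data: the keyword search matched docs 0, 2, 3 and 5.
        BitSet queryBits = bits(0, 2, 3, 5);
        Map<String, BitSet> locationBits = new LinkedHashMap<>();
        locationBits.put("Buenos Aires", bits(0, 1, 2));
        locationBits.put("Cordoba", bits(3, 4));
        locationBits.put("Mendoza", bits(6));
        System.out.println(countByLocation(queryBits, locationBits));
        // prints {Buenos Aires=2, Cordoba=1}
    }
}
```

The per-location bitsets depend only on the index, so they could be built once and reused across searches. Each search then costs one intersection per location and never reads the stored documents.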