Message-ID: <6fb33c1505033015161e2508e1@mail.gmail.com>
Date: Wed, 30 Mar 2005 15:16:34 -0800
From: Antony Sequeira
To: java-user@lucene.apache.org
Subject: Re: pre computing possible search results narrowing and hit counts on those
In-Reply-To: <424AE508.6030301@apache.org>

On Wed, 30 Mar 2005 09:42:32 -0800, Doug Cutting wrote:
> Antony Sequeira wrote:
> > A user does a search for say "condominium", and i show him the 50,000
> > properties that meet that description.
> >
> > I need two other pieces of information for display -
> > 1. I want to show a "select" box on the UI, which contains all the
> > cities that appear in those 50,000 documents
> > 2. Against each city I want to show the count of matching documents.
> >
> > For example the drop down might look like
> > "Los Angeles" 10000
> > "San Francisco" 5000
> >
> > (But, I do not want to show "San Jose" if none of the 50,000 documents
> > contain it)
>
> You can use the FieldCache & HitCollector:
>
> private class Count { int value; }
>
> String[] docToCity = FieldCache.getStrings(indexReader, "city");
> Map cityToCount = new HashMap();
>
> searcher.search(query, new HitCollector() {
>   public void collect(int doc, float score) {
>     String city = docToCity[doc];
>     Count count = cityToCount.get(city);
>     if (count == null) {
>       count = new Count();
>       cityToCount.put(city, count);
>     }
>     count.value++;
>   }
> });
>
> // sort & display entries in cityToCount
>
> Doug
>

Based on a previous reply, I went through the javadocs and came up with:

    import java.io.IOException;
    import java.util.HashMap;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.HitCollector;
    import org.apache.lucene.util.BitVector;

    public class PreFilterCollector extends HitCollector {
        private final IndexReader reader;
        private final BitVector bits;   // one bit per document id
        private final HashMap<String, Integer> statemap =
            new HashMap<String, Integer>();

        public PreFilterCollector(IndexReader reader) {
            this.reader = reader;
            this.bits = new BitVector(reader.maxDoc());
        }

        // Called in the inner search loop: only record the doc id here.
        public void collect(int id, float score) {
            bits.set(id);
        }

        // Called after the search: load each matching doc and count its state.
        public HashMap<String, Integer> getStateCounts() {
            try {
                int k = bits.size();
                for (int i = 0; i < k; i++) {
                    if (!bits.get(i)) continue;
                    Document doc = reader.document(i);
                    String state = doc.get("state"); // we assume one state for now
                    Integer count = statemap.get(state);
                    statemap.put(state, count == null ? 1 : count + 1);
                }
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
            return statemap;
        }
    }

But I have the following questions:

1. My code first collects all the doc ids and then iterates over them
   to collect the field info. I did this because
   http://lucene.apache.org/java/docs/api/org/apache/lucene/search/HitCollector.html
   says "This is called in an inner search loop. For good search
   performance, implementations of this method should not call
   Searchable.doc(int) or IndexReader.document(int) on every document
   number encountered". Have I misunderstood this, and am I doing it
   wrong?

2. Would your code be faster, and if so, under what circumstances?

3. One problem I see with my current solution is that it accesses
   every doc of the result set. One of the previous responses pointed
   to a solution in
   http://www.mail-archive.com/java-dev@lucene.apache.org/msg00034.html
   After reading it, it looked to me like that solution won't be any
   better. (It looks like it walks the values of terms that do not even
   occur in the current search result set.) Have I got this right?

I am a newbie to Lucene.
Thanks for all the replies. I appreciate it very much.

-Antony
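
To make questions 1 and 2 concrete, here is a minimal sketch of the two counting strategies in plain Java, with no Lucene dependency. Everything in it is an assumption made for illustration: `FacetCountSketch`, `DOC_TO_CITY`, and both method names are hypothetical, the `int[]` of hit ids stands in for the sequence of `collect(doc, score)` calls, and the `String[]` stands in for a per-document field array such as `docToCity` in the quoted reply. It uses `Map.merge`, so it needs a newer Java than the one this thread was written against.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

public class FacetCountSketch {

    /** Hypothetical per-document field values, indexed by document id. */
    static final String[] DOC_TO_CITY = {
        "Los Angeles", "San Francisco", "Los Angeles", "San Jose", "Los Angeles"
    };

    /** Strategy A: count inside the per-hit callback via an in-memory array lookup. */
    static Map<String, Integer> countWhileCollecting(int[] hitDocIds, String[] docToValue) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int doc : hitDocIds) {                 // plays the role of collect(doc, score)
            counts.merge(docToValue[doc], 1, Integer::sum);
        }
        return counts;
    }

    /** Strategy B: record hits in a bit set first, then walk the set bits afterwards. */
    static Map<String, Integer> countAfterCollecting(int[] hitDocIds, String[] docToValue) {
        BitSet bits = new BitSet(docToValue.length);
        for (int doc : hitDocIds) {
            bits.set(doc);                          // cheap work only, as in collect()
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (int doc = bits.nextSetBit(0); doc >= 0; doc = bits.nextSetBit(doc + 1)) {
            // In the PreFilterCollector above, this lookup is a stored-document load.
            counts.merge(docToValue[doc], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] hits = {0, 1, 2, 4};                  // document 3 ("San Jose") did not match
        System.out.println(countWhileCollecting(hits, DOC_TO_CITY));
        System.out.println(countAfterCollecting(hits, DOC_TO_CITY));
    }
}
```

Both strategies return the same counts, and a city with no matching document never appears in the map. The sketch only shows that the two are equivalent in result; it says nothing about relative speed, which in the real case would depend on whether each per-hit lookup is an array access or a stored-document load.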