Subject: RE: How to show category count with results?
Date: Tue, 31 Jul 2007 20:20:23 +0200
Reply-To: java-user@lucene.apache.org
From: "Ard Schrijvers"

Hello Shailendra,

AFAICS you are reasoning from a static doc-id POV, while documents do
not have a static doc-id in Lucene. When you have a frequently updated
index, you'll end up invalidating cached BitSets (which, as the number
of categories and the number of documents grow, can absorb quite a lot
of memory as well), because segment merging and Lucene optimizations
take place, shuffling doc-ids around (actually compacting them where
deleted documents left 'holes'). So, obviously, you need to minimize
segment merging, which you can control to some degree, but after
merging, you frequently need to compute and cache your BitSets again.
For many categories and many documents, this is not fast enough. If
your index changes only a few times, you are fine (not sure how Solr
handles this, but they of course built faceted navigation into it).
Otherwise, you might try having a more or less "static" persistent
index, and a volatile memory index to which documents are added. When a
doc is updated, you need to set the corresponding bits in the cached
BitSets of the persistent index to 0. I think it is not very easy, but
it might just work...

Regards Ard

>
> A better way is the following:
> Cache the list of doc-ids for each category - you can cache this in a
> BitSet: the bit at index "doc-id" is on if the category is present in
> document "doc-id", else it is off.
>
> For the user query, you need to calculate a BitSet in a similar way.
> This can be done in a HitCollector implementation.
>
> Then simply intersect the user query's BitSet with each category
> BitSet and count the "on" bits; this gives you the count of documents
> for each category.
>
> The BitSet operations I talked about above are already provided in
> Java, so your piece of code would be really small.
>
> Thanks,
> Shailendra Sharma
> CTO, Ver Se' Innovation Private Ltd.
> Bangalore, India
>
> On 7/30/07, Dennis Kubes wrote:
> >
> > We found that a fast way to do this is simply by running a query for
> > each category and getting the maxDocs. There would be one query per
> > category getting a single hit.
> >
> > Dennis Kubes
> >
> > Erick Erickson wrote:
> > > You might want to search the mail archive for "facets" or "faceted
> > > search" (no quotes), as I *think* this might be relevant.
> > >
> > > Best
> > > Erick
> > >
> > > On 7/26/07, Ramana Jelda wrote:
> > >> Hi ,
> > >> Of course this statement is very expensive.
> > >> -->document.get("CAMPCATID")==null?"":document.get("CAMPCATID");
> > >>
> > >> Use StringIndex/FieldCache/something similar to implement category
> > >> counting.
> > >> :)
> > >>
> > >> Jelda
> > >>
> > >>> -----Original Message-----
> > >>> From: Bhavin Pandya [mailto:bhavinp@rediff.co.in]
> > >>> Sent: Thursday, July 26, 2007 5:20 PM
> > >>> To: java-user@lucene.apache.org
> > >>> Subject: How to show category count with results?
> > >>>
> > >>> Hi,
> > >>>
> > >>> I want to show each category name and its count with results.
> > >>> I achieved this using DocCollector but its very slow when no
> > >>> of results in lacs... As fetching of documents from reader in
> > >>> collect method is expensive...
> > >>>
> > >>> public void collect(int doc, float score) {
> > >>>     Document document = mreader.document(doc);
> > >>>     strcatid = document.get("CAMPCATID")==null?"":document.get("CAMPCATID");
> > >>>
> > >>>     if (catcountmap.containsKey(strcatid))
> > >>>     {
> > >>>         // catid already exists in hashmap... increase count by one
> > >>>
> > >>>         value = ((Integer)catcountmap.get(strcatid)).intValue();
> > >>>         value = value + 1;
> > >>>         catcountmap.put(strcatid,new Integer(value));
> > >>>     }
> > >>>     else
> > >>>         catcountmap.put(strcatid,new Integer(1));
> > >>>
> > >>> }
> > >>>
> > >>>
> > >>> is there any other better way to achieve this ????
> > >>>
> > >>>
> > >>> Thanks.
> > >>> Bhavin pandya

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
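
For reference, the BitSet idea discussed in this thread (one cached BitSet per category, intersected with the BitSet of the query's hits, plus clearing the bits of a doc that was updated) might look roughly like the sketch below. It uses only java.util.BitSet and no Lucene classes. The class and method names (CategoryCounter, addToCategory, invalidateDoc, count) are made up for illustration and do not come from the thread or from Lucene. It also assumes doc-ids stay stable between cache rebuilds, which, as noted above, does not hold across segment merges.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch of per-category counting with cached BitSets (all names are hypothetical). */
final class CategoryCounter {
    // category id -> bits set for every doc-id that carries that category
    private final Map<String, BitSet> categoryBits = new LinkedHashMap<>();

    /** Record that document docId belongs to the given category. */
    void addToCategory(String category, int docId) {
        categoryBits.computeIfAbsent(category, k -> new BitSet()).set(docId);
    }

    /** Clear a doc-id from every cached BitSet, e.g. after that doc was updated. */
    void invalidateDoc(int docId) {
        for (BitSet bits : categoryBits.values()) {
            bits.clear(docId);
        }
    }

    /** Count, per category, how many of the query's hits carry that category. */
    Map<String, Integer> count(BitSet queryHits) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> e : categoryBits.entrySet()) {
            // Clone first so the cached category BitSet is not modified by and().
            BitSet intersection = (BitSet) e.getValue().clone();
            intersection.and(queryHits);
            counts.put(e.getKey(), intersection.cardinality());
        }
        return counts;
    }

    public static void main(String[] args) {
        CategoryCounter counter = new CategoryCounter();
        counter.addToCategory("books", 0);
        counter.addToCategory("books", 2);
        counter.addToCategory("music", 1);
        counter.addToCategory("music", 2);

        // Stand-in for the BitSet a HitCollector would fill with matching doc-ids.
        BitSet queryHits = new BitSet();
        queryHits.set(0);
        queryHits.set(2);

        System.out.println(counter.count(queryHits)); // prints {books=2, music=1}
    }
}
```

In a real index the query BitSet would be filled from a collector's collect(doc, score) callback instead of by hand, and the category BitSets would have to be rebuilt whenever merging renumbers the doc-ids.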