From: "Ramana Jelda"
Reply-To: java-user@lucene.apache.org
Subject: RE: Aggregating category hits
Date: Tue, 16 May 2006 12:04:19 +0200
Message-Id: <200605161004.k4GA4OZU044340@pigeon.ciao.com>
In-Reply-To: <4469A042.7070009@naukri.com>

Hi Kapil,

As far as I remember, FieldCache has been in the Lucene API since 1.4.
Anyhow, here is some pseudo code that may help.

// 1. When opening the reader, initialize the document-id-to-category-id
//    relation as below. Depending on your requirements you can use
//    getStringIndex() instead; I use StringIndex in my project.
String[] docId2CategoryIdRelation =
    FieldCache.DEFAULT.getStrings(reader, categoryFieldName);
// 2. Cache it.
// 3. Search as usual with your Query, providing your own HitCollector.
// 4. Use docId2CategoryIdRelation to retrieve the category id for each
//    result document.
String yourCategoryId = docId2CategoryIdRelation[resultDocId];
// 5. Increment the count for yourCategoryId (do lazy initialization of
//    the categoryCounts holder. FAQ.)
// 6. You are done.. :)

All the best,
Jelda

> -----Original Message-----
> From: Kapil Chhabra [mailto:kapil.chhabra@naukri.com]
> Sent: Tuesday, May 16, 2006 11:50 AM
> To: java-user@lucene.apache.org
> Subject: Re: Aggregating category hits
>
> Hi Jelda,
> I have not yet migrated to Lucene 1.9 and I guess FieldCache
> has been introduced in this release.
> Can you please give me a pointer to your strategy of FieldCache?
>
> Thanks & Regards,
> Kapil Chhabra
>
>
> Ramana Jelda wrote:
> > But this BitSet strategy is more memory consuming mainly if you have
> > documents in million numbers and categories in thousands.
> > So I preferred in my project FieldCache strategy.
> >
> > Jelda
> >
> >
> >> -----Original Message-----
> >> From: Kapil Chhabra [mailto:kapil.chhabra@naukri.com]
> >> Sent: Tuesday, May 16, 2006 7:38 AM
> >> To: java-user@lucene.apache.org
> >> Subject: Re: Aggregating category hits
> >>
> >> Even I am doing the same in my application.
> >> Once in a day, all the filters [for different categories] are
> >> initialized. Each time a query is fired, the Query BitSet is ANDed
> >> with the BitSet of each filter.
> >> The cardinality obtained is the desired output.
> >> @Eric: I would like to know more about the implementation with DocSet
> >> in place of Bitset.
> >>
> >> Regards,
> >> kapilChhabra
> >>
> >>
> >> Erik Hatcher wrote:
> >>
> >>> On May 15, 2006, at 5:07 PM, Marvin Humphrey wrote:
> >>>
> >>>> If you needed to know not just the total number of hits, but the
> >>>> number of hits in each "category", how would you handle that?
> >>>>
> >>>> For instance, a search for "egg" would have to produce the 20 most
> >>>> relevant documents for "egg", but also a list like this:
> >>>>
> >>>>   Holiday & Seasonal / Easter   75
> >>>>   Books / Cooking               52
> >>>>   Miscellaneous                 44
> >>>>   Kitchen Collectibles          43
> >>>>   Hobbies / Crafts              17
> >>>>   [...]
> >>>>
> >>>> It seems to me that you'd have to retrieve each hit's stored fields
> >>>> and examine the contents of a "category" field. That's a lot of
> >>>> overhead. Is there another way?
> >>>>
> >>> My first implementation of faceted browsing uses BitSet's that get
> >>> pre-loaded for each category value (each unique term in a "category"
> >>> field, for example). And to intersect that with an actual Query, it
> >>> gets run through the QueryFilter to get its BitSet and then AND'd
> >>> together with each of the category BitSet's. Sounds like a lot, but
> >>> for my applications there are not tons of these BitSet's and the
> >>> performance has been outstanding. Now that I'm doing more with Solr,
> >>> I'm beginning to leverage its amazing caching infrastructure and
> >>> replacing BitSet's with DocSet's.
> >>>
> >>> Erik
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >>> For additional commands, e-mail: java-user-help@lucene.apache.org
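
The FieldCache steps from Jelda's reply can be sketched in plain Java roughly as follows. This is only an illustration under stated assumptions and has no Lucene dependency: the class name CategoryCounter, the method countCategories and the sample data are made up for this sketch. The String[] stands in for what FieldCache.DEFAULT.getStrings(reader, categoryFieldName) returns, and the int[] of hit ids stands in for the document ids your HitCollector would receive in collect(int doc, float score).

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene.
class CategoryCounter {

    /**
     * Counts hits per category.
     * docIdToCategoryId: index = document id, value = category id
     * (assumed to be null when a document has no category).
     * hitDocIds: document ids matched by the query.
     */
    static Map<String, Integer> countCategories(String[] docIdToCategoryId, int[] hitDocIds) {
        // TreeMap only so that the printed order is deterministic.
        Map<String, Integer> categoryCounts = new TreeMap<>();
        for (int docId : hitDocIds) {
            String categoryId = docIdToCategoryId[docId];
            if (categoryId == null) {
                continue; // document without a category
            }
            // Lazy initialization: the entry is created on first use.
            categoryCounts.merge(categoryId, 1, Integer::sum);
        }
        return categoryCounts;
    }

    public static void main(String[] args) {
        // Sample data, invented for this sketch.
        String[] docIdToCategoryId = {"easter", "cooking", "easter", null, "crafts", "cooking", "easter"};
        int[] hitDocIds = {0, 2, 3, 5, 6};
        System.out.println(countCategories(docIdToCategoryId, hitDocIds));
        // prints {cooking=1, easter=3}
    }
}
```

In a real application the loop body would presumably sit inside the HitCollector itself, so the counts are built up during the search without loading any stored fields.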