lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ramana Jelda" <ramana.je...@ciao-group.com>
Subject RE: Aggregating category hits
Date Tue, 16 May 2006 10:04:19 GMT
Hi Kapil,
As I remember FieldCache is in lucene api since 1.4 .
Ok . Anyhow here is suedo code that can help.

//1. initialize reader on opening documentId to the categoryid relation as
below. Depending on your requirement you can either getStringIndex().. I get
StringIndex in //my project.

String[] docId2CategoryIdRelation=FieldCache.DEFAULT.getStrings(reader,
categoryFieldName);

//2. cache it
//3. search as usal with your Query providing your own HitCollector
//4. use docId2CategoryIdRelation to retrieve category id for each result
document
String yourCategoryId=	docId2CategoryIdRelation[resultDocId]
//5.Increment yourCategoryId count (do lazy initialization of categoryCounts
holder.FAQ.)

//6 You are done.. :)

All the best,
Jelda




> -----Original Message-----
> From: Kapil Chhabra [mailto:kapil.chhabra@naukri.com] 
> Sent: Tuesday, May 16, 2006 11:50 AM
> To: java-user@lucene.apache.org
> Subject: Re: Aggregating category hits
> 
> Hi Jelda,
> I have not yet migrated to Lucene 1.9 and I guess FieldCache 
> has been introduced in this release.
> Can you please give me a pointer to your strategy of FieldCache?
> 
> Thanks & Regards,
> Kapil Chhabra
> 
> 
> Ramana Jelda wrote:
> > But this BitSet strategy is more memory consuming mainly if 
> you have 
> > documents in million numbers and categories in thousands.
> > So I preferred in my project FieldCache strategy.
> >
> > Jelda
> >
> >   
> >> -----Original Message-----
> >> From: Kapil Chhabra [mailto:kapil.chhabra@naukri.com]
> >> Sent: Tuesday, May 16, 2006 7:38 AM
> >> To: java-user@lucene.apache.org
> >> Subject: Re: Aggregating category hits
> >>
> >> Even I am doing the same in my application.
> >> Once in a day, all the filters [for different categories] are 
> >> initialized. Each time a query is fired, the Query BitSet is ANDed 
> >> with the BitSet of each filter. The cardinality obtained is the 
> >> desired output.
> >> @Eric: I would like to know more about the implementation 
> with DocSet 
> >> in place of Bitset.
> >>
> >> Regards,
> >> kapilChhabra
> >>
> >>
> >> Erik Hatcher wrote:
> >>     
> >>> On May 15, 2006, at 5:07 PM, Marvin Humphrey wrote:
> >>>       
> >>>> If you needed to know not just the total number of hits, but the 
> >>>> number of hits in each "category", how would you handle that?
> >>>>
> >>>> For instance, a search for "egg" would have to produce 
> the 20 most 
> >>>> relevant documents for "egg", but also a list like this:
> >>>>
> >>>> Holiday & Seasonal / Easter 75
> >>>> Books / Cooking 52
> >>>> Miscellaneous 44
> >>>> Kitchen Collectibles 43
> >>>> Hobbies / Crafts 17
> >>>> [...]
> >>>>
> >>>> It seems to me that you'd have to retrieve each hit's
> >>>>         
> >> stored fields
> >>     
> >>>> and examine the contents of a "category" field. That's a lot of 
> >>>> overhead. Is there another way?
> >>>>         
> >>> My first implementation of faceted browsing uses BitSet's 
> that get 
> >>> pre-loaded for each category value (each unique term in a 
> "category"
> >>> field, for example). And to intersect that with an actual 
> Query, it 
> >>> gets run through the QueryFilter to get its BitSet and then AND'd 
> >>> together with each of the category BitSet's. Sounds like 
> a lot, but 
> >>> for my applications there are not tons of these BitSet's and the 
> >>> performance has been outstanding. Now that I'm doing more
> >>>       
> >> with Solr,
> >>     
> >>> I'm beginning to leverage its amazing caching infrastructure and 
> >>> replacing BitSet's with DocSet's.
> >>>
> >>> Erik
> >>>
> >>>
> >>>
> >>>       
> >> 
> ---------------------------------------------------------------------
> >>     
> >>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >>> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>>
> >>>
> >>>       
> >> 
> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >>     
> >
> >
> > 
> ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
> >   
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message