lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: Aggregating category hits
Date Tue, 16 May 2006 09:33:12 GMT

On May 16, 2006, at 1:37 AM, Kapil Chhabra wrote:
> Even I am doing the same in my application.
> Once in a day, all the filters [for different categories] are  
> initialized. Each time a query is fired, the Query BitSet is ANDed  
> with the BitSet of each filter. The cardinality obtained is the  
> desired output.
> @Eric: I would like to know more about the implementation with  
> DocSet in place of Bitset.

I'm still in progress with migration to Solr's infrastructure.   
Regarding the other issue Jelda brought up about memory - another  
handy thing about Solr is that it caches things into a configurable  
LRU cache which allows memory usage to be controlled at the expense  
of performance with cache misses.

	Erik


>
> Regards,
> kapilChhabra
>
>
> Erik Hatcher wrote:
>>
>> On May 15, 2006, at 5:07 PM, Marvin Humphrey wrote:
>>> If you needed to know not just the total number of hits, but the  
>>> number of hits in each "category", how would you handle that?
>>>
>>> For instance, a search for "egg" would have to produce the 20  
>>> most relevant documents for "egg", but also a list like this:
>>>
>>> Holiday & Seasonal / Easter 75
>>> Books / Cooking 52
>>> Miscellaneous 44
>>> Kitchen Collectibles 43
>>> Hobbies / Crafts 17
>>> [...]
>>>
>>> It seems to me that you'd have to retrieve each hit's stored  
>>> fields and examine the contents of a "category" field. That's a  
>>> lot of overhead. Is there another way?
>>
>> My first implementation of faceted browsing uses BitSet's that get  
>> pre-loaded for each category value (each unique term in a  
>> "category" field, for example). And to intersect that with an  
>> actual Query, it gets run through the QueryFilter to get its  
>> BitSet and then AND'd together with each of the category BitSet's.  
>> Sounds like a lot, but for my applications there are not tons of  
>> these BitSet's and the performance has been outstanding. Now that  
>> I'm doing more with Solr, I'm beginning to leverage its amazing  
>> caching infrastructure and replacing BitSet's with DocSet's.
>>
>> Erik
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message