lucene-java-user mailing list archives

From "Ard Schrijvers" <a.schrijv...@hippo.nl>
Subject RE: How to show category count with results?
Date Tue, 31 Jul 2007 18:20:23 GMT
Hello Shailendra,

AFAICS you are reasoning from a static doc-id POV, but documents do not have a static doc-id
in Lucene. When you have a frequently updated index, you will keep invalidating your cached
BitSets (which, as the number of categories and documents grows, can consume quite large amounts
of memory as well), because segment merges and index optimizations shuffle doc-ids around (they
actually compact the 'holes' left by deleted documents). So, obviously, you need to minimize
segment merging, which you can control to some degree, but after every merge you need to compute
and cache your BitSets again. For many categories and many documents, this is not fast enough.
If your index changes only rarely, you are fine (I am not sure how Solr handles this, but of
course they have built faceted navigation into it). Otherwise, you might try having a more or
less "static" persistent index, plus a volatile in-memory index to which new documents are added.
When a doc is updated, you need to set its bits in the cached BitSets of the persistent index
to 0. I think it is not very easy, but it might just work...
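A minimal sketch of the persistent-plus-memory-index idea, in plain Java so it runs without Lucene. The class and method names (`OverlayCategoryCounts`, `addPersistent`, `markUpdated`, `count`) are hypothetical, and the sketch assumes that the persistent index's doc-ids stay fixed until the next merge, that the query's persistent-index hits arrive as a BitSet, and that the few hits in the memory index are simply counted one by one.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: cached category BitSets for a "static" persistent index,
// plus per-hit counting for a small volatile memory index.
public class OverlayCategoryCounts {
    // category -> bit d is on if persistent doc-id d carries that category
    private final Map<String, BitSet> persistentBits = new HashMap<>();

    public void addPersistent(int docId, String category) {
        persistentBits.computeIfAbsent(category, c -> new BitSet()).set(docId);
    }

    // Called when a persistent document is updated: its new version lives in the
    // memory index, so its old doc-id is switched off in every cached BitSet.
    public void markUpdated(int persistentDocId) {
        for (BitSet bits : persistentBits.values()) {
            bits.clear(persistentDocId);
        }
    }

    // persistentHits: doc-ids the query matched in the persistent index.
    // memoryHitCategories: the category of each query hit in the memory index.
    public Map<String, Integer> count(BitSet persistentHits, List<String> memoryHitCategories) {
        Map<String, Integer> counts = new HashMap<>();
        for (Map.Entry<String, BitSet> e : persistentBits.entrySet()) {
            BitSet matched = (BitSet) e.getValue().clone(); // leave the cache untouched
            matched.and(persistentHits);
            if (!matched.isEmpty()) {
                counts.put(e.getKey(), matched.cardinality());
            }
        }
        for (String category : memoryHitCategories) {
            counts.merge(category, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        OverlayCategoryCounts cc = new OverlayCategoryCounts();
        cc.addPersistent(0, "books");
        cc.addPersistent(1, "books");
        cc.addPersistent(2, "music");
        cc.markUpdated(1); // doc 1 was updated and re-added to the memory index as "music"
        BitSet hits = new BitSet();
        hits.set(0, 3); // the query matched persistent docs 0, 1 and 2
        Map<String, Integer> counts = cc.count(hits, Arrays.asList("music"));
        System.out.println(counts.get("books") + " " + counts.get("music")); // prints "1 2"
    }
}
```

Clearing bits on update keeps the cached BitSets usable without a full recompute; they still have to be rebuilt whenever the persistent index is merged or optimized, since that renumbers doc-ids.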

Regards Ard

> 
> A better way is the following:
> Cache the list of doc-ids for each category - you can cache this in a
> BitSet. The bit at index "doc-id" is on if the category is present in
> document "doc-id", else it is off.
> 
> For the user's query, you need to calculate a BitSet in the same way.
> This can be done in a HitCollector implementation.
> 
> Then simply intersect the user query's BitSet with each category's
> BitSet and count the "on" bits; this gives you the number of documents
> for each category.
> 
> The BitSet operations I talked about above are already provided in
> Java, so your piece of code would be really small.
> 
> Thanks,
> Shailendra Sharma
> CTO, Ver Se' Innovation Private Ltd.
> Bangalore, India
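The cached-BitSet approach Shailendra describes above can be sketched with nothing but `java.util.BitSet`. This is a minimal sketch under stated assumptions: the class names `BitSetFacetCounter` and `QueryBitsCollector` are hypothetical, each document is assumed to have at most one category, and the per-document categories are passed in as an array instead of being read from an index. `QueryBitsCollector` only mirrors the `collect(int doc, float score)` shape of Lucene 2.x's `HitCollector`; a real one would extend that class.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of BitSet faceting: one cached BitSet per category,
// one BitSet of query hits, and per-category counts from their intersection.
public class BitSetFacetCounter {
    private final Map<String, BitSet> categoryBits = new HashMap<>();

    // docCategories[d] is the category of doc-id d, or null if it has none.
    public BitSetFacetCounter(String[] docCategories) {
        for (int doc = 0; doc < docCategories.length; doc++) {
            String cat = docCategories[doc];
            if (cat != null) {
                categoryBits.computeIfAbsent(cat, c -> new BitSet(docCategories.length)).set(doc);
            }
        }
    }

    // Mirrors HitCollector.collect(int, float): records each hit as an "on" bit.
    public static class QueryBitsCollector {
        final BitSet bits = new BitSet();

        public void collect(int doc, float score) {
            bits.set(doc);
        }
    }

    // Count of query hits per category = cardinality of (category AND query).
    public Map<String, Integer> count(BitSet queryHits) {
        Map<String, Integer> counts = new HashMap<>();
        for (Map.Entry<String, BitSet> e : categoryBits.entrySet()) {
            BitSet intersection = (BitSet) e.getValue().clone(); // keep the cache intact
            intersection.and(queryHits);
            counts.put(e.getKey(), intersection.cardinality());
        }
        return counts;
    }

    public static void main(String[] args) {
        BitSetFacetCounter counter = new BitSetFacetCounter(
                new String[] {"books", "music", "books", null, "music"});
        QueryBitsCollector collector = new QueryBitsCollector();
        for (int doc : new int[] {0, 2, 3, 4}) { // pretend the query matched these docs
            collector.collect(doc, 1.0f);
        }
        Map<String, Integer> counts = counter.count(collector.bits);
        System.out.println(counts.get("books") + " " + counts.get("music")); // prints "2 1"
    }
}
```

Cloning before `and()` keeps the cached category BitSets reusable for the next query; as Ard points out, the cache is only valid until the next segment merge renumbers doc-ids.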
> 
> On 7/30/07, Dennis Kubes <kubes@apache.org> wrote:
> >
> > We found that a fast way to do this is simply to run a query for each
> > category and get the maxDocs. There would be one query per category,
> > each getting a single hit.
> >
> > Dennis Kubes
> >
> > Erick Erickson wrote:
> > > You might want to search the mail archive for "facets" or "faceted
> > > search" (no quotes), as I *think* this might be relevant.
> > >
> > > Best
> > > Erick
> > >
> > > On 7/26/07, Ramana Jelda <ramana.jelda@ciao-group.com> wrote:
> > >> Hi ,
> > >> Of course this statement is very expensive.
> > >> -->document.get("CAMPCATID")==null?"":document.get("CAMPCATID");
> > >>
> > >> Use StringIndex/FieldCache/something similar to implement category
> > >> counting.
> > >> :)
> > >>
> > >> Jelda
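Jelda's FieldCache suggestion replaces the per-hit `mreader.document(doc)` call with an array lookup. A minimal sketch, assuming a doc-to-ordinal array and an ordinal-to-value array shaped like the `order` and `lookup` arrays of Lucene 2.x's `FieldCache.StringIndex` (with ordinal 0 standing for documents that have no value); the class name `OrdinalCategoryCounter` is hypothetical, and the arrays are built by hand so the example runs without Lucene.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: count categories from un-inverted arrays instead of
// loading the stored document for every hit.
public class OrdinalCategoryCounter {
    private final int[] order;     // order[doc] = ordinal of that doc's category
    private final String[] lookup; // lookup[ord] = category value; ord 0 = no value
    private final int[] counts;    // counts[ord] = number of hits seen with that ordinal

    public OrdinalCategoryCounter(int[] order, String[] lookup) {
        this.order = order;
        this.lookup = lookup;
        this.counts = new int[lookup.length];
    }

    // Same shape as HitCollector.collect: one array increment per hit.
    public void collect(int doc, float score) {
        counts[order[doc]]++;
    }

    // category -> count, in ordinal order; docs without a category map to ""
    // (as in the original collect code).
    public Map<String, Integer> result() {
        Map<String, Integer> out = new LinkedHashMap<>();
        for (int ord = 0; ord < counts.length; ord++) {
            if (counts[ord] > 0) {
                out.put(ord == 0 ? "" : lookup[ord], counts[ord]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] lookup = {null, "books", "music"};
        int[] order = {1, 2, 1, 0, 2}; // doc 3 has no CAMPCATID
        OrdinalCategoryCounter c = new OrdinalCategoryCounter(order, lookup);
        for (int doc : new int[] {0, 2, 3, 4}) { // pretend the query matched these docs
            c.collect(doc, 1.0f);
        }
        System.out.println(c.result()); // prints {=1, books=2, music=1}
    }
}
```

With Lucene on the classpath, the arrays would come from something like `FieldCache.DEFAULT.getStringIndex(reader, "CAMPCATID")`; that cache is built once per reader, so each hit after that costs one array increment instead of a stored-document load.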
> > >>
> > >>> -----Original Message-----
> > >>> From: Bhavin Pandya [mailto:bhavinp@rediff.co.in]
> > >>> Sent: Thursday, July 26, 2007 5:20 PM
> > >>> To: java-user@lucene.apache.org
> > >>> Subject: How to show category count with results?
> > >>>
> > >>> Hi,
> > >>>
> > >>> I want to show each category name and its count with the results.
> > >>> I achieved this using DocCollector, but it's very slow when the number
> > >>> of results runs into lakhs (hundreds of thousands), as fetching
> > >>> documents from the reader in the collect method is expensive...
> > >>>
> > >>> public void collect(int doc, float score) {
> > >>>     try {
> > >>>         // Loading the stored document for every hit is the expensive step
> > >>>         Document document = mreader.document(doc);
> > >>>         strcatid =
> > >>>             document.get("CAMPCATID")==null?"":document.get("CAMPCATID");
> > >>>     } catch (IOException e) {
> > >>>         // HitCollector.collect cannot throw checked exceptions
> > >>>         throw new RuntimeException(e);
> > >>>     }
> > >>>
> > >>>     if (catcountmap.containsKey(strcatid))
> > >>>     {
> > >>>         // catid already exists in hashmap... increase count by one
> > >>>         value = ((Integer)catcountmap.get(strcatid)).intValue();
> > >>>         value = value + 1;
> > >>>         catcountmap.put(strcatid,new Integer(value));
> > >>>     }
> > >>>     else
> > >>>         catcountmap.put(strcatid,new Integer(1));
> > >>>
> > >>> }
> > >>>
> > >>>
> > >>> Is there a better way to achieve this?
> > >>>
> > >>>
> > >>> Thanks.
> > >>> Bhavin pandya
> > >>
> > >> ---------------------------------------------------------------------
> > >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > >> For additional commands, e-mail: java-user-help@lucene.apache.org
> > >>
> > >>
> > >
> >
> 
