lucene-java-user mailing list archives

From "Shailendra Sharma" <shailendra.sha...@gmail.com>
Subject Re: How to show category count with results?
Date Tue, 31 Jul 2007 17:54:02 GMT
A better way is the following:
Cache the set of doc-ids for each category in a BitSet: the bit at index
"doc-id" is on if document "doc-id" belongs to the category, and off
otherwise.

For the user query, build a BitSet of the matching doc-ids in the same way.
This can be done in a HitCollector implementation.

Then intersect the user query's BitSet with each category's BitSet and count
the "on" bits. That gives you the number of matching documents for each
category.

The BitSet operations mentioned above (intersection and bit count) are
already provided by java.util.BitSet, so your code would be really small.
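
A minimal sketch of the counting step, using only java.util.BitSet. The
class and method names (CategoryCounter, addToCategory, count) are made up
for illustration and are not part of Lucene. It assumes the query BitSet has
already been filled elsewhere, for example by calling set(doc) from a
HitCollector's collect(int doc, float score) method.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: caches one BitSet per category and counts
// how many hits of a query fall into each category.
final class CategoryCounter {
    // category name -> bits set at the doc-ids that carry this category
    private final Map<String, BitSet> categoryBits = new LinkedHashMap<>();

    // Called once per (category, doc-id) pair while building the cache.
    void addToCategory(String category, int docId) {
        categoryBits.computeIfAbsent(category, k -> new BitSet()).set(docId);
    }

    // queryHits has a bit set for every doc-id matched by the user query.
    Map<String, Integer> count(BitSet queryHits) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> e : categoryBits.entrySet()) {
            // and() modifies its receiver, so intersect on a copy
            BitSet both = (BitSet) queryHits.clone();
            both.and(e.getValue());
            // cardinality() is the number of "on" bits
            counts.put(e.getKey(), both.cardinality());
        }
        return counts;
    }
}
```

The copy before and() keeps both the cached category BitSets and the query
BitSet unchanged, so the cache can be reused for the next query.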

Thanks,
Shailendra Sharma
CTO, Ver Se' Innovation Private Ltd.
Bangalore, India

On 7/30/07, Dennis Kubes <kubes@apache.org> wrote:
>
> We found that a fast way to do this is simply to run a query for each
> category and get the maxDocs.  There would be one query per category,
> each getting a single hit.
>
> Dennis Kubes
>
> Erick Erickson wrote:
> > You might want to search the mail archive for "facets" or "faceted search"
> > (no quotes), as I *think* this might be relevant.
> >
> > Best
> > Erick
> >
> > On 7/26/07, Ramana Jelda <ramana.jelda@ciao-group.com> wrote:
> >> Hi ,
> >> Of course this statement is very expensive.
> >> -->document.get("CAMPCATID")==null?"":document.get("CAMPCATID");
> >>
> >> Use StringIndex/FieldCache/something similar to implement category
> >> counting.
> >> :)
> >>
> >> Jelda
> >>
> >>> -----Original Message-----
> >>> From: Bhavin Pandya [mailto:bhavinp@rediff.co.in]
> >>> Sent: Thursday, July 26, 2007 5:20 PM
> >>> To: java-user@lucene.apache.org
> >>> Subject: How to show category count with results?
> >>>
> >>> Hi,
> >>>
> >>> I want to show each category name and its count with the results.
> >>> I achieved this using DocCollector, but it's very slow when the number
> >>> of results is in lakhs (hundreds of thousands), as fetching documents
> >>> from the reader in the collect method is expensive...
> >>>
> >>> public void collect(int doc, float score) {
> >>>     Document document = mreader.document(doc);
> >>>     strcatid = document.get("CAMPCATID")==null?"":document.get("CAMPCATID");
> >>>
> >>>     if (catcountmap.containsKey(strcatid))
> >>>     {
> >>>         // catid already exists in hashmap... increase count by one
> >>>
> >>>         value = ((Integer)catcountmap.get(strcatid)).intValue();
> >>>         value = value + 1;
> >>>         catcountmap.put(strcatid,new Integer(value));
> >>>     }
> >>>     else
> >>>         catcountmap.put(strcatid,new Integer(1));
> >>>
> >>> }
> >>>
> >>>
> >>> Is there a better way to achieve this?
> >>>
> >>>
> >>> Thanks.
> >>> Bhavin pandya
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >>
> >
>
