lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andrzej Bialecki ...@getopt.org>
Subject Re: Aggregating category hits
Date Mon, 15 May 2006 21:14:33 GMT
Marvin Humphrey wrote:
> Greets,
>
> If you needed to know not just the total number of hits, but the 
> number of hits in each "category", how would you handle that?
>
> For instance, a search for "egg" would have to produce the 20 most 
> relevant documents for "egg", but also a list like this:
>
>     Holiday & Seasonal / Easter     75
>     Books / Cooking                 52
>     Miscellaneous                   44
>     Kitchen Collectibles            43
>     Hobbies / Crafts                17
>     [...]
>
> It seems to me that you'd have to retrieve each hit's stored fields 
> and examine the contents of a "category" field.  That's a lot of 
> overhead.  Is there another way?

Statistical sampling of results and estimation, that's what I use ... 
for large result sets works reasonably well, for small result sets I 
just pay the penalty of retrieving all docs.

Also, take a look at the following: 
http://www2005.org/cdrom/docs/p245.pdf, "Sampling Search-Engine Results",
A. Anagnostopoulos, A. Z. Broder, D. Carmel .

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message