lucene-solr-user mailing list archives

From "Burton-West, Tom" <tburt...@umich.edu>
Subject Getting facet counts for 10,000 most relevant hits
Date Sat, 24 Sep 2011 00:59:08 GMT
If relevance ranking is working well, in theory it doesn't matter how many hits you get as
long as the best results show up in the first page of results.  However, the default in choosing
which facet values to show is to show the facets with the highest count in the entire result
set.  Is there a way to issue some kind of a filter query or facet query that would show only
the facet counts for the 10,000 most relevant search results?

As an example, if you search in our full-text collection for "jaguar" you get 170,000 hits.
If I am looking for the car rather than the OS or the animal, I might expect to be able to
click on a facet and limit my results to the car. However, facets containing the word car
or automobile are not in the top 5 facets that we show. If you click on "more" you will
see "automobile periodicals" but not the rest of the facets containing the word automobile.
This occurs because the facet counts are for all 170,000 hits. The facet counts for at
least 160,000 irrelevant hits are included (assuming only the top 10,000 hits are relevant).

What we would like to do is get the facet counts for the N most relevant documents and select
the 5 or 30 facet values with the highest counts for those relevant documents.
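
To make the request concrete, here is a minimal Python sketch of the computation described
above, done over an in-memory list of hits. It is not Solr or Lucene code and says nothing
about whether Solr can do this natively. The function name top_facets_for_top_hits, the
(score, facet_values) tuple shape, and the sample data are all hypothetical, chosen only
for illustration.

```python
from collections import Counter
import heapq


def top_facets_for_top_hits(hits, n_hits, n_facets):
    """Count facet values over only the n_hits highest-scoring hits.

    hits: iterable of (score, facet_values) pairs (hypothetical shape).
    Returns the n_facets most frequent facet values as (value, count) pairs.
    """
    # Keep only the N most relevant hits, ranked by score.
    top_hits = heapq.nlargest(n_hits, hits, key=lambda hit: hit[0])

    counts = Counter()
    for _score, facet_values in top_hits:
        # Count each facet value at most once per document.
        counts.update(set(facet_values))

    return counts.most_common(n_facets)


# Made-up sample: a few high-scoring "car" documents and a longer,
# low-scoring tail of "animal" documents.
hits = [
    (9.1, ["automobiles", "automobile periodicals"]),
    (8.7, ["automobiles"]),
    (8.2, ["automobile industry", "automobiles"]),
    (1.3, ["zoology"]),
    (1.2, ["zoology", "mammals"]),
    (1.1, ["zoology"]),
    (1.0, ["zoology"]),
]

# Faceting over the whole result set favors the long tail.
print(top_facets_for_top_hits(hits, n_hits=len(hits), n_facets=1))

# Faceting over only the 3 most relevant hits surfaces the car facet.
print(top_facets_for_top_hits(hits, n_hits=3, n_facets=1))
```

In this toy data the whole-set count puts "zoology" first, while restricting to the top 3
hits puts "automobiles" first, which is the behavior we are after at the scale of 10,000
hits out of 170,000.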

Is this possible or would it require writing some lucene or Solr code?

Tom Burton-West
http://www.hathitrust.org/blogs/large-scale-search
