lucene-solr-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: Facets with an IDF concept
Date Tue, 23 Jun 2009 22:23:11 GMT

: Regardless of the semantics, it doesn't sound like DF would give you what you
: want.  It could be entirely possible that in some short timespan the number of
: docs on Iran could match up w/ the number on Obama (maybe not for that
: particular example) in which case your "hot" item would no longer appear hot.

But if the numbers match up in that timespan, then the "hot" item isn't as 
"hot" anymore.

Maybe I'm misunderstanding, but it sounds like Asif's question essentially 
boils down to getting facet constraints sorted after applying some 
normalizing fraction ... the simplest case being the inverse of the ratio 
(this is where I think Asif is comparing it to IDF) of the number of 
matches for that facet in some larger docset to the size of that docset -- 
typically that docset could be the entire index, but it could also be the 
same search over a larger window of time.

So if I was doing a news search for all docs in the last 24 hours, I could 
multiply each of those facet counts by the number of articles from the 
past month divided by the corresponding facet count from the past month 
(the inverse of that facet's share of the month's articles) to see how 
much "hotter" they are in my smaller result set...

current result set facet counts (X)...
  News:1100
  Obama:1000
  Iran:800
  Miley Cyrus:700
  iPod:500

facet counts from the past month (Y), during which time 9000 (Z)
documents were published...
  News:9000
  Obama:7000
  Iran:1000
  Miley Cyrus:4000
  iPod:5000

X*(Z/Y)...
  Iran:7200
  Miley Cyrus:1575
  Obama:1285.7
  News:1100
  iPod:900
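
The arithmetic above can be reproduced with a short sketch. This is only 
an illustration in Python, not Solr code: the function name `hotness` and 
the dict layout are assumptions made for the example, and skipping terms 
that have no baseline count is one possible choice, not something decided 
in this thread.

```python
# Facet counts from the narrow result set (X), e.g. the last 24 hours.
recent = {"News": 1100, "Obama": 1000, "Iran": 800,
          "Miley Cyrus": 700, "iPod": 500}

# Facet counts from the wider baseline docset (Y), e.g. the past month.
baseline = {"News": 9000, "Obama": 7000, "Iran": 1000,
            "Miley Cyrus": 4000, "iPod": 5000}

# Number of documents in the baseline docset (Z).
baseline_size = 9000


def hotness(recent, baseline, baseline_size):
    """Return (term, X * (Z / Y)) pairs sorted from hottest to coldest.

    `hotness` is a hypothetical helper name used only for this sketch.
    """
    scores = {}
    for term, x in recent.items():
        y = baseline.get(term, 0)
        if y == 0:
            # Assumption: a term with no baseline count is skipped
            # rather than treated as infinitely hot.
            continue
        scores[term] = x * (baseline_size / y)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


ranked = hotness(recent, baseline, baseline_size)
for term, score in ranked:
    print(f"{term}:{score:.1f}")
```

The ordering it prints matches the list above: Iran first, then Miley 
Cyrus, Obama, News and iPod.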
  

Doing this in a Solr plugin would be the best way to do this -- because 
otherwise your "hot" terms might not even show up in the facet lists.  
Any attempt to do it on the client would just be an approximation, and 
could easily miss the "hottest" item if it was just below the cutoff for 
the number of constraints to be returned.
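
To make the cutoff problem concrete, here is a small Python sketch that 
reuses the example counts from this message. The cutoff of 2 is a 
hypothetical value chosen for illustration (in Solr it would correspond 
to something like the `facet.limit` parameter), and the sketch assumes 
the client can look up month-long counts for whichever terms it gets 
back.

```python
# Example counts from this message: X (last 24 hours), Y (past month), Z.
recent = {"News": 1100, "Obama": 1000, "Iran": 800,
          "Miley Cyrus": 700, "iPod": 500}
baseline = {"News": 9000, "Obama": 7000, "Iran": 1000,
            "Miley Cyrus": 4000, "iPod": 5000}
baseline_size = 9000


def score(term):
    """Normalized "hotness" X * (Z / Y) for a single facet term."""
    return recent[term] * (baseline_size / baseline[term])


# Hypothetical cutoff: the server returns only the top-N constraints
# ordered by raw count, which is all a client ever gets to see.
limit = 2
returned = sorted(recent, key=recent.get, reverse=True)[:limit]

# Client-side re-ranking can only consider the returned constraints.
client_hottest = max(returned, key=score)

# Ranking over every constraint, as server-side code could do.
true_hottest = max(recent, key=score)

print("returned to client:", returned)
print("client-side hottest:", client_hottest)
print("actual hottest:", true_hottest)
```

With this cutoff the client only ever sees News and Obama, so it picks 
Obama as the hottest term even though Iran scores far higher.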


-Hoss

