lucene-dev mailing list archives

From Marvin Humphrey
Subject Returning a minimum number of clusters
Date Mon, 01 May 2006 16:03:43 GMT

I'm toying with the idea of implementing clustering of search results  
based on comparison of document vectors constrained by field.  For  
instance, you could cluster based on "topic", or "domain", or  
"content".  "domain" would be easy, as it's presumably a single value  
field.  "content" would be much more involved.

The problem I'm trying to solve is how to return a minimum number of  
clusters from a search.  Say the most relevant 100 documents for a  
query are all from the same domain, but you want a maximum of two  
results per domain, a la Google.  I don't see any alternative to  
rerunning the query an indeterminate number of times until you've  
accumulated sufficient clusters, because the search logic doesn't  
know what cluster a document belongs to until the document vector is
retrieved.

Is there a better way?
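
A minimal Python sketch of the re-query loop described above, for the easy
single-value "domain" case. It is only an illustration under assumptions:
search(offset, limit) is a hypothetical callable returning (doc_id, domain)
pairs in relevance order, and collect_clusters, per_cluster, page_size and
max_pages are made-up names, not part of any Lucene API.

```python
from typing import Callable, Dict, List, Tuple

Hit = Tuple[str, str]  # (doc_id, domain), in relevance order


def collect_clusters(
    search: Callable[[int, int], List[Hit]],  # hypothetical search function
    min_clusters: int,
    per_cluster: int = 2,
    page_size: int = 100,
    max_pages: int = 50,
) -> Dict[str, List[str]]:
    """Re-run the query page by page until enough clusters are seen."""
    kept: Dict[str, List[str]] = {}
    offset = 0
    for _ in range(max_pages):  # hard cap so the loop always terminates
        hits = search(offset, page_size)
        if not hits:
            break  # result set exhausted before reaching min_clusters
        for doc_id, domain in hits:
            bucket = kept.setdefault(domain, [])
            if len(bucket) < per_cluster:
                bucket.append(doc_id)  # keep only the top hits per domain
        offset += len(hits)
        if len(kept) >= min_clusters:
            break  # enough distinct clusters accumulated
    return kept


# Toy data: the 100 most relevant documents all share one domain.
ranked: List[Hit] = (
    [(f"d{i}", "a.example") for i in range(100)]
    + [("d100", "b.example"), ("d101", "b.example")]
    + [("d102", "c.example"), ("d103", "c.example"), ("d104", "c.example")]
)
calls: List[int] = []


def fake_search(offset: int, limit: int) -> List[Hit]:
    calls.append(offset)  # record each re-query for inspection
    return ranked[offset:offset + limit]


result = collect_clusters(fake_search, min_clusters=3, page_size=50)
```

With this toy data the loop has to issue three queries before a second and
third domain show up, which is exactly the indeterminate re-running the
question is about.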

Marvin Humphrey
Rectangular Research
