lucene-dev mailing list archives

From Grant Ingersoll
Subject Re: Returning a minimum number of clusters
Date Mon, 01 May 2006 17:21:37 GMT
You might be interested in the Carrot project, which has some Lucene 
support.  I don't know whether it solves your second problem, but it 
already implements clustering and may get you to an answer for that 
problem more quickly.  I recently started using it for a clustering task 
related to search results.  I think the author of Carrot is on the user 
list from time to time.

Marvin Humphrey wrote:
> Greets,
> I'm toying with the idea of implementing clustering of search results 
> based on comparison of document vectors constrained by field.  For 
> instance, you could cluster based on "topic", or "domain", or 
> "content".  "domain" would be easy, as it's presumably a single value 
> field.  "content" would be much more involved.
> The problem I'm trying to solve is how to return a minimum number of 
> clusters from a search.  Say the most relevant 100 documents for a 
> query are all from the same domain, but you want a maximum of two 
> results per domain, a la Google.  I don't see any alternative to 
> rerunning the query an indeterminate number of times until you've 
> accumulated sufficient clusters, because the search logic doesn't know 
> what cluster a document belongs to until the document vector is 
> retrieved.
> Is there a better way?
> Marvin Humphrey
> Rectangular Research
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
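
The "rerun until you have enough clusters" approach described in the quoted message can be sketched roughly as below. This is only an illustration under stated assumptions, not Lucene or Carrot code: `fetch_page` is a hypothetical callback that returns the next slice of ranked hits as (doc_id, cluster_key) pairs, and `collect_clusters`, `min_clusters`, `max_per_cluster` and `page_size` are names made up for the example. It also assumes the cluster key (for instance the domain) is already known for each hit once it has been fetched.

```python
from typing import Callable, Dict, List, Tuple

# A hit is (doc_id, cluster_key); cluster_key could be the document's domain.
Hit = Tuple[str, str]


def collect_clusters(
    fetch_page: Callable[[int, int], List[Hit]],  # hypothetical: (offset, limit) -> ranked hits
    min_clusters: int,
    max_per_cluster: int = 2,
    page_size: int = 100,
) -> Dict[str, List[str]]:
    """Walk deeper into the ranked results until enough clusters are seen."""
    clusters: Dict[str, List[str]] = {}
    offset = 0
    while len(clusters) < min_clusters:
        page = fetch_page(offset, page_size)
        if not page:
            break  # result set exhausted before reaching min_clusters
        for doc_id, key in page:
            bucket = clusters.setdefault(key, [])
            if len(bucket) < max_per_cluster:
                bucket.append(doc_id)  # keep only the top hits per cluster
        offset += len(page)
    return clusters


if __name__ == "__main__":
    # Toy ranking: the first 100 hits all come from one domain.
    ranked = [(f"a{i}", "a.example") for i in range(100)]
    ranked += [("b0", "b.example"), ("c0", "c.example"), ("b1", "b.example")]
    print(collect_clusters(lambda off, lim: ranked[off:off + lim], min_clusters=3))
```

The number of fetches is still indeterminate, as the quoted message points out; the sketch only bounds the work per fetch and stops as soon as the minimum number of clusters has been reached or the results run out.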


Grant Ingersoll 
Sr. Software Engineer 
Center for Natural Language Processing 
Syracuse University 
School of Information Studies 
335 Hinds Hall 
Syracuse, NY 13244 
Voice:  315-443-5484 
Fax: 315-443-6886 
