mahout-user mailing list archives

From David Saile
Subject Re: AW: Incremental clustering
Date Thu, 12 May 2011 08:53:39 GMT
I am still stuck at this problem.

Can anyone give me a heads-up on how existing systems handle this? 
If a collection of documents is modified, is the clustering recomputed from scratch each time?

Or is there in fact any incremental way to handle an evolving set of documents?

I would really appreciate any hint!


On 09.05.2011 at 12:45, Ulrich Poppendieck wrote:

> Not an answer, but a follow-up question: 
> I would be interested in the very same thing, but with the possibility to assign new sites to existing clusters OR to new ones.
> Thanks in advance,
> Ulrich
> -----Original Message-----
> From: David Saile
> Sent: Monday, 9 May 2011 11:53
> To:
> Subject: Incremental clustering
> Hi list,
> I am completely new to Mahout, so please forgive me if the answer to my question is too
> For a case study, I am working on a simple incremental web crawler (much like Nutch) and I want to include a very simple indexing step that incorporates clustering of documents.
> I was hoping to use some kind of incremental clustering algorithm, in order to make use of the incremental way the crawler is supposed to work (i.e. continuously adding and updating
> Is there some way to achieve the following: 	
> 	1) initial clustering of the first web-crawl
> 	2) assigning new sites to existing clusters
> 	3) possibly moving modified sites between clusters
> I would really appreciate any help!
> Thanks,
> David
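
One possible way to approach the three steps listed above is a single-pass, leader-style clustering that keeps a running mean per cluster. The Python sketch below is not Mahout code and uses no Mahout API. It assumes documents have already been turned into fixed-length numeric vectors (for example TF-IDF) and uses plain Euclidean distance, although cosine distance is often preferred for text. The class name `IncrementalClusterer` and the parameter `new_cluster_threshold` are hypothetical names chosen for illustration. The threshold also covers Ulrich's variant: a document that is too far from every existing cluster opens a new one.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class IncrementalClusterer:
    """Leader-style clustering with running-mean centroids (illustrative only)."""

    def __init__(self, new_cluster_threshold):
        self.threshold = new_cluster_threshold  # hypothetical tuning parameter
        self.sums = []    # per-cluster component-wise sum of member vectors
        self.counts = []  # per-cluster number of members
        self.docs = {}    # doc_id -> (cluster index, vector)

    def centroid(self, c):
        return [s / self.counts[c] for s in self.sums[c]]

    def _nearest(self, vec):
        """Return (cluster index, distance) of the closest non-empty cluster."""
        best, best_d = None, float("inf")
        for c in range(len(self.sums)):
            if self.counts[c] == 0:
                continue  # skip clusters emptied by removals
            d = dist(vec, self.centroid(c))
            if d < best_d:
                best, best_d = c, d
        return best, best_d

    def add(self, doc_id, vec):
        """Steps 1 and 2: assign to the nearest cluster, or open a new one."""
        c, d = self._nearest(vec)
        if c is None or d > self.threshold:
            self.sums.append([0.0] * len(vec))
            self.counts.append(0)
            c = len(self.sums) - 1
        self.sums[c] = [s + x for s, x in zip(self.sums[c], vec)]
        self.counts[c] += 1
        self.docs[doc_id] = (c, list(vec))
        return c

    def remove(self, doc_id):
        """Take a document out of its cluster and update the running sum."""
        c, vec = self.docs.pop(doc_id)
        self.sums[c] = [s - x for s, x in zip(self.sums[c], vec)]
        self.counts[c] -= 1

    def update(self, doc_id, vec):
        """Step 3: a modified document is removed and then reassigned."""
        self.remove(doc_id)
        return self.add(doc_id, vec)


# Step 1: the initial crawl is simply fed through add() one document at a time.
clusterer = IncrementalClusterer(new_cluster_threshold=1.0)
for doc_id, vec in [("a", [0.0, 0.0]), ("b", [0.2, 0.0]), ("c", [5.0, 5.0])]:
    clusterer.add(doc_id, vec)

# Step 2: a newly crawled site joins the nearest existing cluster.
clusterer.add("d", [5.1, 4.9])

# Step 3: a modified site may move to a different cluster.
clusterer.update("b", [5.0, 5.2])
```

The result of such a single pass depends on the order in which documents arrive and on the chosen threshold, so a periodic full re-clustering may still be worthwhile to correct drift.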
