From: David Saile
To: user@mahout.apache.org
Subject: Re: AW: Incremental clustering
Date: Thu, 12 May 2011 10:53:39 +0200
Message-Id: <8854AF18-E59C-4FA0-8F8D-925E9BF4BD13@uni-koblenz.de>
In-Reply-To: <7448A03F137F364F84197FC6FA27CFB17097B3D9@VICO-SBS01.vico.local>
References: <6E645A37-B7E4-4556-B9C5-1754DFD8E32C@uni-koblenz.de> <7448A03F137F364F84197FC6FA27CFB17097B3D9@VICO-SBS01.vico.local>

I am still stuck on this problem. Can anyone give me a heads-up on how existing systems handle this?

If a collection of documents is modified, is the clustering recomputed from scratch each time?
Or is there in fact an incremental way to handle an evolving set of documents?

I would really appreciate any hint!

Thanks,
David

On 09.05.2011 at 12:45, Ulrich Poppendieck wrote:

> Not an answer, but a follow-up question:
> I would be interested in the very same thing, but with the possibility to assign new sites to existing clusters OR to new ones.
>
> Thanks in advance,
> Ulrich
>
> -----Original Message-----
> From: David Saile [mailto:david@uni-koblenz.de]
> Sent: Monday, 9 May 2011 11:53
> To: user@mahout.apache.org
> Subject: Incremental clustering
>
> Hi list,
>
> I am completely new to Mahout, so please forgive me if the answer to my question is too obvious.
>
> For a case study, I am working on a simple incremental web crawler (much like Nutch) and I want to include a very simple indexing step that incorporates clustering of documents.
>
> I was hoping to use some kind of incremental clustering algorithm, in order to make use of the incremental way the crawler is supposed to work (i.e. continuously adding and updating websites).
>
> Is there some way to achieve the following:
> 1) initial clustering of the first web crawl
> 2) assigning new sites to existing clusters
> 3) possibly moving modified sites between clusters
>
> I would really appreciate any help!
>
> Thanks,
> David
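
To make the three steps in the quoted question concrete, here is a rough sketch of one possible approach: a threshold-based nearest-centroid scheme. It is plain Python and does not use Mahout. The class name IncrementalClusterer, the threshold parameter, the use of cosine similarity and the toy term vectors are all assumptions made for illustration; nothing in this thread says any existing system works this way. A document joins the most similar existing cluster, or opens a new one if no cluster is similar enough (the variant Ulrich asked about), and a modified document is removed and re-inserted so that it can move.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity of two sparse vectors given as {term: weight} mappings."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class IncrementalClusterer:
    """Hypothetical nearest-centroid clusterer (illustration only, not a Mahout class)."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold  # assumed minimum similarity needed to join a cluster
        self.sums = {}              # cluster id -> summed term weights of its members
        self.counts = {}            # cluster id -> number of member documents
        self.doc_vec = {}           # doc id -> last vector seen for that document
        self.assignment = {}        # doc id -> cluster id
        self._next_id = 0

    def _remove(self, doc_id):
        """Subtract a document from its cluster; drop the cluster if it becomes empty."""
        cid = self.assignment.pop(doc_id)
        for term, weight in self.doc_vec.pop(doc_id).items():
            self.sums[cid][term] -= weight
        self.counts[cid] -= 1
        if self.counts[cid] == 0:
            del self.sums[cid], self.counts[cid]

    def upsert(self, doc_id, vec):
        """Add a new document, or re-assign a modified one. Returns the cluster id."""
        if doc_id in self.assignment:
            self._remove(doc_id)
        best, best_sim = None, self.threshold
        for cid, total in self.sums.items():
            # Cosine ignores vector length, so the summed vector stands in for the centroid.
            sim = cosine(vec, total)
            if sim >= best_sim:
                best, best_sim = cid, sim
        if best is None:
            # No cluster is similar enough: open a new one.
            best = self._next_id
            self._next_id += 1
            self.sums[best] = defaultdict(float)
            self.counts[best] = 0
        for term, weight in vec.items():
            self.sums[best][term] += weight
        self.counts[best] += 1
        self.doc_vec[doc_id] = dict(vec)
        self.assignment[doc_id] = best
        return best


clusterer = IncrementalClusterer(threshold=0.5)
# 1) initial crawl (made-up term weights)
clusterer.upsert("a", {"mahout": 1.0, "cluster": 1.0})
clusterer.upsert("b", {"mahout": 1.0, "cluster": 2.0})
clusterer.upsert("c", {"football": 1.0, "goal": 1.0})
# 2) a new site arrives and is assigned to an existing cluster
clusterer.upsert("d", {"football": 2.0})
# 3) site "a" was modified and may move to another cluster
clusterer.upsert("a", {"football": 1.0, "goal": 1.0})
print(clusterer.assignment)
```

In a scheme like this, the result depends on the order in which documents arrive and on the chosen threshold, and the centroids drift as documents are added or changed. It may therefore still be worth recomputing the clustering from scratch occasionally and using the incremental updates only in between.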