From: nfantone
To: mahout-user@lucene.apache.org
Subject: Re: Clustering from DB
Date: Mon, 6 Jul 2009 17:39:31 -0300

I was using Canopy to create the input clusters, but the error appeared
while running kMeans. (If I run only the kMeans job, with clusters
previously created by Canopy placed in output/canopies as the initial
clusters, it still fails.) I noticed no other problems.

I was using revision 790979 before updating. Strangely, there were no
changes in the job and driver classes since that revision: svn diff shows
that the only classes that changed in the
org.apache.mahout.clustering.kmeans package were KMeansInfo.java and
RandomSeedGenerator.java.

On Mon, Jul 6, 2009 at 3:55 PM, Jeff Eastman wrote:
> Hum, no, it's looking for the output of the first iteration. Were there
> other errors? What was the last revision you were running?
> It does look like something got horked, as it should be looking for
> output/clusters-0/*. Can you diff the job and driver class to see what
> changed?
>
> Jeff
>
> nfantone wrote:
>>
>> Fellows, today I updated to revision 791558 and while running kMeans I
>> got the following exception:
>>
>> WARNING: java.io.FileNotFoundException: File
>> output/clusters-0/part-00000/* does not exist.
>> java.io.FileNotFoundException: File output/clusters-0/part-00000/*
>> does not exist.
>>
>> The algorithm isn't interrupted, though. But this exception wasn't
>> thrown before the update and, to me, its message is not quite clear.
>> It seems as it's looking for any file inside a "part-00000" directory,
>> which doesn't exist; and, as far as I know, "part-xxxxx" are default
>> names for output files.
>>
>> I could show the entire stack trace, if needed. Any pointers?
>>
>> On Thu, Jul 2, 2009 at 3:16 PM, nfantone wrote:
>>>
>>> Thanks for the feedback, Jeff.
>>>
>>>> The logical format of input to KMeans is as it is in sequence file
>>>> format, but the Key is never used. To my knowledge, there is no
>>>> requirement to assign identifiers to the input points*. Users are
>>>> free to associate an arbitrary name field with each vector - also
>>>> label mappings may be assigned - but these are not manipulated by
>>>> KMeans or any of the other clustering applications. The name field
>>>> is now used as a vector identifier by the KMeansClusterMapper - if
>>>> it is non-null - in the output step only.
>>>
>>> The key may not be used internally, but externally they can prove to
>>> be pretty useful. For me, keys are userIDs and each Vector represents
>>> his/her historical behavior. Being able to collect the output
>>> information as is quite neat as it allows me to, for instance,
>>> retrieve user information using data directly from a HDFS file's
>>> field.
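
For anyone following the thread, here is a rough sketch of why a pattern like
output/clusters-0/part-00000/* comes up empty while output/clusters-0/* does
not. It is only a local-filesystem analogy using java.nio, not the Hadoop
FileSystem code that Mahout actually calls, and the class and method names
(ClusterPathSketch, createLayout, listChildren, canList) are made up for
illustration. The assumption is that part-00000 is a regular output file, so
asking for its children cannot succeed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: mimics the directory layout from the
// thread on the local filesystem. It does not use any Mahout or Hadoop API.
public class ClusterPathSketch {

    // Creates <tmp>/output/clusters-0/part-00000, where part-00000 is a
    // regular file (assumed to match how reducer output is laid out).
    public static Path createLayout() {
        try {
            Path root = Files.createTempDirectory("kmeans-sketch");
            Path clusters = Files.createDirectories(
                    root.resolve("output").resolve("clusters-0"));
            Files.createFile(clusters.resolve("part-00000"));
            return clusters;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Lists the names directly under dir, the local equivalent of "dir/*".
    // Throws UncheckedIOException if dir cannot be opened as a directory.
    public static List<String> listChildren(Path dir) {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path entry : stream) {
                names.add(entry.getFileName().toString());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    // Returns true if "path/*" can be evaluated, false if listing fails.
    public static boolean canList(Path path) {
        try {
            listChildren(path);
            return true;
        } catch (UncheckedIOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Path clusters = createLayout();
        // "output/clusters-0/*": the directory can be listed.
        System.out.println(listChildren(clusters));
        // "output/clusters-0/part-00000/*": a file has no children.
        System.out.println(canList(clusters.resolve("part-00000")));
    }
}
```

If the driver is appending "/*" to a path that already names a part file, that
would produce the message above; whether that is what changed between the two
revisions is not something this sketch can confirm.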