Date: Thu, 25 Feb 2010 23:53:27 -0500
Message-ID: <8f8e14c41002252053v51b7ed16ra0d41a647d74ba2@mail.gmail.com>
Subject: Proper way to dump kmeans clusters?
From: Drew Farris
To: mahout-dev@lucene.apache.org

I'm trying to dump the clusters generated using kmeans. I am running on the 20-news data prepped by SequenceFileFromDirectory and SparseVectorsFromSequenceFiles. I'm running with the 301 patch in place, the files are on HDFS, and the necessary Hadoop env vars are set for the mahout script.

    ./mahout clusterdump -s mahout/20news-sv/kmeans/clusters-10 -o mahout/20news-sv/kmeans-dump -p mahout/20news-sv/kmeans/points -d mahout/20news-sv/dictionary.file-0 -dt sequencefile

I get the error:

    java.lang.NullPointerException
        at org.apache.mahout.utils.clustering.ClusterDumper.readPoints(ClusterDumper.java:323)
        at org.apache.mahout.utils.clustering.ClusterDumper.init(ClusterDumper.java:85)
        at org.apache.mahout.utils.clustering.ClusterDumper.<init>(ClusterDumper.java:78)

It seems to work fine if I copy the files from HDFS to my local filesystem. I suspect this is because ClusterDumper uses java.io filesystem primitives to locate the points file instead of the Hadoop primitives (lines 316-321).

Also, if I run the entire job locally, SparseVectorsFromSequenceFiles generates multiple dictionaries: dictionary.file-0 and dictionary.file-1. How would I use these as input to the dumper?

Thanks,
Drew
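
The java.io suspicion above can be illustrated with a small standalone sketch. It does not reproduce the actual ClusterDumper code. It assumes the points directory is listed with something like java.io.File.listFiles(), which returns null when the path is not a directory on the local filesystem, as would be the case for a path that only exists on HDFS. The class and method names here (PointsDirListing, countUnsafe, countSafe) are hypothetical and used only for illustration.

```java
import java.io.File;

// Hypothetical demo class; not part of Mahout.
class PointsDirListing {

    // Unsafe pattern: File.listFiles() returns null when the path is not a
    // directory on the local filesystem, so iterating over the result
    // throws a NullPointerException.
    static int countUnsafe(File dir) {
        int n = 0;
        for (File f : dir.listFiles()) {
            n++;
        }
        return n;
    }

    // Null-checked variant: reports the missing directory explicitly
    // instead of failing with a bare NullPointerException.
    static int countSafe(File dir) {
        File[] entries = dir.listFiles();
        if (entries == null) {
            throw new IllegalArgumentException("Not a local directory: " + dir);
        }
        return entries.length;
    }

    public static void main(String[] args) {
        // Assumption: this HDFS-relative path does not exist locally.
        File points = new File("mahout/20news-sv/kmeans/points");
        try {
            countUnsafe(points);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException while listing " + points);
        }
    }
}
```

If this is indeed the cause, one likely direction would be to list the directory through Hadoop's FileSystem API (for example FileSystem.listStatus on a Path) so that the same code works for both local and HDFS paths. That is an assumption about the fix, not a description of the current code.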