From: Robin Anil
Date: Wed, 7 Apr 2010 05:36:12 +0530
Subject: Re: "Not a file" issue with TwentyNewsGroups
To: mahout-user@lucene.apache.org
In-Reply-To: <142D045C-FFF0-49F6-9AFE-B4E468023649@apache.org>

I am assuming that you *didn't* convert the 20newsgroups data into the
required format, which resulted in this error. Is my guess right?

Robin

On Wed, Apr 7, 2010 at 3:29 AM, Grant Ingersoll wrote:

> What are the commands you are running?
>
> On Apr 5, 2010, at 9:59 AM, Adam Hammer wrote:
>
> > Hello all,
> >
> > I am just starting out with Mahout, and to get my feet wet I am running
> > through the TwentyNewsGroups example. I have successfully configured a
> > single node Hadoop system as well as a pseudo-distributed Hadoop system
> > on two separate machines. On both environments, I have gone through the
> > guide successfully to put all the news inputs into the folder
> > 20news-input. I am able to successfully ls and cat the files in the
> > directory.
> >
> > However, when I go to run the TrainClassifier, I am getting the
> > following message:
> >
> > 10/04/05 09:48:33 INFO bayes.TrainClassifier: Training Complementary Bayes Classifier
> > 10/04/05 09:48:33 INFO cbayes.CBayesDriver: Reading features...
> > 10/04/05 09:48:33 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> > 10/04/05 09:48:33 INFO mapred.FileInputFormat: Total input paths to process : 19
> > Exception in thread "main" java.io.IOException: Not a file: hdfs://localhost:9000/user/bob/20news-input/comp.graphics
> >     at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:206)
> >     at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
> >     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
> >     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
> >     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
> >     at org.apache.mahout.classifier.bayes.mapreduce.common.BayesFeatureDriver.runJob(BayesFeatureDriver.java:75)
> >     at org.apache.mahout.classifier.bayes.mapreduce.cbayes.CBayesDriver.runJob(CBayesDriver.java:61)
> >     at org.apache.mahout.classifier.bayes.TrainClassifier.trainCNaiveBayes(TrainClassifier.java:56)
> >     at org.apache.mahout.classifier.bayes.TrainClassifier.main(TrainClassifier.java:128)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >     at java.lang.reflect.Method.invoke(Method.java:597)
> >     at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> >
> > I get this error on both the single node system I have set up, as well
> > as the separate dual-node system. As I said before, I am able to cat and
> > ls that directory and the files in it perfectly fine. Any thoughts?
> >
> > Thanks!
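
A note on the trace above: the exception is raised from FileInputFormat.getSplits for comp.graphics, which suggests the input directory still contains one subdirectory per newsgroup rather than plain files. The following is only a rough sketch for checking that on a local copy of the input directory before uploading it to HDFS. The class name InputDirCheck and the method nestedDirectories are made up for illustration and are not part of Mahout or Hadoop, and it assumes the training job expects plain files directly under the input path.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not a Mahout or Hadoop class): lists the direct
// children of a local input directory that are themselves directories.
class InputDirCheck {

    // Returns the sorted names of subdirectories found directly under inputDir.
    static List<String> nestedDirectories(Path inputDir) throws IOException {
        List<String> nested = new ArrayList<>();
        try (DirectoryStream<Path> children = Files.newDirectoryStream(inputDir)) {
            for (Path child : children) {
                if (Files.isDirectory(child)) {
                    nested.add(child.getFileName().toString());
                }
            }
        }
        Collections.sort(nested);
        return nested;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java InputDirCheck <local-input-dir>");
            return;
        }
        // Assumption: args[0] is a local copy of the directory used as job input.
        for (String name : nestedDirectories(Paths.get(args[0]))) {
            System.out.println("Not a plain file: " + name);
        }
    }
}
```

If this lists entries such as comp.graphics, the data most likely still has the raw directory-per-newsgroup layout and needs the conversion step from the guide before training.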