I am assuming that you *didn't* convert the 20newsgroups into the required
format which resulted in this error. Is my guess right?

Robin

On Wed, Apr 7, 2010 at 3:29 AM, Grant Ingersoll wrote:

> What are the commands you are running?
>
> On Apr 5, 2010, at 9:59 AM, Adam Hammer wrote:
>
> > Hello all,
> >
> > I am just starting out with Mahout, and to get my feet wet I am running
> > through the TwentyNewsGroups example. I have successfully configured a
> > single node Hadoop system as well as a pseudo-distributed Hadoop system on
> > two separate machines. On both environments, I have gone through the guide
> > successfully to put all the news inputs into the folder 20news-input. I am
> > able to successfully ls and cat the files in the directory.
> >
> > However, when I go to run the TrainClassifier, I am getting the following
> > message:
> >
> > 10/04/05 09:48:33 INFO bayes.TrainClassifier: Training Complementary Bayes Classifier
> > 10/04/05 09:48:33 INFO cbayes.CBayesDriver: Reading features...
> > 10/04/05 09:48:33 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> > 10/04/05 09:48:33 INFO mapred.FileInputFormat: Total input paths to process : 19
> > Exception in thread "main" java.io.IOException: Not a file: hdfs://localhost:9000/user/bob/20news-input/comp.graphics
> >     at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:206)
> >     at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
> >     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
> >     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
> >     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
> >     at org.apache.mahout.classifier.bayes.mapreduce.common.BayesFeatureDriver.runJob(BayesFeatureDriver.java:75)
> >     at org.apache.mahout.classifier.bayes.mapreduce.cbayes.CBayesDriver.runJob(CBayesDriver.java:61)
> >     at org.apache.mahout.classifier.bayes.TrainClassifier.trainCNaiveBayes(TrainClassifier.java:56)
> >     at org.apache.mahout.classifier.bayes.TrainClassifier.main(TrainClassifier.java:128)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >     at java.lang.reflect.Method.invoke(Method.java:597)
> >     at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> >
> > I get this error on both the single node system I have setup, as well as the
> > separate dual-node system. As I said before, I am able to cat and ls that
> > directory and the files in it perfectly fine. Any thoughts?
> >
> > Thanks!
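
For anyone hitting the same trace: the exception is raised in FileInputFormat.getSplits because comp.graphics is a directory, not a file, which fits the guess that the raw newsgroup folders were uploaded as-is. Below is a rough Python sketch of how one might check a local copy of the input for nested directories and flatten one category into a single file. The function names find_nested_dirs and flatten_category are made up for illustration, and the "label, tab, text" line layout is only an assumption about what a flattened file could look like; it is not taken from this thread, and the conversion step from the Mahout guide is what should be used for a real run.

```python
from pathlib import Path


def find_nested_dirs(input_dir):
    """Return sorted names of entries under input_dir that are directories.

    Any name returned here is an entry that a non-recursive input
    listing would reject with an error like "Not a file".
    """
    return sorted(p.name for p in Path(input_dir).iterdir() if p.is_dir())


def flatten_category(category_dir, out_file):
    """Write one line per document as '<label>\\t<text>' (assumed layout).

    The label is the category directory name; whitespace in each
    document, including newlines, is collapsed to single spaces.
    Returns the number of documents written.
    """
    category_dir = Path(category_dir)
    label = category_dir.name
    count = 0
    with open(out_file, "w", encoding="utf-8") as out:
        for doc in sorted(category_dir.iterdir()):
            if not doc.is_file():
                continue  # skip anything that is not a plain file
            # latin-1 decodes any byte sequence, so odd postings do not fail
            text = doc.read_text(encoding="latin-1")
            out.write(label + "\t" + " ".join(text.split()) + "\n")
            count += 1
    return count
```

If find_nested_dirs returns a non-empty list for the folder that was uploaded as 20news-input, the input is still in its raw per-newsgroup layout and has not been converted.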