mahout-dev mailing list archives

From Grant Ingersoll <>
Subject Re: About the Bayes TrainerDriver
Date Tue, 06 May 2008 10:39:01 GMT

On May 5, 2008, at 9:41 PM, phonechen wrote:

> Hi all,
> I have been using the Mahout Bayes classifier these days, and I have some
> questions about it:
> 1. SequenceFileModelReader assumes that the number of reduce tasks is 1. If
> the user's hadoop-site.xml sets the default number of reduce tasks to
> something larger than 1, the job may produce more than one reduce output
> (part-00000, part-00001, ...), so this may lead to problems.
> I think it would be more appropriate for the path parameter of
> *public Model loadModel(FileSystem fs, Path path, Configuration conf)
> throws IOException*
> to point to the directory that contains the reduce output files rather
> than directly to a single reduce output file,
> or to set the number of reduce tasks to 1 in TrainerDriver#runJob():
> *conf.setNumReduceTasks(1);*

Good catch.  We definitely want more than one reduce task.  I will fix  
that and put in tests for multiple reduces.
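
For illustration only, here is a minimal sketch of the directory-based idea: accept either a single file or a directory and collect every part-NNNNN file in it. It uses plain java.nio.file rather than Hadoop's FileSystem so that it runs on its own; the class and method names (PartFileLister, listPartFiles) are hypothetical and not part of Mahout. A Hadoop version would presumably do the same thing with FileSystem.listStatus(path).

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not Mahout code. Note that Path here is
// java.nio.file.Path, not org.apache.hadoop.fs.Path.
final class PartFileLister {

    private PartFileLister() {
    }

    /**
     * Returns the reducer output files (part-*) directly under the given
     * directory, sorted by name. If the argument is itself a regular file,
     * it is returned as the only element, so single-reducer callers still work.
     */
    static List<Path> listPartFiles(Path modelPath) throws IOException {
        List<Path> parts = new ArrayList<>();
        if (Files.isRegularFile(modelPath)) {
            parts.add(modelPath);
            return parts;
        }
        // The glob keeps part-00000, part-00001, ... and skips anything else.
        try (DirectoryStream<Path> stream =
                 Files.newDirectoryStream(modelPath, "part-*")) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    parts.add(p);
                }
            }
        }
        Collections.sort(parts);
        return parts;
    }
}
```

A model loader could then read each returned file in turn and merge the entries into one Model; how that merge should work is not settled in this thread.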

> 2. Why does the ClassifierDriver class load model data from the HDFS
> instead of the local filesystem? This can avoid the copyToLocal command.

But don't you just have to do a copyToLocal to get it on the local
filesystem via bin/hadoop dfs -copyToLocal?  I must admit, it has been
a while (1+ year) since I have done Hadoop stuff, especially the
administrative stuff.  I guess I was thinking you could load in
non-distributed mode using the LocalFileSystem.  The code I have is:

       Configuration conf = new JobConf();
       FileSystem raw = new RawLocalFileSystem();
       FileSystem fs = new LocalFileSystem(raw);

       Path path = new Path(cmdLine.getOptionValue(pathOpt.getOpt()));
       System.out.println("Loading model from: " + path);
       Model model = reader.loadModel(fs, path, conf);

Thanks for the feedback,
