mahout-dev mailing list archives

From phonechen <phonec...@gmail.com>
Subject Re: About the Bayes TrainerDriver
Date Tue, 06 May 2008 12:04:43 GMT
Sorry, I made a mistake.
What I mean is: shall we put the documents to be classified on HDFS,
leave the model files on HDFS,
and make the whole classification process run against HDFS?
So the change would be from:
=====================
   Configuration conf = new JobConf();
   FileSystem raw = new RawLocalFileSystem();
   raw.setConf(conf);
   FileSystem fs = new LocalFileSystem(raw);
   fs.setConf(conf);
=====================
to
========================
   Configuration conf = new JobConf();
   FileSystem fs = new DistributedFileSystem();
========================
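Or, simpler still, let the configuration pick the filesystem. A minimal
sketch, assuming fs.default.name in the config points at the namenode
(FileSystem.get resolves LocalFileSystem or DistributedFileSystem from
the config, so the same code runs in both modes):
========================
   Configuration conf = new JobConf();
   // Picks the FileSystem implementation from fs.default.name,
   // so nothing is hard-coded to local or HDFS.
   FileSystem fs = FileSystem.get(conf);
========================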

That way we can classify a batch of inputs using MapReduce instead of running
multiple ClassifierDriver processes.
Correct me if I have got something wrong.
PS: Can we make the classification process parallel?
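If so, a mapper like the sketch below might be the shape of it (the Model
class and its classify() method here are just placeholders standing in for
the real Mahout classifier API):
========================
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Sketch: classify one document per input record, in parallel across
// map tasks. Model is a placeholder for the real Mahout classifier.
public class BayesClassifyMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  // Placeholder model type, for illustration only.
  public static class Model {
    public String classify(String doc) {
      return "unknown"; // real code would score the document here
    }
  }

  private Model model;

  public void configure(JobConf job) {
    // Real code would load the trained model from HDFS here,
    // e.g. via SequenceFileModelReader.
    model = new Model();
  }

  public void map(LongWritable key, Text document,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    String label = model.classify(document.toString());
    output.collect(document, new Text(label));
  }
}
========================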





On 5/6/08, Grant Ingersoll <gsingers@apache.org> wrote:
>
>
> On May 5, 2008, at 9:41 PM, phonechen wrote:
>
> > Hi all:
> > I'm using the Mahout Bayes classifier these days, and here are some
> > questions about it:
> > 1. SequenceFileModelReader assumes that the number of reduce tasks is 1.
> > If the user's hadoop-site.xml sets the default number of reduce tasks to
> > something larger than 1, the job produces more than one reduce output
> > (part-00000, part-00001, ...), and that can lead to problems.
> > I think it would be more proper to change the path parameter to the
> > directory that contains the reduce files, rather than a single reduce
> > file:
> > public Model loadModel(FileSystem fs, Path path, Configuration conf)
> > throws IOException
> >
> > Or set the number of reduce tasks to 1 in TrainerDriver#runJob():
> > conf.setNumReduceTasks(1);
> >
>
> Good catch.  We definitely want more than one reduce task.  I will fix
> that and put in tests for multiple reduces.
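
One possible shape for that fix: let loadModel take the job's output
directory and fold in every part-* file. A rough sketch of the
SequenceFileModelReader change (the Text/DoubleWritable record types and
the model.add() call are assumptions, since they depend on what the
trainer's reducer actually emits):
========================
// Sketch: load a model from every reducer output (part-00000,
// part-00001, ...) under a directory instead of one hard-coded file.
public Model loadModel(FileSystem fs, Path dir, Configuration conf)
    throws IOException {
  Model model = new Model();
  for (FileStatus status : fs.listStatus(dir)) {
    Path part = status.getPath();
    if (!part.getName().startsWith("part-")) {
      continue; // skip _logs and other non-output entries
    }
    SequenceFile.Reader reader = new SequenceFile.Reader(fs, part, conf);
    try {
      Text key = new Text();
      DoubleWritable value = new DoubleWritable();
      while (reader.next(key, value)) {
        model.add(key.toString(), value.get()); // hypothetical method
      }
    } finally {
      reader.close();
    }
  }
  return model;
}
========================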
>
>
> > 2. Why does the ClassifierDriver class load model data from the local
> > filesystem instead of HDFS? Loading from HDFS would avoid the
> > copyToLocal command.
> >
>
> but don't you just have to do a copyToLocal to get it on the local
> filesystem via bin/hadoop dfs -copyToLocal?  I must admit, it has been a
> while since I have done Hadoop stuff, especially the administrative stuff
> (1+ year).  I guess I was thinking you could load in non-distributed mode
> using the LocalFileSystem.  The code I have is:
>
>      Configuration conf = new JobConf();
>      FileSystem raw = new RawLocalFileSystem();
>      raw.setConf(conf);
>      FileSystem fs = new LocalFileSystem(raw);
>      fs.setConf(conf);
>
>      Path path = new Path(cmdLine.getOptionValue(pathOpt.getOpt()));
>      System.out.println("Loading model from: " + path);
>      Model model = reader.loadModel(fs, path, conf);
>
> Thanks for the feedback,
> Grant
>



-- 

Best Regards,

Yours
Phonechen

