mahout-user mailing list archives

From Jake Mannix <jake.man...@gmail.com>
Subject Re: Setting Number of Mappers and Reducers in DistributedRowMatrix Jobs
Date Wed, 16 Jun 2010 00:24:35 GMT
On Tue, Jun 15, 2010 at 3:52 AM, Sean Owen <srowen@gmail.com> wrote:

> The first part looks fine.
>
> I dug in, and see that the transpose() method ultimately does not use
> the configuration that was configured and makes its own. That is the
> underlying issue. Maybe Jake can comment more.
>

Yeah, that's probably a mistake, but it's probably because it's tricky:
transpose() calls a map-reduce job, but what if you run that method
inside of a piece of code which first calls another map-reduce job
(with certain command-line options and configuration) and then takes
the result and calls transpose() on it... should you pass in all
the other config from the previous job?  Sometimes that is the
right thing to do, but probably not usually.

In this case, it's probably just a bug in transpose().

I can't look at this right now (about to travel again), but if someone
else sees how to easily just make sure to copy over all old values
from the conf that the DistributedRowMatrix was given, patch-away!
Otherwise I'll try to get to looking at it this weekend or so.
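The "copy over all old values" fix can be sketched as follows. This is a minimal sketch that assumes transpose() currently builds its job settings from scratch. A plain `java.util.Map` stands in for Hadoop's `Configuration`/`JobConf` so the example runs without Hadoop. In real code the analogous move would be deriving the job's conf from the one the matrix was given, for instance `new JobConf(conf, SomeClass.class)` instead of `new JobConf()`. The class, method names and paths below are hypothetical. Setting the job-specific keys *after* the copy also speaks to the concern above: user settings such as `mapred.reduce.tasks` are kept, while stale values from an earlier job, such as its output directory, are overwritten.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: Map<String, String> stands in for Hadoop's Configuration.
public class ConfCopySketch {

    // Roughly what the bug does: the job's settings are built from an empty conf,
    // so anything the caller configured (e.g. mapred.reduce.tasks) is lost.
    static Map<String, String> freshJobConf(String inputDir, String outputDir) {
        Map<String, String> job = new HashMap<>();
        job.put("mapred.input.dir", inputDir);
        job.put("mapred.output.dir", outputDir);
        return job;
    }

    // The fix: start from a copy of the original conf, then set job-specific keys.
    static Map<String, String> inheritedJobConf(Map<String, String> original,
                                                String inputDir, String outputDir) {
        Map<String, String> job = new HashMap<>(original); // copy; original is not mutated
        // Overwrite job-specific keys so values left over from a previous job are not reused.
        job.put("mapred.input.dir", inputDir);
        job.put("mapred.output.dir", outputDir);
        return job;
    }

    public static void main(String[] args) {
        Map<String, String> original = new HashMap<>();
        original.put("mapred.map.tasks", "20");
        original.put("mapred.reduce.tasks", "8");
        original.put("mapred.output.dir", "/tmp/previous-job-output");

        Map<String, String> fresh = freshJobConf("/tmp/matrix", "/tmp/transpose");
        Map<String, String> inherited = inheritedJobConf(original, "/tmp/matrix", "/tmp/transpose");

        System.out.println("fresh reduce tasks: " + fresh.get("mapred.reduce.tasks"));         // null
        System.out.println("inherited reduce tasks: " + inherited.get("mapred.reduce.tasks")); // 8
        System.out.println("inherited output dir: " + inherited.get("mapred.output.dir"));     // /tmp/transpose
    }
}
```

Copying everything and then overriding is one design choice. A stricter variant would copy only a whitelist of keys, at the cost of silently dropping settings nobody anticipated.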

JIRA ticket?

  -jake


>
> On Tue, Jun 15, 2010 at 10:28 AM, Kris Jack <mrkrisjack@gmail.com> wrote:
> > Hi Sean,
> >
> > I'm calling getConf() and using it to configure my DistributedRowMatrix.
> >
> > //
> >  Configuration originalConf = getConf();
> >  String inputPathString = originalConf.get("mapred.input.dir");
> >  String outputTmpPathString = parsedArgs.get("--tempDir");
> >  int numDocs = Integer.parseInt(parsedArgs.get("--numDocs"));
> >  int numTerms = Integer.parseInt(parsedArgs.get("--numTerms"));
> >
> >  DistributedRowMatrix text = new DistributedRowMatrix(new Path(inputPathString),
> >      new Path(outputTmpPathString), numDocs, numTerms);
> >
> >  text.configure(new JobConf(getConf()));
> >
> >  DistributedRowMatrix transpose = text.transpose();
> > //
> >
> > On debugging, I notice that the originalConf object does not have the values
> > that I sent in through the command line.  When text.transpose() is called,
> > the transpose job's conf doesn't have the right values for the mappers and
> > reducers either.  Where am I supposed to get the command line values to be
> > used by these jobs?
> >
> > Thanks,
> > Kris
> >
>
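Kris's question about where command-line values come from can be illustrated with a simplified sketch. It assumes the driver is launched through Hadoop's `ToolRunner`, which (via `GenericOptionsParser`) applies generic `-D key=value` options to the `Configuration` that `getConf()` later returns. The parser below is a hypothetical stand-in written against `java.util` only, not Hadoop's implementation. Like Hadoop's parser as far as I know, it stops at the first non-generic argument. So one thing worth checking in a setup like the one quoted above is that the `-D` options come before job-specific options such as `--numDocs`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for generic-option handling; not Hadoop's GenericOptionsParser.
public class GenericOptionsSketch {

    // Applies leading "-D key=value" / "-Dkey=value" options to conf and
    // returns the remaining (job-specific) arguments.
    static List<String> applyGenericOptions(String[] args, Map<String, String> conf) {
        int i = 0;
        while (i < args.length) {
            String property;
            if (args[i].equals("-D") && i + 1 < args.length) {
                property = args[i + 1];          // "-D key=value" form
                i += 2;
            } else if (args[i].startsWith("-D") && args[i].length() > 2) {
                property = args[i].substring(2); // "-Dkey=value" form
                i += 1;
            } else {
                break; // first job-specific argument: stop looking for generic options
            }
            int eq = property.indexOf('=');
            if (eq > 0) {
                conf.put(property.substring(0, eq), property.substring(eq + 1));
            }
        }
        List<String> remaining = new ArrayList<>();
        for (int j = i; j < args.length; j++) {
            remaining.add(args[j]);
        }
        return remaining;
    }

    public static void main(String[] args) {
        // The second -D comes after --numDocs, so it is NOT applied to the conf.
        String[] cmdLine = {"-D", "mapred.reduce.tasks=8", "--numDocs", "1000", "-D", "mapred.map.tasks=20"};
        Map<String, String> conf = new HashMap<>();
        List<String> rest = applyGenericOptions(cmdLine, conf);
        System.out.println("conf: " + conf);     // contains only mapred.reduce.tasks=8
        System.out.println("job args: " + rest); // --numDocs 1000 and the late -D option
    }
}
```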
