mahout-user mailing list archives

From Kris Jack <mrkrisj...@gmail.com>
Subject Re: Setting Number of Mappers and Reducers in DistributedRowMatrix Jobs
Date Tue, 15 Jun 2010 09:28:05 GMT
Hi Sean,

I'm calling getConf() and using it to configure my DistributedRowMatrix.

//
  Configuration originalConf = getConf();
  String inputPathString = originalConf.get("mapred.input.dir");
  String outputTmpPathString = parsedArgs.get("--tempDir");
  int numDocs = Integer.parseInt(parsedArgs.get("--numDocs"));
  int numTerms = Integer.parseInt(parsedArgs.get("--numTerms"));

  DistributedRowMatrix text = new DistributedRowMatrix(
      new Path(inputPathString), new Path(outputTmpPathString), numDocs, numTerms);

  text.configure(new JobConf(getConf()));

  DistributedRowMatrix transpose = text.transpose();
//

On debugging, I notice that the originalConf object does not have the values
that I passed in on the command line.  When text.transpose() is called,
the transpose job's conf doesn't have the right values for the number of
mappers and reducers either.  Where am I supposed to get the command-line
values so that these jobs use them?
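
To illustrate what I expected to happen, here is a minimal sketch of the
mechanism in plain Java. It assumes the job is launched through Hadoop's
ToolRunner, which this thread has not confirmed. In that case the generic
option parsing strips the -Dkey=value arguments, copies them into the
Configuration returned by getConf(), and passes only the remaining arguments
to run(). The GenericOptionsSketch class below is a hypothetical stand-in
written for this example. It is not Hadoop's API and only mimics the split
between properties and job arguments.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration only; NOT Hadoop's GenericOptionsParser.
final class GenericOptionsSketch {

    // Properties collected from -Dkey=value arguments (what would land in the conf).
    final Map<String, String> properties = new LinkedHashMap<>();

    // Everything else, left over for the job's own argument parser.
    final List<String> remainingArgs = new ArrayList<>();

    // Splits the raw command line into -D properties and remaining job arguments.
    static GenericOptionsSketch parse(String[] args) {
        GenericOptionsSketch parsed = new GenericOptionsSketch();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            // Require at least one key character between "-D" and "=".
            if (arg.startsWith("-D") && eq > 2) {
                parsed.properties.put(arg.substring(2, eq), arg.substring(eq + 1));
            } else {
                parsed.remainingArgs.add(arg);
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        // Same shape as the command line quoted further down in this thread.
        String[] commandLine = {
            "-Dmapred.map.tasks=8", "-Dmapred.reduce.tasks=8",
            "--tempDir", "/tmp/matrixMulitiplication/",
            "--numDocs", "12843450", "--numTerms", "719050"
        };
        GenericOptionsSketch parsed = parse(commandLine);
        System.out.println("conf properties: " + parsed.properties);
        System.out.println("job arguments:   " + parsed.remainingArgs);
    }
}
```

If the real job bypasses that parsing step (for example, by building a fresh
Configuration in main() and never handing the raw arguments to ToolRunner),
the -D values would never reach getConf(), which would match what I see in
the debugger.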

Thanks,
Kris



2010/6/14 Sean Owen <srowen@gmail.com>

> Looks right to me. My next question is are you calling getConf() to
> get Hadoop's configuration object rather than configuring and setting
> your own? if you did that, you'd lose anything Hadoop parsed from its
> files and command line -- but would explain why re-setting it yourself
> in the code works.
>
> I think we're all on 0.20.2 now, yes.
>
> On Mon, Jun 14, 2010 at 4:52 PM, Kris Jack <mrkrisjack@gmail.com> wrote:
> > Command line call is this -
> >
> > hadoop-0.20 jar mahout-core-0.4-SNAPSHOT.job
> > org.apache.mahout.math.hadoop.GenSimMatrixJob
> > -Dmapred.input.dir=/user/kris/simMatrix/mahoutIndexTFIDF.vec
> > -Dmapred.map.tasks=8 -Dmapred.reduce.tasks=8 --tempDir
> > /tmp/matrixMulitiplication/ --numDocs 12843450 --numTerms 719050
> >
> > org.apache.mahout.math.hadoop.GenSimMatrixJob is my own class that calls the
> > matrix transposition and then multiplication.  Is it maybe because I'm using
> > hadoop 0.20?
>



-- 
Dr Kris Jack,
http://www.mendeley.com/profiles/kris-jack/
