mahout-dev mailing list archives

From Jeff Eastman <jeast...@Narus.com>
Subject RE: where i can set -Dmapred.map.tasks=X
Date Wed, 29 Dec 2010 01:19:18 GMT
That's where I'm beginning to look too. It seems the driver code is working correctly (I thought
I had tested that) but the CLI isn't.

The original post was for -Dmapred.map.tasks but I noticed the reduce.tasks didn't work either.

-----Original Message-----
From: Dmitriy Lyubimov [mailto:dlieu.7@gmail.com] 
Sent: Tuesday, December 28, 2010 5:15 PM
To: dev@mahout.apache.org
Subject: Re: where i can set -Dmapred.map.tasks=X

Oh, so you are trying to set the number of reduce tasks. i missed that; the
original post was about the number of map tasks. sorry.

No, no idea why that error pops up in the mahout command line. i would need to
dig into mahout's cli code -- i don't think i dug that deep there
before.

On Tue, Dec 28, 2010 at 5:06 PM, Jeff Eastman <jeastman@narus.com> wrote:

> It's very odd: when I run k-means from Eclipse and add
> -Dmapred.reduce.tasks=10 as the first argument, the driver loves it and
> job.getNumReduceTasks() is set correctly to 10. When I run the same command
> line using bin/mahout, however, it fails with "Unexpected
> -Dmapred.reduce.tasks=10 while processing Job-Specific Options".
>
> The CLI invocation is: ./bin/mahout kmeans -Dmapred.reduce.tasks=10 -I ...
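
For readers following the error above, here is a minimal sketch of the idea that "generic" -D options are meant to be pulled out of the argument list before the job-specific options are parsed. It is a hypothetical simplification, not Hadoop's or Mahout's actual parsing code: the class name `GenericOptionSketch` and its fields are made up for illustration, and for simplicity it scans every argument, which the real parser may not do.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the parser used by Hadoop or Mahout.
class GenericOptionSketch {
    // key=value pairs recognized as -D properties
    final Map<String, String> properties = new LinkedHashMap<>();
    // everything else, which would be handed to the job-specific parser
    final List<String> remaining = new ArrayList<>();

    GenericOptionSketch(String[] args) {
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            // "-D key=value" form: the pair is the next token
            boolean spaced = arg.equals("-D") && i + 1 < args.length;
            // "-Dkey=value" form: the pair follows the -D prefix directly
            String pair = spaced ? args[i + 1]
                    : (arg.startsWith("-D") ? arg.substring(2) : "");
            int eq = pair.indexOf('=');
            if (eq > 0) {
                properties.put(pair.substring(0, eq), pair.substring(eq + 1));
                if (spaced) {
                    i++; // the pair token has been consumed as well
                }
            } else {
                // not a well-formed -D property: leave it for the job parser
                remaining.add(arg);
            }
        }
    }
}
```

In this sketch, a -D argument that reaches `remaining` is exactly the situation in which a job-specific parser would complain about an unexpected option.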
>
>
>
> -----Original Message-----
> From: Dmitriy Lyubimov [mailto:dlieu.7@gmail.com]
> Sent: Tuesday, December 28, 2010 4:55 PM
> To: dev@mahout.apache.org
> Subject: Re: where i can set -Dmapred.map.tasks=X
>
> PPS it doesn't tell you what FileInputFormat actually uses for it as a
> property, and i don't remember off the top of my head either. but i assume you
> could use them with -D as well.
>
> On Tue, Dec 28, 2010 at 4:54 PM, Dmitriy Lyubimov <dlieu.7@gmail.com>
> wrote:
>
> > In particular, QJob is one of the drivers that uses that, in the following
> > way:
> >
> > if (minSplitSize > 0)
> >   SequenceFileInputFormat.setMinInputSplitSize(job, minSplitSize);
> >
> > An interesting peculiarity of that parameter is that in the current hadoop
> > release, for anything derived from FileInputFormat, it ensures that all
> > splits are at least that big and the last split is at least 1.1 times that
> > big. I am not quite sure why the last split gets special treatment but
> > that's how it goes there.
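
A rough sketch of the split-sizing logic being described, written from memory of how file-based input formats carve up a file, so treat it as an assumption rather than the actual Hadoop source. The class and method names here are illustrative. In this reading, the 1.1 factor acts as a tolerance on the tail of the file, so the final split comes out at most 1.1 times the split size instead of being cut into a full split plus a tiny remainder.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of split sizing for one file; not Hadoop's actual code.
class SplitSizeSketch {
    // Tolerance for the tail of the file (the "1.1" mentioned in the thread).
    static final double SPLIT_SLOP = 1.1;

    // Effective split size: the block size clamped between min and max split size.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Lengths of the splits carved out of a single file of the given length.
    static List<Long> splitLengths(long fileLength, long splitSize) {
        List<Long> lengths = new ArrayList<>();
        long remaining = fileLength;
        // Keep cutting full-size splits while more than 1.1 splits' worth is left.
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            lengths.add(splitSize);
            remaining -= splitSize;
        }
        // Whatever is left becomes the last split.
        if (remaining > 0) {
            lengths.add(remaining);
        }
        return lengths;
    }
}
```

Under these assumptions, raising the minimum split size above the block size raises the effective split size, which is what reduces the number of mappers.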
> >
> > -Dmitriy
> >
> >
> > On Tue, Dec 28, 2010 at 4:48 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
> >
> >> Jeff,
> >>
> >> it's the MAHOUT-376 patch; i don't think it is committed. the driver class
> >> there is SSVDCli; for your convenience you can find it here:
> >>
> >> https://github.com/dlyubimov/ssvd-lsi/tree/givens-ssvd/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd
> >>
> >> but like i said, i did not try to use it with the -D option since i wanted
> >> to give an explicit option to increase split size if needed (and a help for
> >> it). Another reason is that the solver has a series of jobs and only those
> >> reading the source matrix have anything to do with the split size.
> >>
> >>
> >> -d
> >>
> >>
> >> On Tue, Dec 28, 2010 at 4:39 PM, Jeff Eastman <jeastman@narus.com> wrote:
> >>
> >>> What's the driver class? If the -D parameters are working for you I want
> >>> to compare to the clustering drivers.
> >>>
> >>> -----Original Message-----
> >>> From: Dmitriy Lyubimov [mailto:dlieu.7@gmail.com]
> >>> Sent: Tuesday, December 28, 2010 4:37 PM
> >>> To: dev@mahout.apache.org
> >>> Subject: Re: where i can set -Dmapred.map.tasks=X
> >>>
> >>> as far as i understand, this option is not forced. I suspect it actually
> >>> means 'minimum degree of parallelism'. so if you expect to use that to
> >>> reduce the number of mappers, i don't think this is expected to work so
> >>> much. The ones that do enforce anything are min split size and max split
> >>> size in file input, so i guess you can try those. I rely on them (and open
> >>> it up as a job-specific option) in stochastic svd.
> >>>
> >>> but usually forcing split size to increase creates a 'supersplits'
> >>> problem, where a lot of data is moved around just to supply data to
> >>> mappers, which is perhaps why this option is meant to increase parallelism
> >>> only, but probably not to decrease it.
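
To illustrate why the requested number of map tasks behaves like a hint, here is a small sketch of how the older file input code is understood to turn that request into a split size. This is a simplification under stated assumptions (a single file, no tail tolerance, made-up class and method names), not the actual Hadoop implementation.

```java
// Simplified sketch: how a requested map count might feed into split sizing.
// Assumes a single input file; not the actual Hadoop code.
class MapTaskHintSketch {
    // The goal size is capped by the block size and floored by the min split size.
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Approximate number of splits (and so mappers) for one file.
    static long estimateSplits(long totalSize, int requestedMaps,
                               long minSize, long blockSize) {
        // Size each split would need to have to hit the requested map count.
        long goalSize = totalSize / Math.max(1, requestedMaps);
        long splitSize = computeSplitSize(goalSize, minSize, blockSize);
        // Ceiling division; the tail tolerance is ignored in this sketch.
        return (totalSize + splitSize - 1) / splitSize;
    }
}
```

With a 1000-byte file and 100-byte blocks, asking for 2 maps still yields 10 splits because the block size caps the split size, while asking for 20 maps yields 20. Only a minimum split size above the block size (250 here, giving 4 splits) brings the count down.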
> >>>
> >>> -d
> >>>
> >>> On Tue, Dec 28, 2010 at 4:05 PM, Jeff Eastman <jeastman@narus.com> wrote:
> >>>
> >>> > This is supposed to be a generic option. You should be able to specify
> >>> > Hadoop options such as this on the command line invocation of your
> >>> > favorite Mahout routine, but I'm having a similar problem setting
> >>> > -Dmapred.reduce.tasks=10 with Canopy and k-Means. This is both with and
> >>> > without a space after the -D.
> >>> >
> >>> > Can someone point me to a Mahout command where this does work? Both
> >>> > drivers extend AbstractJob and do the usual option processing pushups. I
> >>> > don't have Hadoop source locally so I can't debug the generic options
> >>> > parsing.
> >>> >
> >>> > -----Original Message-----
> >>> > From: beneo_7 [mailto:beneo_7@163.com]
> >>> > Sent: Monday, December 27, 2010 10:45 PM
> >>> > To: dev@mahout.apache.org
> >>> > Subject: where i can set -Dmapred.map.tasks=X
> >>> >
> >>> > i read in Mahout in Action that I should set -Dmapred.map.tasks=X
> >>> > but it did not work for hadoop
> >>> >
> >>>
> >>
> >>
> >
>
