mahout-dev mailing list archives

From Dmitriy Lyubimov <dlie...@gmail.com>
Subject Re: where i can set -Dmapred.map.tasks=X
Date Wed, 29 Dec 2010 00:48:57 GMT
Jeff,

It's the MAHOUT-376 patch; I don't think it has been committed. The driver class
there is SSVDCli. For your convenience, you can find it here:
https://github.com/dlyubimov/ssvd-lsi/tree/givens-ssvd/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd

But like I said, I did not try using it with the -D option, since I wanted to
give an explicit option (with help text) to increase the split size if needed.
Another reason is that the solver runs a series of jobs, and only the ones
reading the source matrix have anything to do with the split size.


-d

On Tue, Dec 28, 2010 at 4:39 PM, Jeff Eastman <jeastman@narus.com> wrote:

> What's the driver class? If the -D parameters are working for you, I want to
> compare it to the clustering drivers.
>
> -----Original Message-----
> From: Dmitriy Lyubimov [mailto:dlieu.7@gmail.com]
> Sent: Tuesday, December 28, 2010 4:37 PM
> To: dev@mahout.apache.org
> Subject: Re: where i can set -Dmapred.map.tasks=X
>
> As far as I understand, this option is not enforced. I suspect it actually
> means 'minimum degree of parallelism', so if you expect to use it to reduce
> the number of mappers, I don't think that will work. The ones that do
> enforce anything are the min split size and max split size of the file
> input format, so I guess you can try those. I rely on them (and expose them
> as a job-specific option) in stochastic SVD.
>
> But forcing the split size to increase usually creates a 'supersplits'
> problem, where a lot of data is moved around just to supply data to the
> mappers. That is perhaps why this option is meant to increase parallelism
> only, and probably not to decrease it.
>
> -d
>
> On Tue, Dec 28, 2010 at 4:05 PM, Jeff Eastman <jeastman@narus.com> wrote:
>
> > This is supposed to be a generic option. You should be able to specify
> > Hadoop options such as this on the command-line invocation of your favorite
> > Mahout routine, but I'm having a similar problem setting
> > -Dmapred.reduce.tasks=10 with Canopy and k-Means. This happens both with
> > and without a space after the -D.
> >
> > Can someone point me to a Mahout command where this does work? Both drivers
> > extend AbstractJob and do the usual option-processing push-ups. I don't have
> > Hadoop source locally, so I can't debug the generic options parsing.
> >
> > -----Original Message-----
> > From: beneo_7 [mailto:beneo_7@163.com]
> > Sent: Monday, December 27, 2010 10:45 PM
> > To: dev@mahout.apache.org
> > Subject: where i can set -Dmapred.map.tasks=X
> >
> > I read in Mahout in Action that I should set -Dmapred.map.tasks=X,
> > but it did not work for Hadoop.
> >
>
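
Jeff's question is about how generic -D options reach a job. The sketch below is a hypothetical, dependency-free illustration of what such a pass does with both the `-Dkey=value` and `-D key=value` spellings: it collects the properties, overlays them on a job configuration (modeled here as a plain `Map`), and leaves the remaining arguments for the driver's own option parsing. It is not Hadoop's `GenericOptionsParser` or Mahout's `AbstractJob`; the class name `GenericDSketch` and its methods are made up, and the real parsers have rules of their own that this sketch does not model.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, stdlib-only illustration of a generic "-D" pass. This is NOT
// Hadoop's GenericOptionsParser; it only shows the idea of pulling out
// key=value properties and leaving the rest for the driver's own options.
public class GenericDSketch {
    public final Map<String, String> properties = new LinkedHashMap<>();
    public final List<String> remainingArgs = new ArrayList<>();

    public GenericDSketch(String[] args) {
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            if (arg.equals("-D") && i + 1 < args.length && args[i + 1].indexOf('=') > 0) {
                put(args[++i]);            // "-D key=value" (with a space)
            } else if (arg.startsWith("-D") && arg.indexOf('=') > 2) {
                put(arg.substring(2));     // "-Dkey=value" (no space)
            } else {
                remainingArgs.add(arg);    // left for the driver's own parser
            }
        }
    }

    // Split "key=value" at the first '='.
    private void put(String pair) {
        int eq = pair.indexOf('=');
        properties.put(pair.substring(0, eq), pair.substring(eq + 1));
    }

    // Overlay the parsed properties on a job's configuration (a plain map here).
    public Map<String, String> applyTo(Map<String, String> conf) {
        Map<String, String> merged = new LinkedHashMap<>(conf);
        merged.putAll(properties);
        return merged;
    }

    public static void main(String[] args) {
        String[] cli = {"-Dmapred.reduce.tasks=10", "-D", "mapred.map.tasks=4",
                        "--input", "in", "--output", "out"};
        GenericDSketch parsed = new GenericDSketch(cli);
        System.out.println(parsed.properties);    // {mapred.reduce.tasks=10, mapred.map.tasks=4}
        System.out.println(parsed.remainingArgs); // [--input, in, --output, out]
    }
}
```

Whichever parser is used, the properties only take effect if the driver builds its jobs from the configuration they were applied to.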
