mahout-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAHOUT-1573) More explicit parallelism adjustments in math-scala DRM apis; elements of automatic re-adjustments
Date Wed, 18 Jun 2014 05:45:13 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14034845#comment-14034845 ]

ASF GitHub Bot commented on MAHOUT-1573:
----------------------------------------

Github user dlyubimov commented on the pull request:

    https://github.com/apache/mahout/pull/13#issuecomment-46398143
  
    Ted, are you ready to help with a concrete alternative? This is a very
    small issue compared to even the patch; let's build a list of alternatives
    and vote. But let's get it done.
    
    My additional variants
    
    minSplits,...
    minPar, exactPar, autoPar (consistent with Scala's collection.par())
    
    To give Ted something to vote down:
    >=|| :=||
    :||=
    
    Not ok with me
    
    minParts
    minParallelism
    minPartitions
    repartition
    reshuffle
    and other do-something kinds of names
    
    Your variants--?
    On Jun 17, 2014 9:59 PM, "Ted Dunning" <notifications@github.com> wrote:
    
    > Yes.
    >
    > But I was talking about the gratuitous use of non-alpha characters.
    > Excessive use of operator overloading is also a bit of a problem.
    >
    > Just because you can doesn't mean you should.
    >
    > Sent from my iPhone
    >
    > > On Jun 17, 2014, at 17:58, Dmitriy Lyubimov <notifications@github.com> wrote:
    > >
    > > e.g. one can write things like A.t.%*%(A).exact_||(100)
    > >
    > > —
    > > Reply to this email directly or view it on GitHub.
    >
    > —
    > Reply to this email directly or view it on GitHub
    > <https://github.com/apache/mahout/pull/13#issuecomment-46396097>.
    >
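To make the naming debate concrete, here is a minimal Scala sketch of how the alphabetic names proposed in this thread (`minPar`, `exactPar`, `autoPar`) might read on a matrix handle. `ToyDrm`, its fields, and the "about 1.8x the available cores" rule inside `autoPar` are hypothetical illustrations chosen for this sketch, not Mahout's actual DRM API; the class only tracks a partition count so the semantics of each name stay visible.

```scala
// Hypothetical stand-in for a distributed row matrix (NOT Mahout's DrmLike API).
// It only records a partition count; the method names come from the proposal above.
final case class ToyDrm(nrow: Long, ncol: Int, partitions: Int) {
  require(partitions > 0, "partitions must be positive")

  /** Raise parallelism to at least n; never lowers it. */
  def minPar(n: Int): ToyDrm =
    if (n > partitions) copy(partitions = n) else this

  /** Set parallelism to exactly n, whether that raises or lowers it. */
  def exactPar(n: Int): ToyDrm = {
    require(n > 0, "n must be positive")
    copy(partitions = n)
  }

  /** Assumed automatic rule for illustration: roughly 1.8x the available cores. */
  def autoPar(clusterCores: Int): ToyDrm = {
    require(clusterCores > 0, "clusterCores must be positive")
    copy(partitions = math.max(1L, math.round(clusterCores * 1.8)).toInt)
  }
}
```

With names like these, the example from the quoted message would hypothetically read `(A.t %*% A).exactPar(100)` rather than `A.t.%*%(A).exact_||(100)`, which is the readability trade-off Ted is arguing for.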


> More explicit parallelism adjustments in math-scala DRM apis; elements of automatic re-adjustments
> --------------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-1573
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1573
>             Project: Mahout
>          Issue Type: Task
>    Affects Versions: 0.9
>            Reporter: Dmitriy Lyubimov
>            Assignee: Dmitriy Lyubimov
>             Fix For: 1.0
>
>
> (1) add minSplit parameter pass-thru to drmFromHDFS to be able to explicitly increase parallelism.
> (2) add a parallelism readjustment parameter to the checkpoint() call. If specified, this translates into a shuffle-less coalesce() applied to the data set before it is cached.
> Going forward, we should probably try to figure out how to automate this, at least a little. For example, the simplest automatic adjustment might re-adjust parallelism on load to fit the cluster size (95% or 180% of cluster size, for example), with some rule-of-thumb safeguards, e.g. we cannot split any original HDFS split by more than a factor of, say, 8 (or whatever we configure). We should be able to get reasonable parallelism out of the box with simple heuristics like that.
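The load-time heuristic described above can be sketched as a small Scala function. This is a minimal sketch under stated assumptions: "cluster size" is taken to mean the number of available cores (task slots), `targetRatio` stands for the 95% / 180% figures, and `maxSplitFactor` is the configurable cap of 8 on splitting each original HDFS split. The object and function names are hypothetical, not part of Mahout.

```scala
// Hypothetical sketch of the auto-adjustment rule from the issue description.
object ParallelismHeuristic {

  /**
   * Aim for roughly targetRatio x clusterCores partitions, but never split any
   * original HDFS split into more than maxSplitFactor pieces.
   * Assumption: clusterCores means available cores / task slots.
   */
  def autoPartitions(
      originalSplits: Int,
      clusterCores: Int,
      targetRatio: Double = 1.8,
      maxSplitFactor: Int = 8): Int = {
    require(originalSplits > 0, "originalSplits must be positive")
    require(clusterCores > 0, "clusterCores must be positive")
    require(targetRatio > 0.0 && maxSplitFactor >= 1, "invalid ratio or factor")
    // Target parallelism derived from cluster size, at least 1.
    val target = math.max(1L, math.round(clusterCores * targetRatio))
    // Safeguard: cap total splitting at maxSplitFactor per original split.
    val cap = originalSplits.toLong * maxSplitFactor
    math.min(target, cap).toInt
  }

  /**
   * A shuffle-less coalesce (item 2) can only merge existing partitions, so it
   * applies only when the target does not exceed the current partition count.
   */
  def canCoalesceWithoutShuffle(current: Int, target: Int): Boolean =
    target <= current
}
```

For instance, with 10 input splits on a 100-core cluster the 180% target (180 partitions) is clipped to 10 x 8 = 80 by the split-factor safeguard, while 500 input splits at the 95% ratio would be coalesced down to 95 partitions without a shuffle.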



--
This message was sent by Atlassian JIRA
(v6.2#6252)
