mahout-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAHOUT-1802) Capture attached checkpoints (if cached)
Date Tue, 08 Mar 2016 23:52:40 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15186096#comment-15186096 ]

ASF GitHub Bot commented on MAHOUT-1802:
----------------------------------------

GitHub user andrewpalumbo opened a pull request:

    https://github.com/apache/mahout/pull/185

    MAHOUT-1802: Capture attached checkpoints (if cached)

    Currently, the optimizer generates checkpoints and attaches them to actual logical elements of the DAG via CheckpointAction$cp.
    
    That is:
    
    
    ```
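    val drmC = drmA + drmB
    
    val cp1 = drmC.checkpoint() // materializes drmC and attaches the checkpoint to it
    val cp2 = drmC.checkpoint() // reuses the attached checkpoint: cp2 == cp1
    
    val drmD = cp1 + drmE // explicitly built on the checkpoint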
    ```
    
    but, in:
    
    `drmD = drmC + drmE // recomputes drmA + drmB all over`
    
    `drmC` already has `cp1` attached to it, so we should assume that reusing the common computational path is the intent and use it, instead of building plans that recompute it. That is,
    
    `drmD = drmC + drmE` should imply `cp1 + drmE` as well, even if the checkpoint is not used explicitly.
    
    
    This PR allows us to avoid excessive declarations like 
    
    ```
    val drmAcp = drmA.checkpoint()
    
    val drmB = drmAcp %*% ...
    ```
    
    and instead just use 
    
    ```
    drmA.checkpoint()
    
    val drmB = drmA %*% ...
    ```
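
The behavior described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: `Node`, `Leaf`, `Sum`, `computeCount` and `demo` are hypothetical names invented for illustration, plain `Vector[Double]` stands in for a distributed matrix, and none of it is Mahout's actual optimizer code. It shows a checkpoint being attached to a logical node, and a later expression that mentions only that node picking up the attached result instead of recomputing it.

```scala
// Toy model only: none of these types are Mahout classes.
object CheckpointCaptureSketch {

  sealed trait Node {
    // Result attached by checkpoint(); empty until checkpoint() is first called.
    private var attached: Option[Vector[Double]] = None

    // Number of times this node's own computation has actually run.
    var computeCount: Int = 0

    protected def compute(): Vector[Double]

    private def run(): Vector[Double] = { computeCount += 1; compute() }

    // Materialize once and attach the result; repeated calls return the same value.
    def checkpoint(): Vector[Double] = {
      if (attached.isEmpty) attached = Some(run())
      attached.get
    }

    // Evaluate, preferring an attached checkpoint over recomputation.
    def eval(): Vector[Double] = attached.getOrElse(run())

    def +(that: Node): Node = Sum(this, that)
  }

  final case class Leaf(values: Vector[Double]) extends Node {
    protected def compute(): Vector[Double] = values
  }

  final case class Sum(a: Node, b: Node) extends Node {
    // Element-wise sum of the two operands.
    protected def compute(): Vector[Double] =
      a.eval().zip(b.eval()).map { case (x, y) => x + y }
  }

  // Returns the value of drmD and how often drmC itself was computed.
  def demo(): (Vector[Double], Int) = {
    val drmA = Leaf(Vector(1.0, 2.0))
    val drmB = Leaf(Vector(3.0, 4.0))
    val drmE = Leaf(Vector(10.0, 20.0))

    val drmC = drmA + drmB
    drmC.checkpoint()          // runs drmA + drmB once and attaches the result to drmC

    val drmD = drmC + drmE     // no explicit reference to the checkpoint
    val result = drmD.eval()   // reuses drmC's attached checkpoint
    drmD.eval()                // evaluating again still does not recompute drmC
    (result, drmC.computeCount)
  }

  def main(args: Array[String]): Unit = println(demo())
}
```

In this toy, `demo()` returns `(Vector(14.0, 26.0), 1)`: `drmC` is computed once, by `checkpoint()`, even though `drmD` is evaluated twice and never names the checkpoint.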

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/andrewpalumbo/mahout MAHOUT-1802

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/mahout/pull/185.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #185
    
----
commit 8e28a6c41061a7f210e69804b977fbf3bff0fcc8
Author: Andrew Palumbo <apalumbo@apache.org>
Date:   2016-03-08T23:04:32Z

    Include CacheHint in CheckpointedDrm trait. Check for a logical CheckpointAction in physical translation and use its caching policy for the physical checkpoint

commit 302d34c2b10ff882344e04b9f6f17d4cde3f676f
Author: Andrew Palumbo <apalumbo@apache.org>
Date:   2016-03-08T23:10:38Z

    Merge branch 'master' into MAHOUT-1802

----
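
The first commit message above describes the physical translation picking up the caching policy of an attached logical checkpoint. A minimal sketch of that idea follows, assuming hypothetical stand-in types: `Hint`, `LogicalOp`, `PhysicalCheckpoint` and `translate` are invented for illustration and are not Mahout's `CacheHint`, `CheckpointAction` or translation code.

```scala
// Toy model only: none of these types are Mahout classes.
object CacheHintSketch {

  // Hypothetical stand-in for a caching policy.
  sealed trait Hint
  case object NoCache extends Hint
  case object MemoryOnly extends Hint

  // Hypothetical logical operator; it may carry the hint of an attached checkpoint.
  final case class LogicalOp(name: String, attachedCheckpointHint: Option[Hint])

  // Hypothetical physical result, recording which policy the translation chose.
  final case class PhysicalCheckpoint(name: String, hint: Hint)

  // If a logical checkpoint is attached, reuse its hint; otherwise use the default.
  def translate(op: LogicalOp, default: Hint = NoCache): PhysicalCheckpoint =
    PhysicalCheckpoint(op.name, op.attachedCheckpointHint.getOrElse(default))
}
```

Here `translate(LogicalOp("drmC", Some(MemoryOnly)))` yields a physical checkpoint with the `MemoryOnly` hint, while an operator without an attached checkpoint falls back to the default.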


>  Capture attached checkpoints (if cached)
> -----------------------------------------
>
>                 Key: MAHOUT-1802
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1802
>             Project: Mahout
>          Issue Type: Improvement
>    Affects Versions: 0.11.1
>            Reporter: Andrew Palumbo
>            Assignee: Andrew Palumbo
>             Fix For: 0.11.2
>
>
> Currently, the optimizer generates checkpoints and attaches them to actual logical elements of the DAG via CheckpointAction$cp.
> The way it works today is as follows:
> {code}
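> val drmC = drmA + drmB
> val cp1 = drmC.checkpoint() // materializes drmC and attaches the checkpoint to it
> val cp2 = drmC.checkpoint() // reuses the attached checkpoint: cp2 == cp1
> val drmD = cp1 + drmE // explicitly built on the checkpoint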
> {code}
> but, in:
> {code}
> drmD = drmC + drmE // recomputes drmA + drmB all over
> {code}
> {{drmC}} already has {{cp1}} attached to it, so we should assume that reusing the common computational path is the intent and use it, instead of building plans that recompute it. That is,
> {{drmD = drmC + drmE}} should imply {{cp1 + drmE}} as well, even if the checkpoint is not used explicitly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
