crunch-dev mailing list archives

From "Gabriel Reid (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-458) Eliminate potentially random MR split-point decisions
Date Sat, 09 Aug 2014 19:05:12 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14091860#comment-14091860 ]

Gabriel Reid commented on CRUNCH-458:
-------------------------------------

I'm wondering if basing the comparator on something as simple as the toString() of the NodePath
and PCollectionImpl would be good enough (but using the real equals to determine equality).
In a way that seems too naive to work, but on the other hand I can't see any immediate reason
why it wouldn't work.
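
A minimal sketch of that idea, for illustration only. The Node class below is a hypothetical stand-in, not Crunch's NodePath or PCollectionImpl, and the helper names byToString and sortedCopy are assumptions, not anything from the attached patch. The comparator returns 0 only when equals() says the two objects are equal, and otherwise orders them by their toString() output, so the resulting order does not depend on hash codes or insertion order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;

public class ToStringOrdering {

    // Hypothetical stand-in for a planner object; NOT Crunch's NodePath or PCollectionImpl.
    // equals() is deliberately left as the default identity comparison.
    static final class Node {
        private final String label;

        Node(String label) {
            this.label = label;
        }

        @Override
        public String toString() {
            return label;
        }
    }

    // Equality is decided by the real equals(); ordering of unequal objects by toString().
    // Caveat: two unequal objects with identical toString() output also compare as 0,
    // so a sorted set or map using this comparator would treat them as duplicates.
    static <T> Comparator<T> byToString() {
        return (a, b) -> a.equals(b) ? 0 : a.toString().compareTo(b.toString());
    }

    // Returns a sorted copy, leaving the input collection untouched.
    static <T> List<T> sortedCopy(Collection<T> items) {
        List<T> copy = new ArrayList<>(items);
        copy.sort(byToString());
        return copy;
    }

    public static void main(String[] args) {
        List<Node> nodes = Arrays.asList(new Node("S2"), new Node("S0"), new Node("S1"));
        System.out.println(sortedCopy(nodes)); // prints [S0, S1, S2]
    }
}
```

Whether this is good enough in practice depends on the toString() output being stable from run to run and distinct for objects that are not equal; a toString() that embeds an identity hash code would bring the randomness back.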

> Eliminate potentially random MR split-point decisions
> -----------------------------------------------------
>
>                 Key: CRUNCH-458
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-458
>             Project: Crunch
>          Issue Type: Bug
>            Reporter: Josh Wills
>         Attachments: CRUNCH-458.patch
>
>
> I'm running into a pipeline in which the decision of where to split two dependent jobs
> seems to be random from run-to-run (I only noticed it b/c one of the runs causes the pipeline
> to throw an NPE, and the other does not.) I'd like to investigate this and try to eliminate
> any potential sources of randomness in the way that two dependent GBK operations are split.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
