crunch-dev mailing list archives

From "Gabriel Reid (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-458) Eliminate potentially random MR split-point decisions
Date Thu, 07 Aug 2014 18:58:12 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089608#comment-14089608 ]

Gabriel Reid commented on CRUNCH-458:
-------------------------------------

Making the planning deterministic definitely sounds like a good plan. I was thinking
that it might be a bit better to use a TreeSet, etc. instead of a LinkedHashSet, so that the
behavior is the same regardless of the order in which node paths are added. That would protect
against calling code using a HashSet somewhere. On the other hand, that might be overthinking
it, and it means we would need Comparators for PCollections and NodePaths. Anyhow,
just something to consider.
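
To illustrate the trade-off, here is a minimal sketch in plain Java. It is not Crunch planner code: the class name OrderingDemo, its methods, and the "path-*" strings are hypothetical, with strings standing in for node paths because String already has a natural ordering (the real types would need the Comparators mentioned above).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demo class; strings stand in for node paths.
public class OrderingDemo {

    // LinkedHashSet iterates in insertion order, so the result
    // depends on the order in which the caller added elements.
    public static List<String> viaLinkedHashSet(List<String> added) {
        Set<String> set = new LinkedHashSet<>(added);
        return new ArrayList<>(set);
    }

    // TreeSet iterates in comparator order (natural String order here),
    // so the result is the same whatever the insertion order was.
    public static List<String> viaTreeSet(List<String> added) {
        Set<String> set = new TreeSet<>(added);
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        // The same three elements, added in two different orders,
        // as might happen if upstream code iterates over a HashSet.
        List<String> runA = Arrays.asList("path-b", "path-a", "path-c");
        List<String> runB = Arrays.asList("path-c", "path-b", "path-a");

        System.out.println(viaLinkedHashSet(runA)); // [path-b, path-a, path-c]
        System.out.println(viaLinkedHashSet(runB)); // [path-c, path-b, path-a]
        System.out.println(viaTreeSet(runA));       // [path-a, path-b, path-c]
        System.out.println(viaTreeSet(runB));       // [path-a, path-b, path-c]
    }
}
```

With LinkedHashSet the iteration order is only as deterministic as the caller's insertion order; with TreeSet it depends on the comparator alone, at the cost of having to define one.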

I'm curious, what was the NPE that you were getting when an alternate plan was being created?
Was that something in your own code, or in Crunch?

> Eliminate potentially random MR split-point decisions
> -----------------------------------------------------
>
>                 Key: CRUNCH-458
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-458
>             Project: Crunch
>          Issue Type: Bug
>            Reporter: Josh Wills
>         Attachments: CRUNCH-458.patch
>
>
> I'm running into a pipeline in which the decision of where to split two dependent jobs
> seems to be random from run-to-run (I only noticed it b/c one of the runs causes the pipeline
> to throw an NPE, and the other does not.) I'd like to investigate this and try to eliminate
> any potential sources of randomness in the way that two dependent GBK operations are split.



--
This message was sent by Atlassian JIRA
(v6.2#6252)