crunch-dev mailing list archives

From "Josh Wills (JIRA)" <>
Subject [jira] [Updated] (CRUNCH-458) Eliminate potentially random MR split-point decisions
Date Thu, 07 Aug 2014 01:15:11 GMT


Josh Wills updated CRUNCH-458:

    Attachment: CRUNCH-458.patch

First cut at this: using LinkedHashSet and LinkedHashMap inside of Edge to ensure that we
always process the node paths/PCollections in the same order when making split decisions.
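
A minimal sketch of the idea, not the contents of the patch: OrderedEdgeSketch, Node,
addPath and visitOrder are hypothetical names made up for illustration and are not
Crunch's actual Edge API. It assumes the nondeterminism comes from hash-ordered
collections keyed by objects that use the default identity hashCode(), which can differ
between JVM runs. LinkedHashMap and LinkedHashSet iterate in insertion order instead.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a planner edge; NOT Crunch's actual Edge class.
final class OrderedEdgeSketch {

    // Stand-in for a PCollection-like node. It does not override hashCode(),
    // so a plain HashMap would bucket it by identity hash, and iteration
    // order over such a map could change from one JVM run to the next.
    static final class Node {
        final String name;

        Node(String name) {
            this.name = name;
        }
    }

    // Insertion-ordered containers: iteration order depends only on the
    // order of addPath() calls, not on hash values.
    private final Map<Node, Set<String>> pathsByNode = new LinkedHashMap<>();

    void addPath(Node node, String path) {
        pathsByNode.computeIfAbsent(node, n -> new LinkedHashSet<>()).add(path);
    }

    // Returns "node:path" entries in the order a planner loop would visit them.
    List<String> visitOrder() {
        List<String> order = new ArrayList<>();
        for (Map.Entry<Node, Set<String>> entry : pathsByNode.entrySet()) {
            for (String path : entry.getValue()) {
                order.add(entry.getKey().name + ":" + path);
            }
        }
        return order;
    }

    // Builds a small edge and reports the visit order.
    static List<String> demo() {
        OrderedEdgeSketch edge = new OrderedEdgeSketch();
        Node c = new Node("c");
        Node a = new Node("a");
        edge.addPath(c, "p2");
        edge.addPath(a, "p1");
        edge.addPath(c, "p1"); // existing key: keeps its original position
        return edge.visitOrder();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [c:p2, c:p1, a:p1]
    }
}
```

With HashMap and HashSet in place of the linked variants, the same calls would still
produce the same three entries, but their order would no longer be guaranteed.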

> Eliminate potentially random MR split-point decisions
> -----------------------------------------------------
>                 Key: CRUNCH-458
>                 URL:
>             Project: Crunch
>          Issue Type: Bug
>            Reporter: Josh Wills
>         Attachments: CRUNCH-458.patch
> I'm running into a pipeline in which the decision of where to split two dependent jobs
> seems to be random from run to run. (I only noticed it because one of the runs causes the
> pipeline to throw an NPE, and the other does not.) I'd like to investigate this and try to
> eliminate any potential sources of randomness in the way that two dependent GBK operations
> are split.

This message was sent by Atlassian JIRA
