crunch-dev mailing list archives

From "Josh Wills (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-458) Eliminate potentially random MR split-point decisions
Date Thu, 07 Aug 2014 19:00:19 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089612#comment-14089612 ]

Josh Wills commented on CRUNCH-458:
-----------------------------------

It was a case where we were using the built-in Avro Pair schema, which doesn't support null
values, to serialize an Avro PTable that had null values. That'll be a separate fix, though;
I'm working on it now.

> Eliminate potentially random MR split-point decisions
> -----------------------------------------------------
>
>                 Key: CRUNCH-458
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-458
>             Project: Crunch
>          Issue Type: Bug
>            Reporter: Josh Wills
>         Attachments: CRUNCH-458.patch
>
>
> I'm running into a pipeline in which the decision of where to split two dependent jobs
> seems to be random from run-to-run (I only noticed it b/c one of the runs causes the pipeline
> to throw an NPE, and the other does not.) I'd like to investigate this and try to eliminate
> any potential sources of randomness in the way that two dependent GBK operations are split.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
