Date: Tue, 12 Aug 2014 04:49:12 +0000 (UTC)
From: "Josh Wills (JIRA)"
To: crunch-dev@incubator.apache.org
Reply-To: dev@crunch.apache.org
Subject: [jira] [Updated] (CRUNCH-458) Eliminate potentially random MR split-point decisions

     [ https://issues.apache.org/jira/browse/CRUNCH-458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Wills updated CRUNCH-458:
------------------------------
    Attachment: CRUNCH-458b.patch

@Gabriel I couldn't think of a better approach, either. ;-)

> Eliminate potentially random MR split-point decisions
> -----------------------------------------------------
>
>                 Key: CRUNCH-458
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-458
>             Project: Crunch
>          Issue Type: Bug
>            Reporter: Josh Wills
>         Attachments: CRUNCH-458.patch, CRUNCH-458b.patch
>
>
> I'm running into a pipeline in which the decision of where to split two dependent jobs seems to be random from run-to-run (I only noticed it b/c one of the runs causes the pipeline to throw an NPE, and the other does not.) I'd like to investigate this and try to eliminate any potential sources of randomness in the way that two dependent GBK operations are split.

--
This message was sent by Atlassian JIRA
(v6.2#6252)