From: "Josh Wills (JIRA)"
To: crunch-dev@incubator.apache.org
Reply-To: dev@crunch.apache.org
Date: Fri, 26 Aug 2016 05:47:20 +0000 (UTC)
Subject: [jira] [Updated] (CRUNCH-601) Short PCollections in SparkPipeline get length null.

     [ https://issues.apache.org/jira/browse/CRUNCH-601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Wills updated CRUNCH-601:
------------------------------
    Attachment: CRUNCH-601d.patch

Okay, this is the final version I'm going with for this. Thank you all for the help!

> Short PCollections in SparkPipeline get length null.
> ----------------------------------------------------
>
>                 Key: CRUNCH-601
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-601
>             Project: Crunch
>          Issue Type: Bug
>          Components: Spark
>    Affects Versions: 0.13.0
>         Environment: Running in local mode on Mac as well as in an Ubuntu 14.04 Docker container
>            Reporter: Mikael Goldmann
>            Assignee: Micah Whitacre
>            Priority: Minor
>         Attachments: CRUNCH-601-jw.patch, CRUNCH-601.patch, CRUNCH-601b.patch, CRUNCH-601c.patch, CRUNCH-601d.patch, SmallCollectionLengthTest.java
>
>
> I'll attach a file with a test that I would expect to pass but which fails.
> It creates five PCollections of lengths 0, 1, 2, 3, and 4, gets their lengths, runs the pipeline, and prints the lengths. Finally, it asserts that all lengths are non-null.
> I would expect it to print lengths 0, 1, 2, 3, 4 and pass.
> What it does is print lengths null, null, null, 3, 4 and fail.
> I think the underlying reason is the use of getSize() on an unmaterialized object, together with the assumption that an estimate of 0 from getSize() guarantees that the PCollection is empty, which is false in some cases.
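
For readers of the archive, the suspected failure mode in the description can be illustrated with a small toy model. This is only a sketch and not the Crunch or Spark code: the class SizeEstimateSketch, its methods, and the divide-by-3 estimator are all hypothetical, and the threshold is chosen solely so the output mirrors the reported "null, null, null, 3, 4". How the real code path ends up with null is not shown in this message.

```java
import java.util.Collections;
import java.util.List;

/** Toy model only: none of these names are Crunch or Spark APIs. */
public class SizeEstimateSketch {

    /**
     * Hypothetical size estimator that rounds down to whole units,
     * so short inputs report an estimate of 0 even though they hold records.
     */
    public static long estimatedSize(List<String> records) {
        return records.size() / 3;
    }

    /**
     * Problematic pattern: treats an estimate of 0 as proof of emptiness
     * and never produces a length for such inputs.
     */
    public static Long lengthTrustingEstimate(List<String> records) {
        if (estimatedSize(records) == 0) {
            return null; // assumed empty, so no value is computed
        }
        return (long) records.size();
    }

    /** Safer pattern: the estimate is only a hint; always count the records. */
    public static Long lengthByCounting(List<String> records) {
        return (long) records.size();
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 4; n++) {
            List<String> records = Collections.nCopies(n, "x");
            System.out.println(n + " records: trusting estimate -> "
                    + lengthTrustingEstimate(records)
                    + ", counting -> " + lengthByCounting(records));
        }
    }
}
```

In this model the estimate-trusting path yields no length for inputs of 0, 1, or 2 records, while the counting path always returns the true length, which is the behaviour the attached test expects.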