Date: Thu, 25 Aug 2016 22:11:20 +0000 (UTC)
From: "Josh Wills (JIRA)"
To: crunch-dev@incubator.apache.org
Reply-To: dev@crunch.apache.org
Subject: [jira] [Commented] (CRUNCH-601) Short PCollections in SparkPipeline get length null.

[ https://issues.apache.org/jira/browse/CRUNCH-601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15437812#comment-15437812 ]

Josh Wills commented on CRUNCH-601:
-----------------------------------

Ack, forgot Micah is OOO this week. So I suppose the decision falls to me...

> Short PCollections in SparkPipeline get length null.
> ----------------------------------------------------
>
>                 Key: CRUNCH-601
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-601
>             Project: Crunch
>          Issue Type: Bug
>          Components: Spark
>    Affects Versions: 0.13.0
>         Environment: Running in local mode on Mac as well as in an Ubuntu 14.04 Docker container
>            Reporter: Mikael Goldmann
>            Assignee: Micah Whitacre
>            Priority: Minor
>         Attachments: CRUNCH-601-jw.patch, CRUNCH-601.patch, CRUNCH-601b.patch, CRUNCH-601c.patch, SmallCollectionLengthTest.java
>
>
> I'll attach a file with a test that I would expect to pass but which fails.
> It creates five PCollections of lengths 0, 1, 2, 3, and 4, gets their lengths, runs the pipeline, and prints the lengths. Finally, it asserts that all lengths are non-null.
> I would expect it to print lengths 0, 1, 2, 3, 4 and pass.
> Instead, it prints lengths null, null, null, 3, 4 and fails.
> I think the underlying reason is that getSize() is called on an unmaterialized object, and an estimate of 0 from getSize() is assumed to guarantee that the PCollection is empty, which is false in some cases.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
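The reporter's hypothesis can be illustrated with a small, self-contained Java sketch. It does not use Crunch at all: `LazyCollection`, `estimatedSize()`, `materializeAndCount()`, `flawedLength()` and `safeLength()` are hypothetical names, and the coarse size estimate (integer division that rounds small inputs down to 0) is an assumption chosen only to reproduce the reported pattern. It shows how short-circuiting on an estimate of 0 leaves the length uncomputed (null), while always computing the exact count does not.

```java
import java.util.ArrayList;
import java.util.List;

public class EstimatePitfall {

    // Hypothetical stand-in for an unmaterialized collection: the size
    // estimate is cheap, but the exact count requires materializing the data.
    static final class LazyCollection {
        private final int actualCount;

        LazyCollection(int actualCount) {
            this.actualCount = actualCount;
        }

        // Hypothetical coarse estimate (10 "bytes" per element, reported in
        // units of 25 with integer division), so small inputs round down to 0.
        long estimatedSize() {
            return actualCount * 10L / 25L;
        }

        // Stands in for running the pipeline and counting the real elements.
        long materializeAndCount() {
            return actualCount;
        }
    }

    // Flawed pattern: an estimate of 0 is taken as proof that the collection
    // is empty, so the length is never computed and stays null.
    static Long flawedLength(LazyCollection c) {
        if (c.estimatedSize() == 0) {
            return null;
        }
        return c.materializeAndCount();
    }

    // Safer pattern: treat the estimate only as a hint and always compute
    // the exact length.
    static Long safeLength(LazyCollection c) {
        return c.materializeAndCount();
    }

    public static void main(String[] args) {
        List<Long> flawed = new ArrayList<>();
        List<Long> safe = new ArrayList<>();
        for (int n = 0; n <= 4; n++) {
            LazyCollection c = new LazyCollection(n);
            flawed.add(flawedLength(c));
            safe.add(safeLength(c));
        }
        System.out.println("flawed: " + flawed);
        System.out.println("safe:   " + safe);
    }
}
```

Running `main` prints `flawed: [null, null, null, 3, 4]` and `safe:   [0, 1, 2, 3, 4]`. The point of the sketch is that a size estimate is only a planning hint; it cannot safely stand in for an exact count, so a length should not be skipped just because the estimate is 0.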