Date: Thu, 20 Mar 2014 14:27:17 -0500
From: Jinal Shah
To: dev@crunch.apache.org
Subject: Re: Pipeline throwing No Output? exception

Sorry Micah, U and V are totally different types. Just wanted to clarify that.

On Thu, Mar 20, 2014 at 2:00 PM, Jinal Shah wrote:

> Hey Micah,
>
> Yes, you are right. This is what is going on in that "// do something"
> section (a higher-level overview).
>
> Here U and V are the same.
>
> PCollection collectionWhichCouldBeEmpty = null;
> if (path.exists) {
>   collectionWhichCouldBeEmpty = pipeline.read(FromPath, PType.V);
> } else {
>   collectionWhichCouldBeEmpty = pipeline.emptyPCollection();
> }
>
> PCollection collectionWhichHasData = DataComingFromDifferentSource();
>
> PTable VTable = collectionWhichCouldBeEmpty.by(PType);
> PTable UTable = collectionWhichHasData.by(PType);
>
> PTable UVTable = Join.leftJoin(UTable, VTable);
>
> pipeline.write(UVTable.values(), somePath, PType);
>
> pipeline.run(); // error is here
>
> Hope this helps.
>
> On Thu, Mar 20, 2014 at 12:50 PM, Micah Whitacre wrote:
>
>> Jinal, can you elaborate on the "//do something" section of the code? When
>> I heard it described, I thought other PCollections were being joined with
>> the empty PCollection, and that it was the output of those joins and
>> further processing that was actually being persisted.
>>
>> On Thu, Mar 20, 2014 at 10:18 AM, Chao Shi wrote:
>>
>> > Hi Josh and Jinal,
>> >
>> > This was introduced to help with the following case. One of our MR
>> > programs has a command-line option for optionally specifying a path to
>> > data to be joined on. Before emptyPCollection() was introduced, we had
>> > to do this:
>> >
>> > Path path = ...
>> > PCollection in1 = null;
>> > if (path != null) {
>> >   in1 = pipeline.read(...);
>> > }
>> > PCollection in2 = pipeline.read(...);
>> > if (in1 != null) {
>> >   in2 = in2.join(in1);
>> > }
>> >
>> > You end up with null checks everywhere. With emptyPCollection(), we can
>> > do this instead:
>> >
>> > PCollection in1;
>> > if (path != null) {
>> >   in1 = pipeline.read(...);
>> > } else {
>> >   in1 = pipeline.emptyPCollection(...);
>> > }
>> > PCollection in2 = pipeline.read(...);
>> > in2 = in2.join(in1);
>> >
>> > I think Jinal's case is one our current implementation handles badly.
>> > Perhaps we should change it to create an empty output directory instead
>> > of reporting an error; that way no MR job is started at all, which saves
>> > the job start-up time. That is the benefit of knowing at plan time that
>> > a PCollection is empty.
>> >
>> >
>> > 2014-03-16 23:34 GMT+08:00 Josh Wills:
>> >
>> > > +chao
>> > >
>> > > Inlined.
>> > >
>> > > On Sat, Mar 15, 2014 at 12:34 PM, Jinal Shah wrote:
>> > >
>> > >> Hi,
>> > >>
>> > >> I came across a case where I'm not sure whether the behavior is right.
>> > >> I am getting a "No Output" exception when I try to run my Crunch job.
>> > >> On further investigation I traced it to my use of
>> > >> Pipeline.emptyPCollection().
>> > >> Here is what my scenario looks like:
>> > >>
>> > >> PCollection collectionWhichCouldBeEmpty = null;
>> > >> if (path.exists) {
>> > >>   collectionWhichCouldBeEmpty = pipeline.read(FromPath, PType.V);
>> > >> } else {
>> > >>   collectionWhichCouldBeEmpty = pipeline.emptyPCollection();
>> > >> }
>> > >>
>> > >> // do some operations
>> > >>
>> > >> pipeline.write(Target);
>> > >>
>> > >> pipeline.run(); // this is where it throws the error
>> > >>
>> > >> https://github.com/apache/crunch/blob/master/crunch-core/src/main/java/org/apache/crunch/impl/mr/plan/MSCRPlanner.java#L287
>> > >>
>> > >> On further debugging I found that the vertex didn't have an input:
>> > >>
>> > >> https://github.com/apache/crunch/blob/master/crunch-core/src/main/java/org/apache/crunch/impl/mr/plan/MSCRPlanner.java#L275
>> > >>
>> > >> So if I use pipeline.read() and it produces an empty PCollection, the
>> > >> job works, since that collection has an input source. But if I create
>> > >> the empty PCollection with pipeline.emptyPCollection(), which has no
>> > >> input source, it fails.
>> > >>
>> > >> I'm not sure whether this case was missed or whether it is meant to
>> > >> work this way.
>> > >>
>> > >
>> > > It's a good question, and I'm not sure of the answer. Added Chao to the
>> > > To: line to ask him what the intention was in this case.
>> > >
>> > >> Thanks
>> > >> Jinal
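The failure Jinal describes, and Chao's proposed alternative, both come down to a plan-time check on whether an output can be traced back to an input source. The following is a minimal toy model under stated assumptions, not Crunch's MSCRPlanner: ToyPlanner, Node, hasInput, planStrict and planLenient are hypothetical names, a node with no source and no parents stands in for a collection created with emptyPCollection(), and no operation is assumed to emit records without input.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model of plan-time input checking; NOT Crunch's MSCRPlanner.
// A node either reads an input source, derives from parent nodes, or (with no
// source and no parents) stands in for a collection made by emptyPCollection().
final class ToyPlanner {

    static final class Node {
        final String name;
        final boolean readsSource;
        final List<Node> parents;

        Node(String name, boolean readsSource, Node... parents) {
            this.name = name;
            this.readsSource = readsSource;
            this.parents = Arrays.asList(parents);
        }
    }

    // True if this node, or any ancestor, reads from an input source.
    static boolean hasInput(Node node) {
        if (node.readsSource) {
            return true;
        }
        for (Node parent : node.parents) {
            if (hasInput(parent)) {
                return true;
            }
        }
        return false;
    }

    // Strict check, loosely mirroring the reported behaviour: an output with no
    // reachable input source is rejected at plan time.
    static String planStrict(Node output) {
        if (!hasInput(output)) {
            throw new IllegalStateException("No input for " + output.name);
        }
        return "launch job for " + output.name;
    }

    // Chao's suggestion, sketched: assuming no operation emits records without
    // input, an output with no reachable source must be empty, so write an
    // empty output directory and skip the MR job entirely.
    static String planLenient(Node output) {
        return hasInput(output)
                ? "launch job for " + output.name
                : "create empty output directory for " + output.name;
    }

    public static void main(String[] args) {
        Node empty = new Node("emptyInput", false);      // like emptyPCollection()
        Node mapped = new Node("mapped", false, empty);  // derived only from empty
        System.out.println(planLenient(mapped)); // create empty output directory for mapped
        try {
            planStrict(mapped);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());  // No input for mapped
        }
    }
}
```

In this sketch planStrict rejects an output derived only from the empty node, while planLenient reports that an empty output directory would be created without launching a job, along the lines Chao suggests. The real planner builds its own vertex graph, which this model does not try to reproduce; for example, a node joined with a source-reading parent has an input here.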
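Micah's question and Jinal's answer turn on what a join produces when one side is empty, and the same point affects Chao's pattern of substituting an empty collection for a missing optional input. The sketch below uses plain Java collections with hypothetical names (EmptySideJoins, Pair, innerJoin, leftJoin); it is not Crunch's Join library, and it assumes each key appears at most once on the right-hand side. With an empty right side, the inner join returns no rows, while the left join keeps every left row with a null right-hand value.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory sketch, not Crunch's Join library: joins a table that
// always has data (U values) with one that may be empty (V values).
// Assumes each key appears at most once on the right-hand side.
final class EmptySideJoins {

    // Minimal pair type; after a left join the right-hand value may be null.
    static final class Pair<A, B> {
        final A first;
        final B second;

        Pair(A first, B second) {
            this.first = first;
            this.second = second;
        }
    }

    // Inner join: emits a row only when the key exists on both sides.
    static <K, U, V> List<Pair<K, Pair<U, V>>> innerJoin(List<Pair<K, U>> left, Map<K, V> right) {
        List<Pair<K, Pair<U, V>>> out = new ArrayList<>();
        for (Pair<K, U> row : left) {
            if (right.containsKey(row.first)) {
                Pair<U, V> joined = new Pair<>(row.second, right.get(row.first));
                out.add(new Pair<>(row.first, joined));
            }
        }
        return out;
    }

    // Left join: emits every left row; a missing right-hand value becomes null.
    static <K, U, V> List<Pair<K, Pair<U, V>>> leftJoin(List<Pair<K, U>> left, Map<K, V> right) {
        List<Pair<K, Pair<U, V>>> out = new ArrayList<>();
        for (Pair<K, U> row : left) {
            Pair<U, V> joined = new Pair<>(row.second, right.get(row.first));
            out.add(new Pair<>(row.first, joined));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Pair<String, Integer>> uTable = new ArrayList<>();
        uTable.add(new Pair<>("a", 1));
        uTable.add(new Pair<>("b", 2));
        Map<String, Double> emptyVTable = Collections.emptyMap(); // U and V differ

        System.out.println(innerJoin(uTable, emptyVTable).size()); // 0
        System.out.println(leftJoin(uTable, emptyVTable).size());  // 2
    }
}
```

Under these assumptions, replacing a null check such as if (in1 != null) in2 = in2.join(in1); with an unconditional join against an empty collection preserves the main data only when the join keeps unmatched left rows, as a left join does. It also means that in a left join like the one Jinal describes, the output still carries the U-side rows when the V side is empty.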