Date: Fri, 2 Dec 2011 10:56:12 -0800
Subject: Re: Pig9 will fail on bad schema specification, but in a difficult to debug way
From: Jonathan Coveney
To: dev@pig.apache.org

Hmm, I tested it and it does exist in pig8. I must have been running a fixed
version. I think the other point stands though... we can make it easier to
understand these sorts of problems.

2011/12/1 Daniel Dai

> Why does the problem not exist in Pig 8?
>
> Daniel
>
> On Tue, Nov 29, 2011 at 10:22 PM, Jonathan Coveney wrote:
>
> > In pig9, if you have a UDF which specifies its outputschema and that
> > output schema is wrong, then you will with high probability get an
> > exception such as:
> >
> > java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Integer
> >     at java.lang.Integer.compareTo(Integer.java:37)
> >
> > Errors like this are rare. They didn't seem to come up in Pig8 but do
> > in Pig9, and the opaque error messages can be hard to read.
> >
> > In this case, there was a UDF that said it was outputting a Long, but
> > was in fact outputting an Int. At some point, Pig tried to cast the
> > value and failed.
> >
> > That said, I wonder if it might be possible to add a runtime check
> > that inspects, say, the first output of your EvalFunc, and if the type
> > does not match the declared OutputSchema, gives you a warning (I don't
> > think it should fail, but it should at least warn you to aid in
> > debugging). I don't think this would be too hard, and it would add
> > minimal overhead (compared to the run time of a job). We could
> > optionally add a flag or something for a "strict" mode with respect to
> > schemas.
> >
> > Related to this, when jobs die in opaque ways, I wonder if there might
> > be a way to give a clearer sense of where in the pipeline a job dies.
> > You can check pig.alias and try to figure it out from where in the map
> > or reduce it was, but that's tough. I know that pipelining and
> > optimizations could make this hard, but having a clearer sense of
> > what's going on would help debugging along.
> >
> > Thoughts?
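
As a rough illustration of the runtime check proposed in the quoted message, here is a minimal sketch in plain Java. It does not use Pig's actual API: FirstOutputTypeCheck, its constructor arguments, and warning() are hypothetical names, a java.util.function.Function stands in for a UDF's exec method, and a Class stands in for the declared output schema. It inspects only the first non-null result and warns rather than fails, as suggested above.

```java
import java.util.function.Function;

/**
 * Hypothetical wrapper (not part of Pig): checks the first non-null result
 * of a function against the type it claims to return, and records a warning
 * on mismatch instead of failing.
 */
final class FirstOutputTypeCheck<I, O> implements Function<I, O> {
    private final Function<I, O> udf;      // assumption: stands in for a UDF's exec method
    private final Class<?> declaredType;   // assumption: stands in for the declared output schema
    private boolean checked = false;       // ensures the check runs only once
    private String warning = null;         // stays null if no mismatch was seen

    FirstOutputTypeCheck(Function<I, O> udf, Class<?> declaredType) {
        this.udf = udf;
        this.declaredType = declaredType;
    }

    @Override
    public O apply(I input) {
        O result = udf.apply(input);
        // Only the first non-null output is inspected, so the per-record cost
        // after that is a single boolean test.
        if (!checked && result != null) {
            checked = true;
            if (!declaredType.isInstance(result)) {
                warning = "UDF declared " + declaredType.getName()
                        + " but returned " + result.getClass().getName();
                System.err.println("WARN: " + warning);
            }
        }
        return result;  // the value is passed through unchanged either way
    }

    /** Returns the recorded mismatch message, or null if none was seen. */
    String warning() {
        return warning;
    }
}
```

Wrapping a function that is declared as Long but actually returns an Integer would log one warning on the first call and still pass the value through. A "strict" mode could throw at the same point instead of logging.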