pig-dev mailing list archives

From Daniel Dai <da...@hortonworks.com>
Subject Re: Pig9 will fail on bad schema specification, but in a difficult to debug way
Date Thu, 01 Dec 2011 20:06:01 GMT
Why doesn't the problem exist in Pig 8?

Daniel

On Tue, Nov 29, 2011 at 10:22 PM, Jonathan Coveney <jcoveney@gmail.com> wrote:

> In Pig9, if you have a UDF that specifies its outputSchema and that output
> schema is wrong, you will with high probability get an exception such
> as:
>
> java.lang.ClassCastException: java.lang.Long cannot be cast to
> java.lang.Integer
>        at java.lang.Integer.compareTo(Integer.java:37)
>
> Errors like this are rare and didn't seem to come up in Pig8, but they do
> in Pig9, and the opaque error messages can be hard to read.
>
> In this case, there was a UDF that said it was outputting a Long, but
> was in fact outputting an Int. At some point, Pig tried to cast the value
> and failed.
>
> That said, I wonder if it might be possible to add a runtime check
> that inspects, say, the first output of your EvalFunc and, if its type
> does not match the declared outputSchema, gives you a warning (I don't
> think it should fail, but it should at least warn you to aid in
> debugging). I don't think this would be too hard, and it would add
> minimal overhead (compared to the run time of a job). We could
> optionally add a flag or something for a "strict" mode with respect to
> the schema.
>
> Related to this, when jobs die in opaque ways, I wonder if there might
> be a way to give a clearer sense of where in the pipeline they die? You
> can check pig.alias and try to figure it out from where in the map or
> reduce the job was, but that's tough. I know that pipelining and
> optimizations could make this harder, but a clearer sense of what's
> going on would help debugging.
>
> Thoughts?
>
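The ClassCastException in the quoted stack trace matches ordinary Java behavior, which may help when reading traces like it. Integer.compareTo(Object) is a compiler-generated bridge method that casts its argument to Integer, so calling it through a raw Comparable with a Long argument fails. The following is a minimal sketch in plain Java: CastDemo and compareMismatched are illustrative names, no Pig classes are involved, and it makes no claim about which Pig code path performed the comparison.

```java
// Reproduces the failure mode from the stack trace without Pig:
// an Integer is compared against a Long through a raw Comparable.
class CastDemo {
    @SuppressWarnings({"rawtypes", "unchecked"})
    static String compareMismatched() {
        Comparable declaredInt = Integer.valueOf(1); // the value really is an Integer
        Object actuallyLong = Long.valueOf(1L);      // the other value really is a Long
        try {
            // Dispatches to the bridge method Integer.compareTo(Object),
            // which casts its argument to Integer and therefore throws.
            declaredInt.compareTo(actuallyLong);
            return "no exception";
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        System.out.println(compareMismatched()); // prints "ClassCastException"
    }
}
```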

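The runtime check proposed in the quoted message could look roughly like the following. This is a minimal sketch under stated assumptions, written in plain Java rather than against Pig's EvalFunc or Schema classes: FirstOutputTypeCheck and its strict flag are hypothetical, the declared type is passed in as a Class instead of being derived from the UDF's outputSchema, and only the first non-null output is inspected, so the per-record cost after that is a single boolean test.

```java
import java.util.function.Function;
import java.util.logging.Logger;

// Hypothetical wrapper, not part of Pig's API: compares the first non-null
// value a function returns against a declared Java type and warns on mismatch
// (or throws, in the optional strict mode) instead of failing later on.
class FirstOutputTypeCheck<I, O> implements Function<I, O> {
    private static final Logger LOG = Logger.getLogger(FirstOutputTypeCheck.class.getName());

    private final Function<I, O> delegate;
    private final Class<?> declaredType;
    private final boolean strict;     // true: throw instead of warning
    private boolean checked = false;  // becomes true after the first non-null output
    private String warning = null;    // mismatch message, kept for inspection

    FirstOutputTypeCheck(Function<I, O> delegate, Class<?> declaredType, boolean strict) {
        this.delegate = delegate;
        this.declaredType = declaredType;
        this.strict = strict;
    }

    @Override
    public O apply(I input) {
        O out = delegate.apply(input);
        if (!checked && out != null) {
            checked = true;
            if (!declaredType.isInstance(out)) {
                warning = "declared output type " + declaredType.getName()
                        + " but first output was " + out.getClass().getName();
                if (strict) {
                    throw new IllegalStateException(warning);
                }
                LOG.warning(warning);
            }
        }
        return out; // the value itself is passed through unchanged
    }

    String warning() {
        return warning;
    }
}

class FirstOutputTypeCheckDemo {
    public static void main(String[] args) {
        // Declares Long but actually returns Integer, mirroring the reported bug.
        FirstOutputTypeCheck<String, Object> f =
                new FirstOutputTypeCheck<>(s -> s.length(), Long.class, false);
        f.apply("abc");
        System.out.println(f.warning());
    }
}
```

Checking only the first output keeps the overhead negligible but will miss a UDF whose output type varies with its input; the strict flag turns the warning into a failure for users who would rather stop the job early.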