pig-dev mailing list archives

From "Jim Plush (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-2134) ReadScalars message "scalar has more than one row in the output" does not provide enough information to help programmer find and fix script syntax error.
Date Mon, 17 Oct 2011 19:33:10 GMT

    [ https://issues.apache.org/jira/browse/PIG-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13129110#comment-13129110 ]

Jim Plush commented on PIG-2134:
--------------------------------

This just bit me today for the exact reason you describe. Luckily your bug came up and I was
able to resolve my dumb mistake. It would have helped to have an error message like that, though
:)
                
> ReadScalars message "scalar has more than one row in the output" does not provide enough
> information to help programmer find and fix script syntax error.
> ---------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-2134
>                 URL: https://issues.apache.org/jira/browse/PIG-2134
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>    Affects Versions: 0.8.0
>            Reporter: Michael Brauwerman
>            Priority: Trivial
>
> (Bug filed on 0.8. I do not have 0.9 to test.)
> This applies to org.apache.pig.impl.builtin.ReadScalars.java:83
> http://search-hadoop.com/c/Pig:/src/org/apache/pig/impl/builtin/ReadScalars.java
> I have bitten myself with the same programming error several times, and each time I spent
> too long diagnosing my error.
> The error message "scalar has more than one row in the output" is a bit misleading, considering
> the underlying programming mistake.
> Consider this Pig script:
> A = LOAD 'a' as (key, a1, a_junk);
> B = LOAD 'b' as (key, b1, b_junk);
> C = join A by key, B by key;
> -- Now, we want to project (key, a1, b1)
> -- CORRECT:
> D_GOOD = foreach C generate A::key, a1, b1;  -- Disambiguate 'key' correctly.
> -- Now, consider some common programmer errors:
> -- INCORRECT:
> -- This fails, because 'key' is ambiguous. The error message is clear enough.
> D_BAD_1 = foreach C generate key, a1, b1;  
> -- This fails whenever A has multiple rows.
> D_BAD_2 = foreach C generate A.key, a1, b1;
> -- Error: "Scalar has more than one row in the output 1st : t1, 2nd : t2"
> That's non-illuminating, for the following reason:
> The error message is assuming that the programmer is making a semantic error, trying
> to use a value from the original A, which is impossible if A has more than one row.
> In actuality, the programmer wants A::key, but he made a syntax error by typing "A.key",
> and it resulted in "scalar has more than one row" message that has nothing to do with what
> he intended.
> Since he has confused "." and "::", he has no context for interpreting the message properly.
> Ideally, the error message would say something like this:
>  "A.key cannot be used as scalar here, because A has more than one row. Did you mean
>  A::key?"
> If the identifiers are not available at error-logging time, something like this would
> be helpful:
>  "Relation cannot be used as scalar here, because A has more than one row. Did you mean
>  to use '::' instead of '.'? "
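
The two-tier message proposed in the quoted issue could be sketched roughly as follows. This is an illustration in Python, not Pig's actual ReadScalars code (which is Java). The function name `scalar_error_message` and its parameters are hypothetical. The fallback wording is adjusted to say "the relation", on the assumption that no relation name is known in that case.

```python
from typing import Optional


def scalar_error_message(relation: Optional[str] = None,
                         field: Optional[str] = None) -> str:
    """Hypothetical helper: build the friendlier message suggested in the issue."""
    if relation and field:
        # Identifiers are known: name the offending expression and suggest '::'.
        return (f"{relation}.{field} cannot be used as scalar here, "
                f"because {relation} has more than one row. "
                f"Did you mean {relation}::{field}?")
    # Identifiers are not available at error-logging time: give a generic hint.
    return ("Relation cannot be used as scalar here, because the relation has "
            "more than one row. Did you mean to use '::' instead of '.'?")


print(scalar_error_message("A", "key"))
print(scalar_error_message())
```

Either branch points the reader at the "." versus "::" confusion, which the current message does not do.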

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
