datafu-dev mailing list archives

From "Eyal Allweil (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DATAFU-83) InUDF does not validate that types are compatible
Date Mon, 31 Jul 2017 07:13:00 GMT

    [ https://issues.apache.org/jira/browse/DATAFU-83?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16106894#comment-16106894 ]

Eyal Allweil commented on DATAFU-83:
------------------------------------

Hi Kyle ([~ItsAUsernameRight?])

Your help is very welcome. I have two comments about the current state of the contribution. I'll
post them both here and on Review Board for maximum visibility.

1. I think the output schema of this UDF is always boolean, not the schema of the first input
field. I would make the outputSchema method identical to the one in an existing boolean UDF,
for example [Pig's ENDSWITH built-in function|https://github.com/apache/pig/blob/trunk/src/org/apache/pig/builtin/ENDSWITH.java#L62]

2. As Matthew already wrote on Review Board, adding a case for this to the unit tests is a good
idea. You can probably just duplicate something from [the existing test|https://github.com/apache/incubator-datafu/blob/master/datafu-pig/src/test/java/datafu/test/pig/util/InTests.java].

Thanks!

> InUDF does not validate that types are compatible
> -------------------------------------------------
>
>                 Key: DATAFU-83
>                 URL: https://issues.apache.org/jira/browse/DATAFU-83
>             Project: DataFu
>          Issue Type: Improvement
>            Reporter: Matthew Hayes
>            Priority: Minor
>         Attachments: DATAFU-83.patch, rb36702.patch
>
>
> See the example below. The input data is a long, but ints are provided to match against.
> Because it uses Java's equals() to compare and these are different types, this will never
> match, which can lead to confusing results. I believe it should at least throw an error.
> {code}
>   define I datafu.pig.util.InUDF();
>   
>   data = LOAD 'input' AS (B: bag {T: tuple(v:LONG)});
>   
>   data2 = FOREACH data {
>     C = FILTER B By I(v, 1,2,3);
>     GENERATE C;
>   }
>   
>   describe data2;
>   
>   STORE data2 INTO 'output';
> {code}
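
A minimal plain-Java sketch of the problem and of one possible check, assuming the UDF ultimately compares boxed values with equals(). The class InCheck and its method in() are hypothetical and are not DataFu's InUDF source. A real fix in a Pig UDF might instead validate the input schema in outputSchema at plan time; this runtime version only illustrates the type mismatch and the boolean result.

```java
// Hypothetical, simplified stand-in for an "in" check; NOT DataFu's InUDF code.
public class InCheck {

    // Returns true if value equals any candidate. Throws if a non-null candidate's
    // runtime class differs from the value's class, because equals() between,
    // e.g., Long and Integer is always false and would silently never match.
    public static boolean in(Object value, Object... candidates) {
        if (value == null) {
            return false; // simplification: treat a null input as "no match"
        }
        // First pass: reject incompatible candidate types up front.
        for (Object candidate : candidates) {
            if (candidate != null && candidate.getClass() != value.getClass()) {
                throw new IllegalArgumentException(
                    "Type mismatch: value is " + value.getClass().getSimpleName()
                    + " but candidate " + candidate + " is "
                    + candidate.getClass().getSimpleName());
            }
        }
        // Second pass: ordinary equality match; the result is always a boolean.
        for (Object candidate : candidates) {
            if (value.equals(candidate)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Long v = 2L;
        // Java equals() never matches across boxed types.
        System.out.println(v.equals(Integer.valueOf(2))); // false
        System.out.println(in(v, 1L, 2L, 3L));            // true
        try {
            in(v, 1, 2, 3); // int literals box to Integer, value is Long
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Rejecting mismatched types up front turns the silent "never matches" case from the example above into an explicit error.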



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
