hadoop-pig-dev mailing list archives

From "Olga Natkovich (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1420) Make CONCAT act on all fields of a tuple, instead of just the first two fields of a tuple
Date Tue, 17 Aug 2010 19:24:27 GMT

    [ https://issues.apache.org/jira/browse/PIG-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899537#action_12899537 ]

Olga Natkovich commented on PIG-1420:
-------------------------------------

I could not figure out how to re-open this issue. However, the code does not work in a Pig
script. The main reason is that the code that selects which function to use does not yet
handle a variable number of arguments.

grunt> A = load 'studentab10k' as (name: chararray, age: chararray, gpa: chararray);
grunt> B = foreach A generate CONCAT(name, age, gpa);
grunt> C = limit B 10;
grunt> dump C;
2010-08-17 12:17:41,635 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1045: Could not infer the matching function for org.apache.pig.builtin.CONCAT as multiple or none of them fit. Please use an explicit cast.
Details at logfile: /homes/olgan/pig_1282072550328.log
grunt>
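
The failure mode described above can be illustrated with a simplified model. This is only a sketch, not Pig's actual function-selection code: the class name `ArityMatchSketch`, the `CANDIDATES` list, and the `matches` helper are all hypothetical. It assumes a matcher that accepts a call only when its argument types equal one of a fixed set of signatures exactly, so a three-argument call cannot match any two-argument candidate.

```java
import java.util.Arrays;
import java.util.List;

// Simplified, hypothetical model of exact-arity function matching.
// This is NOT Pig's actual function-selection code.
class ArityMatchSketch {
    // Assumed candidate signatures for a two-argument CONCAT.
    static final List<List<String>> CANDIDATES = Arrays.asList(
        Arrays.asList("bytearray", "bytearray"),
        Arrays.asList("chararray", "chararray"));

    // Returns true only if some candidate lists exactly the same argument types.
    static boolean matches(List<String> argTypes) {
        return CANDIDATES.contains(argTypes);
    }

    public static void main(String[] args) {
        // Two chararray arguments fit a fixed two-argument signature.
        System.out.println(matches(Arrays.asList("chararray", "chararray")));
        // Three chararray arguments fit no fixed two-argument signature.
        System.out.println(matches(Arrays.asList("chararray", "chararray", "chararray")));
    }
}
```

Under this assumption, supporting CONCAT(name, age, gpa) would require the matcher itself to accept signatures of non-fixed length, not just a change to the function body.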


> Make CONCAT act on all fields of a tuple, instead of just the first two fields of a tuple
> -----------------------------------------------------------------------------------------
>
>                 Key: PIG-1420
>                 URL: https://issues.apache.org/jira/browse/PIG-1420
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>    Affects Versions: 0.8.0
>            Reporter: Russell Jurney
>            Assignee: Russell Jurney
>             Fix For: 0.8.0
>
>         Attachments: addconcat2.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> org.apache.pig.builtin.CONCAT (which acts on DataByteArrays internally) and
> org.apache.pig.builtin.StringConcat (which acts on Strings internally) both act on only the
> first two fields of a tuple. This results in ugly nested CONCAT calls like:
> CONCAT(CONCAT(A, ' '), B)
> The more desirable form is:
> CONCAT(A, ' ', B)
> This change will be backwards compatible, provided that no one was relying on the fact
> that CONCAT ignores fields after the first two in a tuple. This seems a reasonable assumption
> to make, or at least a small break in compatibility for a sizable improvement.
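
The behavior change proposed in the description can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the attached patch and not Pig's UDF API: the class `ConcatSketch` and both methods are hypothetical, and a tuple is modeled as a `List<String>`.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for illustration; not Pig's actual CONCAT UDF.
class ConcatSketch {
    // Behavior described in the issue: only the first two fields are used,
    // and any later fields are silently ignored.
    static String concatFirstTwo(List<String> fields) {
        return fields.get(0) + fields.get(1);
    }

    // Proposed behavior: every field is appended in order.
    static String concatAll(List<String> fields) {
        StringBuilder sb = new StringBuilder();
        for (String f : fields) {
            sb.append(f);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> t = Arrays.asList("A", " ", "B");
        System.out.println(concatFirstTwo(t)); // prints "A " (third field dropped)
        System.out.println(concatAll(t));      // prints "A B"
    }
}
```

For a two-field tuple both methods return the same string, which is the sense in which the change is backwards compatible; they differ only when a caller passes more than two fields.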

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

