pig-dev mailing list archives

From "karan kumar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-3870) STRSPLITTOBAG UDF
Date Fri, 05 Sep 2014 14:01:28 GMT

    [ https://issues.apache.org/jira/browse/PIG-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14122973#comment-14122973 ]

karan kumar commented on PIG-3870:
----------------------------------

[~daijy] I would like to work on this JIRA issue. Please assign it to me.

> STRSPLITTOBAG UDF
> -----------------
>
>                 Key: PIG-3870
>                 URL: https://issues.apache.org/jira/browse/PIG-3870
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.12.0
>            Reporter: Praveenesh Kumar
>             Fix For: 0.12.0
>
>
> I had a scenario that required me to change the STRSPLIT code. The scenario was as follows.
> I have data like:
> 1       A|1|1   some
> 2       B|2|2   data
> 3       C|3|3   hadoop
> I need output like this:
> 1    A    some
> 1    1    some
> 1    1    some
> 2    B    data
> 2    2    data
> 2    2    data
> 3    C    hadoop
> 3    3    hadoop
> 3    3    hadoop
> I was trying to use STRSPLIT($1,'\\\|'), which returns a tuple. If I flatten that tuple, the data is converted into columns rather than rows.
> If we returned a bag of tuples instead, we could easily use flatten() to convert it into rows, and could still convert it into a tuple using the TOTUPLE() UDF (if someone just wants to use it as a tuple).
> Following the suggestion from [~daijy], I am creating a JIRA ticket for a new UDF, STRSPLITTOBAG, which will return a bag of tuples as described above.
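
The difference between flattening a tuple and flattening a bag can be sketched in plain Python. This is only a model of the behaviour described in the ticket, not Pig code and not the eventual UDF: the helper names (split_to_tuple, split_to_bag, flatten_tuple, flatten_bag) are hypothetical, a Python list stands in for a Pig bag, and the delimiter is treated as a literal string, whereas STRSPLIT takes a regular expression.

```python
# Rows from the example in the ticket: (id, delimited field, label).
ROWS = [
    (1, "A|1|1", "some"),
    (2, "B|2|2", "data"),
    (3, "C|3|3", "hadoop"),
]


def split_to_tuple(text, delimiter="|"):
    """Model of STRSPLIT: a single tuple holding every piece."""
    return tuple(text.split(delimiter))


def split_to_bag(text, delimiter="|"):
    """Model of the proposed STRSPLITTOBAG: a bag (here a list) of one-field tuples."""
    return [(piece,) for piece in text.split(delimiter)]


def flatten_tuple(rows):
    """Flattening a tuple widens the row: each piece becomes a column."""
    return [(key, *split_to_tuple(field), label) for key, field, label in rows]


def flatten_bag(rows):
    """Flattening a bag multiplies the row: each tuple in the bag becomes a row."""
    return [
        (key, *item, label)
        for key, field, label in rows
        for item in split_to_bag(field)
    ]


if __name__ == "__main__":
    # Print the bag-flattened rows, tab-separated, in the layout the ticket asks for.
    for row in flatten_bag(ROWS):
        print(*row, sep="\t")
```

In this model flatten_tuple keeps three rows and widens each one to five fields, while flatten_bag produces the nine three-field rows requested above.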



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
