datafu-dev mailing list archives

From "Russell Jurney (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (DATAFU-69) Create ChooseFieldByValue UDF - which, given a field whose value contains a field name, and *, returns the value of the field referenced by the field name
Date Fri, 12 Sep 2014 00:15:33 GMT

    [ https://issues.apache.org/jira/browse/DATAFU-69?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14130913#comment-14130913 ]

Russell Jurney edited comment on DATAFU-69 at 9/12/14 12:15 AM:
----------------------------------------------------------------

Example use:

group_fields = LOAD '/e8/smalldata/group_fields.txt' AS (groupField:chararray); 
with_group = CROSS group_fields, hour_rounded;
with_group = FOREACH with_group GENERATE group_fields::groupField AS groupField, 
                                         hour_rounded::sourceNameOrIp AS sourceNameOrIp,
                                         hour_rounded::destinationNameOrIp AS destinationNameOrIp,
                                        ...;
with_value_substitution = FOREACH with_group GENERATE
    ChooseFieldByValue(groupField, *) AS groupValue:tuple(value:chararray),
    *;
with_value_substitution = FOREACH with_value_substitution GENERATE 
    FLATTEN(groupValue) AS groupValue:chararray,
    groupField,
    foo,
    bar,
    ...;
all_success = FOREACH (GROUP with_value_substitution BY (groupField, groupValue, day)) GENERATE
              FLATTEN(group) AS (seriesType, groupValue, day),
              (int)COUNT_STAR(with_value_substitution) AS connections:int;


was (Author: russell.jurney):
Example use:

group_fields = LOAD '/e8/smalldata/group_fields.txt' AS (groupField:chararray); 
with_group = CROSS group_fields, hour_rounded;
with_group = FOREACH with_group GENERATE group_fields::groupField AS groupField, 
                                         hour_rounded::sourceNameOrIp AS sourceNameOrIp,
                                         hour_rounded::destinationNameOrIp AS destinationNameOrIp,
                                        ...;
with_value_substitution = FOREACH with_group GENERATE
    ChooseFieldByValue(groupField, *) AS groupValue:tuple(value:chararray),
    *;
with_value_substitution = FOREACH with_value_substitution GENERATE 
    FLATTEN(groupValue) AS groupValue:chararray,
    groupField,
    foo,
    bar,
    ...
all_success = FOREACH (GROUP with_value_substitution BY (groupField, groupValue, day)) GENERATE
              FLATTEN(group) AS (seriesType, groupValue, day),
              (int)COUNT_STAR(with_value_substitution) AS connections:int;

> Create ChooseFieldByValue UDF - which, given a field whose value contains a field name, and *, returns the value of the field referenced by the field name
> ----------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: DATAFU-69
>                 URL: https://issues.apache.org/jira/browse/DATAFU-69
>             Project: DataFu
>          Issue Type: Bug
>    Affects Versions: 1.3.0
>            Reporter: Russell Jurney
>              Labels: features
>         Attachments: DATAFU-69.patch
>
>
> define ChooseFieldByValue datafu.pig.util.ChooseFieldByValue();
> data = LOAD 'input' using PigStorage(',') AS (fieldName:chararray, text1:chararray, text2:chararray, text3:chararray);
> data2 = FOREACH data GENERATE fieldName, ChooseFieldByValue(fieldName,*) as result;
> data3 = FOREACH data2 GENERATE fieldName, result;
> STORE data3 INTO 'output';
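
To make the intended lookup concrete, here is a minimal sketch of the semantics described in the issue title, written in plain Python rather than as a Pig UDF. It is not the code in DATAFU-69.patch. The dict-based row, the function name choose_field_by_value, the sample values, and the choice to return None for an unknown field name are all assumptions made for illustration; the issue does not say how the real UDF handles a missing field.

```python
def choose_field_by_value(field_name, row):
    """Return the value of the column in `row` whose name equals `field_name`.

    `row` is a dict of column name -> value, standing in for the tuple
    that `*` expands to in Pig. Returning None for an unknown name is an
    assumption, not documented behaviour of the real UDF.
    """
    return row.get(field_name)


# Hypothetical rows shaped like the (fieldName, text1, text2, text3) schema above.
rows = [
    {"fieldName": "text1", "text1": "a", "text2": "b", "text3": "c"},
    {"fieldName": "text3", "text1": "d", "text2": "e", "text3": "f"},
]

# Each row selects its own column: the first picks text1, the second text3.
results = [choose_field_by_value(r["fieldName"], r) for r in rows]
print(results)
```

In the Pig examples the same selection happens per record, with the field name taken from groupField (or fieldName) and the candidate values taken from the rest of the tuple.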



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
