datafu-dev mailing list archives

From "Russell Jurney" <>
Subject Re: Review Request 25564: DATAFU-69: Create SelectFieldByName UDF - which, given a field whose value contains a field name, and *, returns the value of the field referenced by the field name
Date Tue, 28 Oct 2014 19:28:01 GMT

This is an automatically generated e-mail. To reply, visit:

(Updated Oct. 28, 2014, 7:28 p.m.)

Review request for DataFu, Jonathan Coveney, Jakob Homan, Matthew Hayes, and Sam Shah.


Updated the patch with the new name, SelectStringFieldByName.

Repository: datafu


Example use:
group_fields = LOAD '/e8/smalldata/group_fields.txt' AS (groupField:chararray);
with_group = CROSS group_fields, hour_rounded;
with_group = FOREACH with_group GENERATE group_fields::groupField AS groupField,
    hour_rounded::sourceNameOrIp AS sourceNameOrIp,
    hour_rounded::destinationNameOrIp AS destinationNameOrIp,
with_value_substitution = FOREACH with_group GENERATE ChooseFieldByValue(groupField, *) AS
    groupValue:tuple(value:chararray), *;
with_value_substitution = FOREACH with_value_substitution GENERATE
    FLATTEN(groupValue) AS groupValue:chararray,
all_success = FOREACH (GROUP with_value_substitution BY (groupField, groupValue, day)) GENERATE
    FLATTEN(group) AS (seriesType, groupValue, day),
    (int)COUNT_STAR(with_value_substitution) AS connections:int;
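
For readers unfamiliar with the idea, the lookup that the example relies on (passing groupField and * to the UDF and getting back the value of the field that groupField names) can be sketched in plain Java. This is only an illustration under stated assumptions, not the code in the patch: the class SelectFieldSketch, the method selectStringFieldByName, the use of a Map in place of a Pig tuple and schema, the sample values, and the choice to return null for a missing field are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SelectFieldSketch {

    // Hypothetical stand-in for the UDF's core lookup (not the DataFu patch itself).
    // fieldName is the VALUE of the selector field (e.g. "sourceNameOrIp");
    // row plays the role of the "*" argument, mapping field names to values.
    static String selectStringFieldByName(String fieldName, Map<String, Object> row) {
        if (fieldName == null) {
            return null; // assumption: a null selector yields null
        }
        Object value = row.get(fieldName);
        // assumption: an unknown field name or a null value yields null
        return value == null ? null : value.toString();
    }

    public static void main(String[] args) {
        // Illustrative row; the values are made up for this sketch.
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("groupField", "sourceNameOrIp");
        row.put("sourceNameOrIp", "10.0.0.1");
        row.put("destinationNameOrIp", "example.org");

        // Mirrors the shape of the call UDF(groupField, *) in the Pig example.
        String selector = (String) row.get("groupField");
        System.out.println(selectStringFieldByName(selector, row));
    }
}
```

Because the field to read is chosen per row, one pass over the crossed relation can produce the group value for every candidate grouping field, which is what lets a single GROUP BY stand in for one GROUP BY per field.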

Diffs (updated)

  datafu-pig/src/main/java/datafu/pig/util/ PRE-CREATION 
  datafu-pig/src/test/java/datafu/test/pig/util/ PRE-CREATION



This UDF was used to replace a very inefficient Pig script in which macros that performed many
individual GROUP BY operations took many minutes to plan.

Testing: unit tests, plus use on real data on a cluster.


Russell Jurney
