datafu-dev mailing list archives

From "Matthew Hayes" <matthew.terence.ha...@gmail.com>
Subject Re: Review Request 25564: DATAFU-69: Create ChooseFieldByValue UDF - which, given a field who's value contains a field name, and *, returns the value of the field referenced by the field name
Date Sun, 28 Sep 2014 14:26:23 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25564/#review54778
-----------------------------------------------------------



datafu-pig/src/main/java/datafu/pig/util/ChooseFieldByValue.java
<https://reviews.apache.org/r/25564/#comment95054>

    Something like this seems more accurate and concise:
    
    Selects the value for a field within a tuple using that field's name.



datafu-pig/src/main/java/datafu/pig/util/ChooseFieldByValue.java
<https://reviews.apache.org/r/25564/#comment95055>

    I'm not sure I like the name ChooseFieldByValue. What about SelectFieldByName?



datafu-pig/src/main/java/datafu/pig/util/ChooseFieldByValue.java
<https://reviews.apache.org/r/25564/#comment95052>

    remove this comment



datafu-pig/src/main/java/datafu/pig/util/ChooseFieldByValue.java
<https://reviews.apache.org/r/25564/#comment95053>

    Include a message in the exception. Also, something like IllegalArgumentException is probably more appropriate.



datafu-pig/src/main/java/datafu/pig/util/ChooseFieldByValue.java
<https://reviews.apache.org/r/25564/#comment95056>

    The loop should start at i=1, since it doesn't make sense for the name field to select itself.
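
    To make the last two comments concrete, here is a minimal, Pig-free sketch of the lookup. It is only an illustration under stated assumptions: the class name, the method name, and the use of two parallel lists in place of a Pig tuple and its schema are hypothetical and are not taken from the patch. The first field is assumed to hold the name of the field to select.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the UDF's core logic; it does not use the Pig API.
final class SelectFieldByNameSketch {

    // fieldNames.get(i) is assumed to be the name of values.get(i).
    // values.get(0) is assumed to hold the name of the field to select.
    static Object selectFieldByName(List<String> fieldNames, List<?> values) {
        if (fieldNames.size() != values.size()) {
            throw new IllegalArgumentException(
                "Expected " + fieldNames.size() + " values but got " + values.size());
        }
        if (values.isEmpty() || !(values.get(0) instanceof String)) {
            throw new IllegalArgumentException(
                "First field must be a string containing a field name");
        }
        String wanted = (String) values.get(0);

        // Start at i = 1: index 0 is the name field itself and should never match.
        for (int i = 1; i < fieldNames.size(); i++) {
            if (wanted.equals(fieldNames.get(i))) {
                return values.get(i);
            }
        }
        // A descriptive message instead of a bare exception.
        throw new IllegalArgumentException(
            "No field named '" + wanted + "' among "
                + fieldNames.subList(1, fieldNames.size()));
    }

    public static void main(String[] args) {
        List<String> names =
            Arrays.asList("groupField", "sourceNameOrIp", "destinationNameOrIp");
        List<Object> row =
            Arrays.<Object>asList("destinationNameOrIp", "10.0.0.1", "10.0.0.2");
        System.out.println(selectFieldByName(names, row));
    }
}
```

    With the loop starting at 1, a row whose first field names itself (for example "groupField") falls through to the IllegalArgumentException rather than returning its own value.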


Sorry it took a while for me to take a look at this.

- Matthew Hayes


On Sept. 15, 2014, 6:58 p.m., Russell Jurney wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25564/
> -----------------------------------------------------------
> 
> (Updated Sept. 15, 2014, 6:58 p.m.)
> 
> 
> Review request for DataFu, Jonathan Coveney, Jakob Homan, Matthew Hayes, and Sam Shah.
> 
> 
> Repository: datafu
> 
> 
> Description
> -------
> 
> Example use:
> group_fields = LOAD '/e8/smalldata/group_fields.txt' AS (groupField:chararray); 
> with_group = CROSS group_fields, hour_rounded;
> with_group = FOREACH with_group GENERATE group_fields::groupField AS groupField, 
> hour_rounded::sourceNameOrIp AS sourceNameOrIp,
> hour_rounded::destinationNameOrIp AS destinationNameOrIp,
> ...;
> with_value_substitution = FOREACH with_group GENERATE ChooseFieldByValue(groupField, *) AS groupValue:tuple(value:chararray), *;
> with_value_substitution = FOREACH with_value_substitution GENERATE 
> FLATTEN(groupValue) AS groupValue:chararray,
> groupField,
> foo,
> bar,
> ...;
> all_success = FOREACH (GROUP with_value_substitution BY (groupField, groupValue, day)) GENERATE
> FLATTEN(group) AS (seriesType, groupValue, day),
> (int)COUNT_STAR(with_value_substitution) AS connections:int;
> 
> 
> Diffs
> -----
> 
>   datafu-pig/src/main/java/datafu/pig/util/ChooseFieldByValue.java PRE-CREATION 
>   datafu-pig/src/test/java/datafu/test/pig/util/ChooseFieldByValueTest.java PRE-CREATION
> 
> Diff: https://reviews.apache.org/r/25564/diff/
> 
> 
> Testing
> -------
> 
> This UDF was used to replace a very inefficient Pig script in which macros that performed many individual GROUP BYs took many minutes to plan.
> 
> Testing: unit tests and used on real data on a cluster.
> 
> 
> Thanks,
> 
> Russell Jurney
> 
>

