Return-Path:
X-Original-To: apmail-datafu-dev-archive@minotaur.apache.org
Delivered-To: apmail-datafu-dev-archive@minotaur.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3])
    by minotaur.apache.org (Postfix) with SMTP id B371317570
    for ; Sun, 28 Sep 2014 14:26:52 +0000 (UTC)
Received: (qmail 73566 invoked by uid 500); 28 Sep 2014 14:26:52 -0000
Delivered-To: apmail-datafu-dev-archive@datafu.apache.org
Received: (qmail 73530 invoked by uid 500); 28 Sep 2014 14:26:52 -0000
Mailing-List: contact dev-help@datafu.incubator.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: dev@datafu.incubator.apache.org
Delivered-To: mailing list dev@datafu.incubator.apache.org
Received: (qmail 73519 invoked by uid 99); 28 Sep 2014 14:26:52 -0000
Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230)
    by apache.org (qpsmtpd/0.29) with ESMTP; Sun, 28 Sep 2014 14:26:52 +0000
X-ASF-Spam-Status: No, hits=-1998.4 required=5.0
    tests=ALL_TRUSTED,HTML_MESSAGE,RP_MATCHES_RCVD
X-Spam-Check-By: apache.org
Received: from [140.211.11.3] (HELO mail.apache.org) (140.211.11.3)
    by apache.org (qpsmtpd/0.29) with SMTP; Sun, 28 Sep 2014 14:26:28 +0000
Received: (qmail 73326 invoked by uid 99); 28 Sep 2014 14:26:25 -0000
Received: from reviews-vm.apache.org (HELO reviews.apache.org) (140.211.11.40)
    by apache.org (qpsmtpd/0.29) with ESMTP; Sun, 28 Sep 2014 14:26:25 +0000
Received: from reviews.apache.org (localhost [127.0.0.1])
    by reviews.apache.org (Postfix) with ESMTP id 158011C0114;
    Sun, 28 Sep 2014 14:26:23 +0000 (UTC)
Content-Type: multipart/alternative; boundary="===============0360841001636235715=="
MIME-Version: 1.0
Subject: Re: Review Request 25564: DATAFU-69: Create ChooseFieldByValue UDF - which, given a field who's value contains a field name, and *, returns the value of the field referenced by the field name
From: "Matthew Hayes"
To: "Jonathan Coveney", "Jakob Homan", "Sam Shah", "Matthew Hayes"
Cc:
    "DataFu", "Russell Jurney"
Date: Sun, 28 Sep 2014 14:26:23 -0000
Message-ID: <20140928142623.19041.6135@reviews.apache.org>
X-ReviewBoard-URL: https://reviews.apache.org
Auto-Submitted: auto-generated
Sender: "Matthew Hayes"
X-ReviewGroup: DataFu
X-ReviewRequest-URL: https://reviews.apache.org/r/25564/
X-Sender: "Matthew Hayes"
References: <20140915185812.7803.49041@reviews.apache.org>
In-Reply-To: <20140915185812.7803.49041@reviews.apache.org>
Reply-To: "Matthew Hayes"
X-ReviewRequest-Repository: datafu
X-Virus-Checked: Checked by ClamAV on apache.org

--===============0360841001636235715==
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25564/#review54778
-----------------------------------------------------------


datafu-pig/src/main/java/datafu/pig/util/ChooseFieldByValue.java

    Something like this seems more accurate and concise:

    Selects the value for a field within a tuple using that field's name.


datafu-pig/src/main/java/datafu/pig/util/ChooseFieldByValue.java

    I'm not sure I like the name ChooseFieldByValue. What about SelectFieldByName?


datafu-pig/src/main/java/datafu/pig/util/ChooseFieldByValue.java

    Remove this comment.


datafu-pig/src/main/java/datafu/pig/util/ChooseFieldByValue.java

    Include a message in the exception. Something like IllegalArgumentException is probably more appropriate, too.


datafu-pig/src/main/java/datafu/pig/util/ChooseFieldByValue.java

    The loop should start at i=1, since it doesn't make sense for the field to select itself.


Sorry it took a while for me to take a look at this.

- Matthew Hayes


On Sept. 15, 2014, 6:58 p.m., Russell Jurney wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail.
> To reply, visit:
> https://reviews.apache.org/r/25564/
> -----------------------------------------------------------
>
> (Updated Sept. 15, 2014, 6:58 p.m.)
>
>
> Review request for DataFu, Jonathan Coveney, Jakob Homan, Matthew Hayes, and Sam Shah.
>
>
> Repository: datafu
>
>
> Description
> -------
>
> Example use:
>
> group_fields = LOAD '/e8/smalldata/group_fields.txt' AS (groupField:chararray);
> with_group = CROSS group_fields, hour_rounded;
> with_group = FOREACH with_group GENERATE group_fields::groupField AS groupField,
>                                          hour_rounded::sourceNameOrIp AS sourceNameOrIp,
>                                          hour_rounded::destinationNameOrIp AS destinationNameOrIp,
>                                          ...;
> with_value_substitution = FOREACH with_group GENERATE ChooseFieldByValue(groupField, *) AS groupValue:tuple(value:chararray), *;
> with_value_substitution = FOREACH with_value_substitution GENERATE
>     FLATTEN(groupValue) AS groupValue:chararray,
>     groupField,
>     foo,
>     bar,
>     ...;
> all_success = FOREACH (GROUP with_value_substitution BY (groupField, groupValue, day)) GENERATE
>     FLATTEN(group) AS (seriesType, groupValue, day),
>     (int)COUNT_STAR(with_value_substitution) AS connections:int;
>
>
> Diffs
> -----
>
>   datafu-pig/src/main/java/datafu/pig/util/ChooseFieldByValue.java PRE-CREATION
>   datafu-pig/src/test/java/datafu/test/pig/util/ChooseFieldByValueTest.java PRE-CREATION
>
> Diff: https://reviews.apache.org/r/25564/diff/
>
>
> Testing
> -------
>
> This UDF was used to replace a very inefficient pig script where macros that did many individual GROUP BY's took many minutes to plan.
>
> Testing: unit tests and used on real data on a cluster.
>
>
> Thanks,
>
> Russell Jurney
>
>

--===============0360841001636235715==--
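
The two code-level review comments (throw an IllegalArgumentException that carries a message, and start the search at i=1 so the selector field cannot select itself) can be illustrated with a minimal plain-Java sketch. This is not the code under review: the class name SelectFieldByNameSketch, the select method, and the use of two parallel lists in place of a Pig Tuple and its input schema are all assumptions made for illustration.

```java
import java.util.List;

// Hypothetical stand-in for the UDF's core lookup. It does not use Pig's
// EvalFunc, Tuple, or Schema classes; names and values are parallel lists.
final class SelectFieldByNameSketch {

    private SelectFieldByNameSketch() {}

    /**
     * fieldNames.get(i) is the name of values.get(i). Field 0 holds the
     * name of the field whose value should be returned.
     */
    static Object select(List<String> fieldNames, List<Object> values) {
        if (values.isEmpty() || fieldNames.size() != values.size()) {
            throw new IllegalArgumentException(
                "Expected a non-empty tuple with one name per value, got "
                    + fieldNames.size() + " names and " + values.size() + " values");
        }
        Object wanted = values.get(0);
        if (!(wanted instanceof String)) {
            throw new IllegalArgumentException(
                "First field must hold a field name (a string), got: " + wanted);
        }
        // Start at 1: index 0 is the selector itself and must not match itself.
        for (int i = 1; i < fieldNames.size(); i++) {
            if (wanted.equals(fieldNames.get(i))) {
                return values.get(i);
            }
        }
        throw new IllegalArgumentException(
            "No field named '" + wanted + "' among "
                + fieldNames.subList(1, fieldNames.size()));
    }
}
```

Under these assumptions, calling select with names (groupField, sourceNameOrIp, destinationNameOrIp) and values ("destinationNameOrIp", "10.0.0.1", "10.0.0.2") returns "10.0.0.2", while a selector value of "groupField" raises IllegalArgumentException because the search skips index 0.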