Subject: Re: Review Request 25564: DATAFU-69: Create ChooseFieldByValue UDF - which, given a field whose value contains a field name, and *, returns the value of the field referenced by the field name
From: "Matthew Hayes"
To: "Jonathan Coveney", "Jakob Homan", "Sam Shah", "Matthew Hayes"
Cc: "DataFu", "Russell Jurney"
Date: Mon, 29 Sep 2014 00:56:57 -0000
Message-ID: <20140929005657.19177.7018@reviews.apache.org>
In-Reply-To: <20140929002052.19041.42378@reviews.apache.org>
X-ReviewRequest-URL: https://reviews.apache.org/r/25564/

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25564/#review54788
-----------------------------------------------------------


datafu-pig/src/main/java/datafu/pig/util/SelectFieldByName.java

    Hmm, something just occurred to me. This does not currently provide the output schema, which is one problem. But how do we determine the output schema? If the output value is decided dynamically, then its type can vary. One way to address this is to require that all the other values of the tuple are of the same type. Then you just take the schema from the first value. In your example they are all chararray. But this does limit the uses of this UDF.


- Matthew Hayes


On Sept. 29, 2014, 12:20 a.m., Russell Jurney wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25564/
> -----------------------------------------------------------
>
> (Updated Sept. 29, 2014, 12:20 a.m.)
>
>
> Review request for DataFu, Jonathan Coveney, Jakob Homan, Matthew Hayes, and Sam Shah.
>
>
> Repository: datafu
>
>
> Description
> -------
>
> Example use:
>
> group_fields = LOAD '/e8/smalldata/group_fields.txt' AS (groupField:chararray);
> with_group = CROSS group_fields, hour_rounded;
> with_group = FOREACH with_group GENERATE group_fields::groupField AS groupField,
>                                          hour_rounded::sourceNameOrIp AS sourceNameOrIp,
>                                          hour_rounded::destinationNameOrIp AS destinationNameOrIp,
>                                          ...;
> with_value_substitution = FOREACH with_group GENERATE ChooseFieldByValue(groupField, *) AS groupValue:tuple(value:chararray), *;
> with_value_substitution = FOREACH with_value_substitution GENERATE
>     FLATTEN(groupValue) AS groupValue:chararray,
>     groupField,
>     foo,
>     bar,
>     ...;
> all_success = FOREACH (GROUP with_value_substitution BY (groupField, groupValue, day)) GENERATE
>     FLATTEN(group) AS (seriesType, groupValue, day),
>     (int)COUNT_STAR(with_value_substitution) AS connections:int;
>
>
> Diffs
> -----
>
>   datafu-pig/src/main/java/datafu/pig/util/SelectFieldByName.java PRE-CREATION
>   datafu-pig/src/test/java/datafu/test/pig/util/SelectFieldByNameTest.java PRE-CREATION
>
> Diff: https://reviews.apache.org/r/25564/diff/
>
>
> Testing
> -------
>
> This UDF was used to replace a very inefficient Pig script where macros that did many individual GROUP BYs took many minutes to plan.
>
> Testing: unit tests and used on real data on a cluster.
>
>
> Thanks,
>
> Russell Jurney
>
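
The uniform-type rule suggested in the review comment could look roughly like the sketch below. It is a minimal, dependency-free model, not the code under review and not Pig's EvalFunc API. The class name UniformSchemaSketch, the method outputType, and the representation of the input schema as an ordered map from field name to Pig type name are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model of the "all candidates share one type" rule; no Pig classes are used. */
final class UniformSchemaSketch {

    /**
     * Returns the single type shared by every field except the selector field.
     * Throws if the candidate fields disagree, since the output type would then
     * depend on which field is chosen at runtime.
     */
    static String outputType(String selectorField, Map<String, String> inputSchema) {
        String first = null;
        for (Map.Entry<String, String> field : inputSchema.entrySet()) {
            if (field.getKey().equals(selectorField)) {
                continue; // the selector only names a field; it is never the output
            }
            if (first == null) {
                first = field.getValue(); // take the type from the first candidate
            } else if (!first.equals(field.getValue())) {
                throw new IllegalArgumentException(
                    "Field '" + field.getKey() + "' is " + field.getValue()
                    + " but expected " + first);
            }
        }
        if (first == null) {
            throw new IllegalArgumentException("No candidate fields besides the selector");
        }
        return first;
    }

    public static void main(String[] args) {
        // Field names taken from the example script; all candidates are chararray.
        Map<String, String> schema = new LinkedHashMap<>();
        schema.put("groupField", "chararray");
        schema.put("sourceNameOrIp", "chararray");
        schema.put("destinationNameOrIp", "chararray");
        System.out.println(outputType("groupField", schema)); // prints chararray
    }
}
```

In a real UDF this check would presumably live wherever the output schema is declared, so that a tuple with mixed types is rejected at planning time rather than at runtime. As noted in the review, the cost is that inputs with mixed field types cannot use the UDF at all.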