pig-user mailing list archives

From Xiaomeng Wan <shawn...@gmail.com>
Subject Re: how to operate a map type
Date Thu, 02 Jun 2011 15:55:35 GMT
Can't your UDF return a bag of tuples with two fields (i.e. key and
value), which you then flatten?
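
A minimal sketch of that idea in plain Java, using only JDK collections so
it runs without Pig on the classpath. The names MapToRows, Row, explode and
countByKey are hypothetical and are not part of Pig or of the UDF discussed
in this thread. In a real UDF the explode step would presumably live in an
EvalFunc that returns a bag of (key, value) tuples, and the script would
apply FLATTEN to it before grouping and counting.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not a Pig API.
final class MapToRows {

    /** One flattened row: (info, key, score). */
    static final class Row {
        final String info;
        final String key;
        final double score;

        Row(String info, String key, double score) {
            this.info = info;
            this.key = key;
            this.score = score;
        }
    }

    /** Emits one row per map entry; this mirrors "return a bag, then flatten". */
    static List<Row> explode(String info, Map<String, Double> scores) {
        List<Row> rows = new ArrayList<>();
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            rows.add(new Row(info, e.getKey(), e.getValue()));
        }
        return rows;
    }

    /** Counts rows per map key; this mirrors "group by key, then count". */
    static Map<String, Long> countByKey(List<Row> rows) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (Row r : rows) {
            counts.merge(r.key, 1L, Long::sum);
        }
        return counts;
    }

    /** Builds an insertion-ordered map from alternating key/score arguments. */
    static Map<String, Double> scores(Object... kv) {
        Map<String, Double> m = new LinkedHashMap<>();
        for (int i = 0; i < kv.length; i += 2) {
            m.put((String) kv[i], (Double) kv[i + 1]);
        }
        return m;
    }

    /** The four sample records from the dump output quoted in this thread. */
    static List<Row> sampleRows() {
        List<Row> rows = new ArrayList<>();
        rows.addAll(explode("65RFPRO800863GPT", scores("108", 0.2)));
        rows.addAll(explode("6JL6U6EA00863J0J",
                scores("352", 0.5, "25", 0.15, "108", 0.07, "26", 0.06, "4", 0.16)));
        rows.addAll(explode("6B7FF3E300052E97", scores("25", 0.28, "405", 0.05, "4", 0.05)));
        rows.addAll(explode("5498267_31", scores("108", 0.05, "25", 0.19, "12", 0.19)));
        return rows;
    }

    public static void main(String[] args) {
        // Prints one "key<TAB>count" line per distinct map key.
        for (Map.Entry<String, Long> e : countByKey(sampleRows()).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

On the sample data this yields 12 flattened rows, and the per-key counts
agree with the ones asked for in the question (for example 108 appears 3
times and 4 appears 2 times).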

Shawn

On Thu, Jun 2, 2011 at 7:28 AM, Jameson Li <hovlj.ei@gmail.com> wrote:
> Hi,
>
> my pig code is like this:
> register myudf.jar
> a = load 'testurls' as (info:chararray);
> b = foreach a generate info,com.company.pig.GetInfoScore($0) as m;
> dump b;
>
> The output is like this:
> (65RFPRO800863GPT,[108#0.2])
> (6JL6U6EA00863J0J,[352#0.5,25#0.15,108#0.07,26#0.06,4#0.16])
> (6B7FF3E300052E97,[25#0.28,405#0.05,4#0.05])
> (5498267_31,[108#0.05,25#0.19,12#0.19])
>
> I want to group by the map key and count the info values, as in the
> output below:
> 108  3        /*65RFPRO800863GPT   6JL6U6EA00863J0J   5498267_31 */
> 352  1        /*6JL6U6EA00863J0J*/
> 25    3        /*6JL6U6EA00863J0J  6B7FF3E300052E97 5498267_31 */
> 26    1        /*6JL6U6EA00863J0J*/
> 4      2        /*6JL6U6EA00863J0J   6B7FF3E300052E97*/
> 405   1       /*6B7FF3E300052E97*/
> 12     1       /*5498267_31*/
>
> I think I have to split the map into many rows, as below:
> (65RFPRO800863GPT, 108, 0.2)
> (6JL6U6EA00863J0J, 352, 0.5)
> (6JL6U6EA00863J0J, 25, 0.15)
> (6JL6U6EA00863J0J, 108, 0.07)
> (6JL6U6EA00863J0J, 26, 0.06)
> (6JL6U6EA00863J0J, 4, 0.16)
> (6B7FF3E300052E97, 25, 0.28)
> (6B7FF3E300052E97, 405, 0.05)
> (6B7FF3E300052E97, 4, 0.05)
> (5498267_31, 108, 0.05)
> (5498267_31, 25, 0.19)
> (5498267_31, 12, 0.19)
>
> And then it is easy to group and count.
> Am I right?
> I have no idea how to split the map into many rows as shown above.
> Help.
>
> Thanks.
>
> 2011/5/25 Alan Gates <gates@yahoo-inc.com>
>
>> Can't you mimic dynamic key support with static keys by making your map
>> have two static keys 'key' and 'value'?
>>
>> Alan.
>>
>>
>> On May 24, 2011, at 3:05 AM, Jameson Li wrote:
>>
>>> OK, OK. I understand that I just have to write UDFs.
>>> I still think there should be grammar support for map operations on both
>>> static and dynamic keys.
>>>
>>> Thanks.
>>>
>>> 2011/5/24 Daniel Dai <daijy@earthlink.net>
>>>
>>>> GetKey(m) already gets the key, so you can filter on the key. For the
>>>> value, you may need to put the logic into a UDF.
>>>>
>>>> Grammar support for maps is based on static keys, e.g. m#'key1'. Your use
>>>> case mostly deals with dynamic keys, which you currently have to handle
>>>> yourself.
>>>>
>>>> Daniel
>>>>
>>>> -----Original Message----- From: Jameson Li
>>>> Sent: Monday, May 23, 2011 7:07 PM
>>>> To: Daniel Dai
>>>> Cc: user@pig.apache.org
>>>> Subject: Re: how to operate a map type
>>>>
>>>>
>>>> And how do I filter on a map key or a map value? Is a UDF the only way here too?
>>>>
>>>> b = foreach ruls generate com.company.pig.GetURLContent($0,3,0.1) as m;
>>>> c = filter b by m.key == 'aaa' or m.value> 0.2;
>>>>
>>>> How could I write the code?
>>>> Any other way without writing UDF?
>>>>
>>>> I also wonder: if a map can only be operated on through a UDF, why are
>>>> there no built-in functions for the map type?
>>>>
>>>> Thanks.
>>>>
>>>> 2011/5/24 Daniel Dai <jianyong@yahoo-inc.com>
>>>>
>>>>> I cannot think of a way without writing a UDF. You can write two UDFs:
>>>>>
>>>>> * GetKey: takes a map, returns the key of the map
>>>>> * GetValues: takes a bag of maps, returns a bag of map values
>>>>>
>>>>> The script is like:
>>>>> b = foreach ruls generate com.company.pig.GetURLContent($0,3,0.1) as m;
>>>>> c = foreach b generate GetKey(m) as key, m;
>>>>> d = group c by key;
>>>>> e = foreach d generate group, SUM(GetValues(c.m));
>>>>>
>>>>>
>>>>> Daniel
>>>>>
>>>>>
>>>>> On 05/23/2011 07:06 AM, Jameson Li wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I have the below pig code:
>>>>>>
>>>>>> register /home/uu/project/lib/pigudfs.jar
>>>>>> ruls = load 'testurl' as (url:chararray);
>>>>>>
>>>>>> b = foreach ruls generate com.company.pig.GetURLContent($0,3,0.1);
>>>>>>
>>>>>> When I dump b, it returns:
>>>>>> ([4#0.1677963])
>>>>>> ([193#0.16985779,81#0.10994483])
>>>>>> ([418#0.14138427,9#0.1107544,282#0.18699136])
>>>>>>
>>>>>> I just want to group by the map key and sum the map values, something like:
>>>>>> c = group b by $0#key;
>>>>>> d = foreach c generate group,SUM(b.$0#value);
>>>>>>
>>>>>> How could I write the code?
>>>>>>
>>>>>> Thanks,
>>>>>> Jameson Li.
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>
>
