pig-user mailing list archives

From Vincent <vincent.hervi...@gmail.com>
Subject Re: Tuple to lines conversion in Pig
Date Tue, 10 May 2011 10:56:04 GMT
Thanks Mridul for your quick answer!

According to the documentation, PARALLEL sets the number of reduce tasks.
So how can I make it use a UDF instead? Is there an example of such a
function in the SVN/pig0.8 package?

Best Regards

Vincent

On Tue, May 10, 2011 at 2:02 PM, Mridul Muralidharan
<mridulm@yahoo-inc.com> wrote:

>
> The easy option would be to write your own UDF, which can catch corner cases,
> etc.
> But assuming your data strictly follows what you mentioned, something like
> this might help (illustrative only!):
>
> pets = load 'pets.txt'  USING PigStorage(';') AS (pet_id:chararray,
> pet_type:chararray, pet_name:chararray);
>
> people = load 'peoples.txt'  USING PigStorage(';') AS (user:chararray,
> ids:chararray);
> people_t = FOREACH people GENERATE user, STRSPLIT(ids, ',');
> -- STRSPLIT returns a tuple, not a bag : so convert to bag and flatten it.
> people_reqd = FOREACH people_t GENERATE user, FLATTEN(TOBAG($1)) as
> (user_pet_id);
>
>
> reqd_op = JOIN people_reqd BY user_pet_id, pets BY pet_id PARALLEL
> $MY_PARALLEL;
>
>
> reqd_op should contain what you need ...
>
>
>
> Regards,
> Mridul
>
>
>
>
>
> On Tuesday 10 May 2011 03:00 PM, Vincent wrote:
>
>> Hello dear Pig users,
>>
>> I am loading a file with the following format:
>>
>> $ cat peoples.txt
>> tom;1234,4567,6
>> anna;27894
>>
>> The first field is a name; the second field is a comma-separated list of
>> an unknown number of pet ids.
>>
>> I would like to JOIN this file with another one:
>>
>> $ cat pets.txt
>> 1234;dog;cocker
>> 4567;mouse;usa
>> 6;cat;persian
>> 27894;cat;manx
>>
>> Fields are pet's id, pet's type, pet's race.
>>
>> to get the following result:
>>
>> 1234;dog;cocker;tom
>> 4567;mouse;usa;tom
>> 6;cat;persian;tom
>> 27894;cat;manx;anna
>>
>> The problem is that I don't know how to convert a tuple of fields to lines,
>> i.e. to put the file peoples.txt into the following intermediate format:
>>
>> tom,1234
>> tom,4567
>> tom,6
>> anna,27894
>>
>> Thanks in advance for your help!
>>
>>
>>     Vincent Hervieux
>>
>
>
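
For reference, the intended transformation (one line per user/pet id pair,
then a join on the pet id) can be sketched outside Pig. The following is a
minimal plain-Python sketch, not Pig code and not a UDF; the function names
explode_people and join_pets are hypothetical, and the sample data is copied
from the message above. It only shows what the split, flatten and join steps
are expected to produce, so the output of a Pig script can be checked against
it.

```python
def explode_people(lines):
    """Turn 'user;id1,id2,...' lines into (user, pet_id) pairs."""
    pairs = []
    for line in lines:
        user, ids = line.strip().split(";", 1)
        for pet_id in ids.split(","):
            pet_id = pet_id.strip()
            if pet_id:  # skip empty ids, e.g. caused by a trailing comma
                pairs.append((user, pet_id))
    return pairs


def join_pets(pairs, pet_lines):
    """Inner join of (user, pet_id) pairs with 'id;type;race' lines."""
    pets_by_id = {}
    for line in pet_lines:
        pet_id, pet_type, pet_race = line.strip().split(";")
        pets_by_id[pet_id] = (pet_type, pet_race)
    joined = []
    for user, pet_id in pairs:
        if pet_id in pets_by_id:  # unmatched ids are dropped, as in an inner join
            pet_type, pet_race = pets_by_id[pet_id]
            joined.append(";".join([pet_id, pet_type, pet_race, user]))
    return joined


# Sample data from the original message.
people_lines = ["tom;1234,4567,6", "anna;27894"]
pet_lines = ["1234;dog;cocker", "4567;mouse;usa", "6;cat;persian", "27894;cat;manx"]

pairs = explode_people(people_lines)
result = join_pets(pairs, pet_lines)
for row in result:
    print(row)
```

The pairs list corresponds to the intermediate format asked about (tom,1234
and so on), and result corresponds to the desired joined output. Rows come
out in input order here; a join in Pig gives no such ordering guarantee.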
