pig-user mailing list archives

From Ayon Sinha <ayonsi...@yahoo.com>
Subject Re: How to store each record in a seperate file
Date Thu, 13 Oct 2011 05:56:57 GMT
Hi Kiranprasad,
What is your use case? Are you sure you have picked the right tool for the job? Pig/Hadoop
is meant for massive datasets, meaning millions or billions of rows. In your case that
would lead to millions or billions of files, which Hadoop doesn't like anyway.
If your dataset is really small, do you really need Hadoop, or would Perl, Python, shell,
or any other programming language on a single machine suffice?
Just asking to make sure you are not headed down the wrong path.
OTOH, if you are doing this as an academic exercise, all is justified.
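
For a dataset small enough to fit on one machine, the single-machine route mentioned above could look roughly like the Python sketch below. It is only an illustration and not something posted in this thread: the function name write_groups, the group_<key>.txt naming scheme, and the assumption that the input is plain comma-separated lines whose first field is safe to use in a filename are all hypothetical.

```python
import csv
from collections import defaultdict
from pathlib import Path


def write_groups(input_path, output_dir, key_index=0):
    """Group comma-separated records by one field; write each group to its own file."""
    groups = defaultdict(list)
    with open(input_path, newline="") as src:
        for row in csv.reader(src):
            if row:  # skip blank lines
                groups[row[key_index]].append(row)

    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)

    written = {}
    for key, rows in groups.items():
        # Assumption: the key is safe to embed in a filename.
        target = out / f"group_{key}.txt"
        with open(target, "w", newline="") as dst:
            csv.writer(dst, lineterminator="\n").writerows(rows)
        written[key] = target
    return written
```

Calling write_groups("data.txt", "out") on the sample data from the original question (with the parentheses stripped) would produce one file per distinct first field, for example out/group_4.txt holding the two records that start with 4. All groups are held in memory before writing, which is fine for small inputs but is exactly what stops working at Hadoop scale.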
 
-Ayon
See My Photos on Flickr
Also check out my Blog for answers to commonly asked questions.



________________________________
From: kiranprasad <kiranprasad.g@imimobile.com>
To: user@pig.apache.org; Ayon Sinha <ayonsinha@yahoo.com>
Sent: Wednesday, October 12, 2011 10:19 PM
Subject: Re: How to store each record in a seperate file

Thank you for the quick response. But how can I do the following in local mode?

-----Original Message----- 
From: Jonathan Coveney
Sent: Thursday, October 13, 2011 10:28 AM
To: user@pig.apache.org ; Ayon Sinha
Subject: Re: How to store each record in a seperate file

To Ayon's point, MultipleOutputFormat can get the job done, but keep in mind
that Hadoop deals better with larger files than smaller ones. Every file is
allocated in blocks (64MB, 128MB, 256MB), so lots of small blocks are bad.

2011/10/12 Ayon Sinha <ayonsinha@yahoo.com>

> First, the bigger question: why would you want to store each record in a
> separate file?
> I'm not sure how to do this in Pig, but it is definitely possible in Hadoop
> (and also streaming) via MultipleOutputFormat, where the name of the output
> file can be based on the base_dir, key, and value. You can create your own
> filename based on those arguments.
> http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/mapred/lib/MultipleOutputFormat.html
>
> You can definitely implement your own StoreFunc UDF.
> -Ayon
>
>
>
> ________________________________
> From: kiranprasad <kiranprasad.g@imimobile.com>
> To: user@pig.apache.org
> Sent: Wednesday, October 12, 2011 9:35 PM
> Subject: How to store each record in a seperate file
>
> Hi
>
> After grouping a data set, how do I save each group in a separate file?
>
> ex:
> A = LOAD 'E:/data.txt' USING PigStorage(',');
> B = GROUP A BY $0;
>
> cat data.txt;
>
> (1,2,3)
> (4,2,1)
> (8,3,4)
> (4,3,3)
> (7,2,5)
> (8,4,3)
>
> After grouping
>
> (1,{(1,2,3)})
> (4,{(4,2,1),(4,3,3)})
> (7,{(7,2,5)})
> (8,{(8,3,4),(8,4,3)})
>
> How do I save each record in a separate file?
>
>
> Regards
> Kiran.G
> 