hadoop-user mailing list archives

From Mix Nin <pig.mi...@gmail.com>
Subject Single Output file from STORE command
Date Fri, 24 May 2013 21:46:48 GMT
Pig's STORE command produces multiple output files. I want a single output
file, so I tried the following command:

STORE (foreach (group NoNullData all) generate flatten($1))  into 'xxxx';

This command produces a single file, but it also forces the job to use a
single reducer, which kills performance.

How do I overcome this?

Normally the STORE command produces multiple output files. Apart from
those, I see another file, "_SUCCESS", in the output directory. I am also
generating a metadata file in the output directory (using
PigStorage('\t', '-schema')).

I thought of using getmerge as follows:

hadoop fs -getmerge <dir_of_input_files> <local file>

But this has several drawbacks:
1) I first have to eliminate the files other than the data files from the
HDFS directory.
2) It creates the single file in a local directory, not in HDFS.
3) I then need to move the file from the local directory back to HDFS,
which may take additional time depending on the size of the file.
4) I need to put back the files that I eliminated in step 1.
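
The four steps above could be scripted roughly as follows. This is only a
sketch of the round trip as described, not a recommended solution: the
helper name merge_to_hdfs, the "_side" scratch directory, and the file
names passed in the usage line are assumptions for illustration, and the
final cleanup assumes the Hadoop 2 style "-rm -r" (older releases use
"-rmr").

```shell
# Hypothetical helper: merge_to_hdfs <hdfs_dir> <hdfs_target> [non-data file names...]
merge_to_hdfs() {
    src_dir=$1
    target=$2
    shift 2                                  # remaining args: non-data file names
    side_dir="${src_dir}_side"               # assumed scratch directory on HDFS
    tmp_dir=$(mktemp -d) || return 1
    local_file="$tmp_dir/merged"

    # Step 1: move the non-data files out of the data directory.
    hadoop fs -mkdir "$side_dir" || return 1
    for name in "$@"; do
        hadoop fs -mv "$src_dir/$name" "$side_dir/$name" || return 1
    done

    # Step 2: merge the remaining files into one local file.
    hadoop fs -getmerge "$src_dir" "$local_file" || return 1

    # Step 3: copy the merged file back to HDFS (the slow part for big files).
    hadoop fs -put "$local_file" "$target" || return 1

    # Step 4: restore the files moved aside in step 1.
    for name in "$@"; do
        hadoop fs -mv "$side_dir/$name" "$src_dir/$name" || return 1
    done

    # Clean up the scratch locations (assumes Hadoop 2 style "-rm -r").
    hadoop fs -rm -r "$side_dir"
    rm -rf "$tmp_dir"
}
```

A call might look like this, with the paths and file names being
placeholders:

    merge_to_hdfs /out/xxxx /out/xxxx_merged.tsv _SUCCESS .pig_schema

The sketch makes the cost visible: the whole data set is copied to local
disk and then back into HDFS.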

Is there a more efficient way to solve my problem?

