hadoop-common-user mailing list archives

From Devaraj k <devara...@huawei.com>
Subject RE: hadoop streaming and a directory containing large number of .tgz files
Date Tue, 24 Apr 2012 17:37:00 GMT
Hi Sunil,

    Please take a look at HarFileSystem (the Hadoop Archive filesystem); it should help
with your problem.

Thanks
Devaraj
________________________________________
From: Sunil S Nandihalli [sunil.nandihalli@gmail.com]
Sent: Tuesday, April 24, 2012 7:12 PM
To: common-user@hadoop.apache.org
Subject: hadoop streaming and a directory containing large number of .tgz files

Hi everybody,
 I am new to Hadoop. I have about 40K .tgz files, each approximately
3 MB. I would like to process them as if they were a single large file
produced by
"cat list-of-files | gnuparallel 'tar -Oxvf {} | sed 1d' > output.txt"
How can I achieve this using hadoop-streaming or some other similar
library?


thanks,
Sunil.
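
For reference, the per-archive step of the pipeline quoted above (tar -Oxvf {} | sed 1d) could be sketched as a streaming-style mapper in Python. This is only a rough sketch and not something proposed in the thread. It assumes the mapper input is a text file with one archive path per line, that each path is readable from the local filesystem of whichever node runs the task (for example over a shared mount), and that the path is the last tab-separated field of each input record. The function name cat_tgz_without_first_line is made up for illustration.

```python
#!/usr/bin/env python3
"""Streaming-style mapper sketch: stdin carries one .tgz path per line."""
import sys
import tarfile


def cat_tgz_without_first_line(path, out):
    """Write every regular file in the archive to `out`, minus the first line.

    Mirrors `tar -Oxf path | sed 1d`: members are concatenated in archive
    order and only the first line of that combined stream is dropped.
    """
    dropping = True  # still inside the first line of the combined stream
    with tarfile.open(path, "r:gz") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories, links, and devices
            data = archive.extractfile(member)
            for line in data:
                if dropping:
                    # The first line ends at the first newline, which may
                    # fall in a later member if this one contains none.
                    if line.endswith(b"\n"):
                        dropping = False
                    continue
                out.write(line)


if __name__ == "__main__":
    for record in sys.stdin:
        # Assumption: the archive path is the last tab-separated field.
        path = record.rstrip("\n").split("\t")[-1]
        if path:
            cat_tgz_without_first_line(path, sys.stdout.buffer)
```

Saved as mapper.py (a hypothetical file name), it can be tried locally with "cat list-of-files | python3 mapper.py > output.txt" before wiring it into a streaming job as the mapper command.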
