hadoop-common-user mailing list archives

From Sunil S Nandihalli <sunil.nandiha...@gmail.com>
Subject hadoop streaming and a directory containing large number of .tgz files
Date Tue, 24 Apr 2012 13:42:47 GMT
Hi Everybody,
I am a newbie to Hadoop. I have about 40K .tgz files, each approximately
3MB. I would like to process them as if they were a single large file
formed by:

"cat list-of-files | gnuparallel 'tar -Oxvf {} | sed 1d' > output.txt"

How can I achieve this using Hadoop Streaming or some other similar
library?


thanks,
Sunil.
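A minimal sketch of one common approach, under several assumptions: the .tgz files are already in HDFS, the `hadoop` command is on the PATH of every task node, and the job's input is a plain-text file listing one archive path per line. The script name `tgz_mapper.py` and the helper names below are hypothetical. Each map task fetches an archive with `hadoop fs -cat`, decompresses it with Python's `tarfile` module, concatenates the regular-file members in archive order, and drops the first line of that stream. This mirrors what `tar -Ox... | sed 1d` does for one archive.

```python
#!/usr/bin/env python3
"""Hypothetical Hadoop Streaming mapper (tgz_mapper.py), a sketch only.

Assumes each input line is the HDFS path of one .tgz archive. For every
path it mimics `tar -Ox ... | sed 1d`: concatenate all regular-file members
in archive order, then drop the first line of that combined stream.
"""
import io
import subprocess
import sys
import tarfile


def tgz_contents_without_first_line(data: bytes) -> bytes:
    """Return the concatenated member contents of a gzipped tar, minus line 1."""
    chunks = []
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as archive:
        for member in archive:  # archive order, like `tar -O`
            if member.isfile():
                chunks.append(archive.extractfile(member).read())
    combined = b"".join(chunks)
    # Like `sed 1d`: drop everything up to and including the first newline.
    # With no newline at all, the single (unterminated) line is dropped too.
    newline = combined.find(b"\n")
    return b"" if newline == -1 else combined[newline + 1:]


def read_from_hdfs(path: str) -> bytes:
    """Fetch one archive by shelling out to `hadoop fs -cat` (assumed on PATH)."""
    return subprocess.run(
        ["hadoop", "fs", "-cat", path], check=True, stdout=subprocess.PIPE
    ).stdout


def main() -> None:
    out = sys.stdout.buffer
    for line in sys.stdin:
        # Assumption: some input formats may pass "key<TAB>value" to the
        # mapper, so keep only the last tab-separated field as the path.
        path = line.rstrip("\n").split("\t")[-1].strip()
        if path:
            out.write(tgz_contents_without_first_line(read_from_hdfs(path)))
    out.flush()


if __name__ == "__main__":
    main()
```

Such a script could be submitted as a map-only job with the hadoop-streaming jar (its location varies by Hadoop version). The path list goes in `-input`, the script is shipped with `-file` and named in `-mapper`, and `-numReduceTasks 0` skips the reduce phase. Because a list of 40K paths is a small file, the default input format would likely produce a single map task. `-inputformat org.apache.hadoop.mapred.lib.NLineInputFormat` is one way to spread the paths across more map tasks.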
