hadoop-user mailing list archives

From xeon <xeonmailingl...@gmail.com>
Subject How to execute wordcount with compression?
Date Wed, 16 Oct 2013 08:02:08 GMT
Hi,


I want to run the wordcount example on YARN with compression enabled,
using a directory with several files as input. For that, I must compress
the input.

dir1/file1.txt
dir1/file2.txt
dir1/file3.txt
dir1/file4.txt
dir1/file5.txt

1 - Should I compress the whole dir or each file in the dir?

2 - Should I use gzip or bzip2?

3 - Do I need to set up any YARN configuration file?

4 - When the job is running, are the files decompressed before the
mappers run and compressed again after the reducers have executed?

-- 
Thanks,

