Ruslan,
Thanks in advance for your reply.

The job statistics (counters) for the three cases are as follows:

case 1: uncompressed data (none)
12/08/09 16:12:44 INFO mapred.JobClient: Job complete: job_201208021633_0049
12/08/09 16:12:44 INFO mapred.JobClient: Counters: 23
12/08/09 16:12:44 INFO mapred.JobClient:   Job Counters 
12/08/09 16:12:44 INFO mapred.JobClient:     Launched reduce tasks=1
12/08/09 16:12:44 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=3623053
12/08/09 16:12:44 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/08/09 16:12:44 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/08/09 16:12:44 INFO mapred.JobClient:     Rack-local map tasks=1
12/08/09 16:12:44 INFO mapred.JobClient:     Launched map tasks=166
12/08/09 16:12:44 INFO mapred.JobClient:     Data-local map tasks=165
12/08/09 16:12:44 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=220786
12/08/09 16:12:44 INFO mapred.JobClient:   FileSystemCounters
12/08/09 16:12:44 INFO mapred.JobClient:     FILE_BYTES_READ=1852424288
12/08/09 16:12:44 INFO mapred.JobClient:     HDFS_BYTES_READ=10644581454
12/08/09 16:12:44 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1894096220
12/08/09 16:12:44 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=211440
12/08/09 16:12:44 INFO mapred.JobClient:   Map-Reduce Framework
12/08/09 16:12:44 INFO mapred.JobClient:     Reduce input groups=13661
12/08/09 16:12:44 INFO mapred.JobClient:     Combine output records=69055428
12/08/09 16:12:44 INFO mapred.JobClient:     Map input records=158156100
12/08/09 16:12:44 INFO mapred.JobClient:     Reduce shuffle bytes=33143186
12/08/09 16:12:44 INFO mapred.JobClient:     Reduce output records=13661
12/08/09 16:12:44 INFO mapred.JobClient:     Spilled Records=122916251
12/08/09 16:12:44 INFO mapred.JobClient:     Map output bytes=15704921900
12/08/09 16:12:44 INFO mapred.JobClient:     Combine input records=1332132129
12/08/09 16:12:44 INFO mapred.JobClient:     Map output records=1265248800
12/08/09 16:12:44 INFO mapred.JobClient:     SPLIT_RAW_BYTES=19716
12/08/09 16:12:44 INFO mapred.JobClient:     Reduce input records=2172099

case 2: lzo
12/08/09 15:58:11 INFO mapred.JobClient: Job complete: job_201208021633_0048
12/08/09 15:58:11 INFO mapred.JobClient: Counters: 23
12/08/09 15:58:11 INFO mapred.JobClient:   Job Counters 
12/08/09 15:58:11 INFO mapred.JobClient:     Launched reduce tasks=1
12/08/09 15:58:11 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=3361287
12/08/09 15:58:11 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/08/09 15:58:11 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/08/09 15:58:11 INFO mapred.JobClient:     Rack-local map tasks=4
12/08/09 15:58:11 INFO mapred.JobClient:     Launched map tasks=65
12/08/09 15:58:11 INFO mapred.JobClient:     Data-local map tasks=61
12/08/09 15:58:11 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=183529
12/08/09 15:58:11 INFO mapred.JobClient:   FileSystemCounters
12/08/09 15:58:11 INFO mapred.JobClient:     FILE_BYTES_READ=568178351
12/08/09 15:58:11 INFO mapred.JobClient:     HDFS_BYTES_READ=3860287251
12/08/09 15:58:11 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=576095398
12/08/09 15:58:11 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=211440
12/08/09 15:58:11 INFO mapred.JobClient:   Map-Reduce Framework
12/08/09 15:58:11 INFO mapred.JobClient:     Reduce input groups=13661
12/08/09 15:58:11 INFO mapred.JobClient:     Combine output records=66734193
12/08/09 15:58:11 INFO mapred.JobClient:     Map input records=158156100
12/08/09 15:58:11 INFO mapred.JobClient:     Reduce shuffle bytes=4752406
12/08/09 15:58:11 INFO mapred.JobClient:     Reduce output records=13661
12/08/09 15:58:11 INFO mapred.JobClient:     Spilled Records=132612729
12/08/09 15:58:11 INFO mapred.JobClient:     Map output bytes=15704921900
12/08/09 15:58:11 INFO mapred.JobClient:     Combine input records=1331190655
12/08/09 15:58:11 INFO mapred.JobClient:     Map output records=1265248800
12/08/09 15:58:11 INFO mapred.JobClient:     SPLIT_RAW_BYTES=7366
12/08/09 15:58:11 INFO mapred.JobClient:     Reduce input records=792338

case 3: sequence file with block-level snappy compression (seq)

12/09/05 18:33:00 INFO mapred.JobClient: Job complete: job_201209051652_0008
12/09/05 18:33:00 INFO mapred.JobClient: Counters: 23
12/09/05 18:33:00 INFO mapred.JobClient:   Job Counters
12/09/05 18:33:00 INFO mapred.JobClient:     Launched reduce tasks=1
12/09/05 18:33:00 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=5885897
12/09/05 18:33:00 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/09/05 18:33:00 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/09/05 18:33:00 INFO mapred.JobClient:     Rack-local map tasks=2
12/09/05 18:33:00 INFO mapred.JobClient:     Launched map tasks=68
12/09/05 18:33:00 INFO mapred.JobClient:     Data-local map tasks=66
12/09/05 18:33:00 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=1320075
12/09/05 18:33:00 INFO mapred.JobClient:   FileSystemCounters
12/09/05 18:33:00 INFO mapred.JobClient:     FILE_BYTES_READ=3706936196
12/09/05 18:33:00 INFO mapred.JobClient:     HDFS_BYTES_READ=4419150507
12/09/05 18:33:00 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=4581439981
12/09/05 18:33:00 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=211440
12/09/05 18:33:00 INFO mapred.JobClient:   Map-Reduce Framework
12/09/05 18:33:00 INFO mapred.JobClient:     Reduce input groups=13661
12/09/05 18:33:00 INFO mapred.JobClient:     Combine output records=0
12/09/05 18:33:00 INFO mapred.JobClient:     Map input records=158156100
12/09/05 18:33:00 INFO mapred.JobClient:     Reduce shuffle bytes=857964933
12/09/05 18:33:00 INFO mapred.JobClient:     Reduce output records=13661
12/09/05 18:33:00 INFO mapred.JobClient:     Spilled Records=6232725043
12/09/05 18:33:00 INFO mapred.JobClient:     Map output bytes=15704921900
12/09/05 18:33:00 INFO mapred.JobClient:     Combine input records=0
12/09/05 18:33:00 INFO mapred.JobClient:     Map output records=1265248800
12/09/05 18:33:00 INFO mapred.JobClient:     SPLIT_RAW_BYTES=8382
12/09/05 18:33:00 INFO mapred.JobClient:     Reduce input records=1265248800
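
To compare the three runs side by side, the counter lines can be parsed with a small script. The following is only a rough sketch in Python, assuming every counter appears as a "name=value" line in the JobClient format shown above. The names parse_counters and summarize are hypothetical helpers and are not part of Hadoop. The sample is an excerpt of the case 3 counters.

```python
import re

# Matches counter lines such as
# "12/09/05 18:33:00 INFO mapred.JobClient:     Map output records=1265248800"
COUNTER_RE = re.compile(r"INFO mapred\.JobClient:\s+(.+?)=(\d+)\s*$")

# Excerpt of the case 3 (seq) counters from this message.
SAMPLE = """\
12/09/05 18:33:00 INFO mapred.JobClient:     Spilled Records=6232725043
12/09/05 18:33:00 INFO mapred.JobClient:     Combine input records=0
12/09/05 18:33:00 INFO mapred.JobClient:     Map output records=1265248800
12/09/05 18:33:00 INFO mapred.JobClient:     Reduce input records=1265248800
"""


def parse_counters(log_text):
    """Return {counter name: integer value} for every 'name=value' log line."""
    counters = {}
    for line in log_text.splitlines():
        match = COUNTER_RE.search(line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def summarize(counters):
    """Derive a few ratios that make runs easier to compare (hypothetical helper)."""
    map_out = counters["Map output records"]
    return {
        # How many times, on average, each map output record was spilled.
        "spilled_per_map_output": counters["Spilled Records"] / map_out,
        # Fraction of map output records that reached the reducer.
        "reduce_input_per_map_output": counters["Reduce input records"] / map_out,
        # True only if at least one record passed through a combiner.
        "combiner_ran": counters["Combine input records"] > 0,
    }


if __name__ == "__main__":
    for name, value in summarize(parse_counters(SAMPLE)).items():
        print(name, value)
```

On the case 3 excerpt this reports that no records passed through the combiner and that every map output record reached the reducer, which matches the counters above (Combine input records=0, Reduce input records=1265248800). Pasting the full case 1 and case 2 logs into the same functions gives the corresponding figures for those runs.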

Regards, 
Park

2012/9/7 Ruslan Al-Fakikh <ruslan.al-fakikh@jalent.ru>
Hi,

It would be interesting to see the jobs' statistics (counters).

Thanks

On Fri, Sep 7, 2012 at 3:25 AM, Young-Geun Park
<younggeun.park@gmail.com> wrote:
> Hi, All
>
> I have tested which performs better for a big file, LZO or SequenceFile.
>
> The file size is 10 GiB and the WordCount MR job is used.
> The inputs to WordCount are an lzo file indexed by LzoIndexTool (lzo),
> a sequence file compressed with block-level snappy (seq), and
> the uncompressed original file (none).
>
> Map output is compressed in every case except the uncompressed one. The
> final MapReduce output is not compressed in any case.
>
> The WordCount MR running times are as follows:
> none      lzo       seq
> 248s      243s      1410s
>
> - Test environment
>
> OS: CentOS 5.6 (x64) (kernel = 2.6.18)
> # of cores: 8 (cpu = Intel(R) Xeon(R) CPU E5504 @ 2.00GHz)
> RAM: 18GB
> Java version: 1.6.0_26
> Hadoop version: CDH3U2
> # of datanodes (tasktrackers): 8
>
> According to the results, the running time with the SequenceFile is much
> longer than with the others (1410s versus 248s and 243s).
> Before testing, I had expected the results for SequenceFile and
> LZO to be about the same.
>
> I want to know why the performance of the sequence file compressed with
> snappy is so bad.
>
> Did I miss anything in the tests?
>
>
> Regards,
> Park
>
>



--
Best Regards,
Ruslan Al-Fakikh