Subject: Re: Lzo vs SequenceFile for big file
From: Young-Geun Park
To: user@hadoop.apache.org
Date: Mon, 10 Sep 2012 11:29:53 +0900

Has anyone tested the performance of the SequenceFile format versus LZO?

Regards,
Park

2012/9/7 Young-Geun Park

> Ruslan,
> Thank you for your reply.
>
> The jobs' statistics are as follows:
>
> case 1 : uncompressed data (none)
>
> 12/08/09 16:12:44 INFO mapred.JobClient: Job complete: job_201208021633_0049
> 12/08/09 16:12:44 INFO mapred.JobClient: Counters: 23
> 12/08/09 16:12:44 INFO mapred.JobClient:   Job Counters
> 12/08/09 16:12:44 INFO mapred.JobClient:     Launched reduce tasks=1
> 12/08/09 16:12:44 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=3623053
> 12/08/09 16:12:44 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
> 12/08/09 16:12:44 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
> 12/08/09 16:12:44 INFO mapred.JobClient:     Rack-local map tasks=1
> 12/08/09 16:12:44 INFO mapred.JobClient:     Launched map tasks=166
> 12/08/09 16:12:44 INFO mapred.JobClient:     Data-local map tasks=165
> 12/08/09 16:12:44 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=220786
> 12/08/09 16:12:44 INFO mapred.JobClient:   FileSystemCounters
> 12/08/09 16:12:44 INFO mapred.JobClient:     FILE_BYTES_READ=1852424288
> 12/08/09 16:12:44 INFO mapred.JobClient:     HDFS_BYTES_READ=10644581454
> 12/08/09 16:12:44 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1894096220
> 12/08/09 16:12:44 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=211440
> 12/08/09 16:12:44 INFO mapred.JobClient:   Map-Reduce Framework
> 12/08/09 16:12:44 INFO mapred.JobClient:     Reduce input groups=13661
> 12/08/09 16:12:44 INFO mapred.JobClient:     Combine output records=69055428
> 12/08/09 16:12:44 INFO mapred.JobClient:     Map input records=158156100
> 12/08/09 16:12:44 INFO mapred.JobClient:     Reduce shuffle bytes=33143186
> 12/08/09 16:12:44 INFO mapred.JobClient:     Reduce output records=13661
> 12/08/09 16:12:44 INFO mapred.JobClient:     Spilled Records=122916251
> 12/08/09 16:12:44 INFO mapred.JobClient:     Map output bytes=15704921900
> 12/08/09 16:12:44 INFO mapred.JobClient:     Combine input records=1332132129
> 12/08/09 16:12:44 INFO mapred.JobClient:     Map output records=1265248800
> 12/08/09 16:12:44 INFO mapred.JobClient:     SPLIT_RAW_BYTES=19716
> 12/08/09 16:12:44 INFO mapred.JobClient:     Reduce input records=2172099
>
> case 2 : lzo
>
> 12/08/09 15:58:11 INFO mapred.JobClient: Job complete: job_201208021633_0048
> 12/08/09 15:58:11 INFO mapred.JobClient: Counters: 23
> 12/08/09 15:58:11 INFO mapred.JobClient:   Job Counters
> 12/08/09 15:58:11 INFO mapred.JobClient:     Launched reduce tasks=1
> 12/08/09 15:58:11 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=3361287
> 12/08/09 15:58:11 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
> 12/08/09 15:58:11 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
> 12/08/09 15:58:11 INFO mapred.JobClient:     Rack-local map tasks=4
> 12/08/09 15:58:11 INFO mapred.JobClient:     Launched map tasks=65
> 12/08/09 15:58:11 INFO mapred.JobClient:     Data-local map tasks=61
> 12/08/09 15:58:11 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=183529
> 12/08/09 15:58:11 INFO mapred.JobClient:   FileSystemCounters
> 12/08/09 15:58:11 INFO mapred.JobClient:     FILE_BYTES_READ=568178351
> 12/08/09 15:58:11 INFO mapred.JobClient:     HDFS_BYTES_READ=3860287251
> 12/08/09 15:58:11 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=576095398
> 12/08/09 15:58:11 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=211440
> 12/08/09 15:58:11 INFO mapred.JobClient:   Map-Reduce Framework
> 12/08/09 15:58:11 INFO mapred.JobClient:     Reduce input groups=13661
> 12/08/09 15:58:11 INFO mapred.JobClient:     Combine output records=66734193
> 12/08/09 15:58:11 INFO mapred.JobClient:     Map input records=158156100
> 12/08/09 15:58:11 INFO mapred.JobClient:     Reduce shuffle bytes=4752406
> 12/08/09 15:58:11 INFO mapred.JobClient:     Reduce output records=13661
> 12/08/09 15:58:11 INFO mapred.JobClient:     Spilled Records=132612729
> 12/08/09 15:58:11 INFO mapred.JobClient:     Map output bytes=15704921900
> 12/08/09 15:58:11 INFO mapred.JobClient:     Combine input records=1331190655
> 12/08/09 15:58:11 INFO mapred.JobClient:     Map output records=1265248800
> 12/08/09 15:58:11 INFO mapred.JobClient:     SPLIT_RAW_BYTES=7366
> 12/08/09 15:58:11 INFO mapred.JobClient:     Reduce input records=792338
>
> case 3 : sequence file compressed block-level by snappy
>
> 12/09/05 18:33:00 INFO mapred.JobClient: Job complete: job_201209051652_0008
> 12/09/05 18:33:00 INFO mapred.JobClient: Counters: 23
> 12/09/05 18:33:00 INFO mapred.JobClient:   Job Counters
> 12/09/05 18:33:00 INFO mapred.JobClient:     Launched reduce tasks=1
> 12/09/05 18:33:00 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=5885897
> 12/09/05 18:33:00 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
> 12/09/05 18:33:00 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
> 12/09/05 18:33:00 INFO mapred.JobClient:     Rack-local map tasks=2
> 12/09/05 18:33:00 INFO mapred.JobClient:     Launched map tasks=68
> 12/09/05 18:33:00 INFO mapred.JobClient:     Data-local map tasks=66
> 12/09/05 18:33:00 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=1320075
> 12/09/05 18:33:00 INFO mapred.JobClient:   FileSystemCounters
> 12/09/05 18:33:00 INFO mapred.JobClient:     FILE_BYTES_READ=3706936196
> 12/09/05 18:33:00 INFO mapred.JobClient:     HDFS_BYTES_READ=4419150507
> 12/09/05 18:33:00 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=4581439981
> 12/09/05 18:33:00 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=211440
> 12/09/05 18:33:00 INFO mapred.JobClient:   Map-Reduce Framework
> 12/09/05 18:33:00 INFO mapred.JobClient:     Reduce input groups=13661
> 12/09/05 18:33:00 INFO mapred.JobClient:     Combine output records=0
> 12/09/05 18:33:00 INFO mapred.JobClient:     Map input records=158156100
> 12/09/05 18:33:00 INFO mapred.JobClient:     Reduce shuffle bytes=857964933
> 12/09/05 18:33:00 INFO mapred.JobClient:     Reduce output records=13661
> 12/09/05 18:33:00 INFO mapred.JobClient:     Spilled Records=6232725043
> 12/09/05 18:33:00 INFO mapred.JobClient:     Map output bytes=15704921900
> 12/09/05 18:33:00 INFO mapred.JobClient:     Combine input records=0
> 12/09/05 18:33:00 INFO mapred.JobClient:     Map output records=1265248800
> 12/09/05 18:33:00 INFO mapred.JobClient:     SPLIT_RAW_BYTES=8382
> 12/09/05 18:33:00 INFO mapred.JobClient:     Reduce input records=1265248800
>
> Regards,
> Park
>
> 2012/9/7 Ruslan Al-Fakikh
>
>> Hi,
>>
>> It would be interesting to see the jobs' statistics (counters).
>>
>> Thanks
>>
>> On Fri, Sep 7, 2012 at 3:25 AM, Young-Geun Park
>> <younggeun.park@gmail.com> wrote:
>> > Hi, All
>> >
>> > I have tested which method is better between Lzo and SequenceFile for a BIG
>> > file.
>> >
>> > The file size is 10 GiB and a WordCount MR job is used.
>> > The inputs of the WordCount MR job are an lzo file indexed by LzoIndexTool (lzo),
>> > a sequence file compressed with block-level snappy (seq), and
>> > the uncompressed original file (none).
>> >
>> > The map output is compressed except in the uncompressed case. The MapReduce output is
>> > not compressed in any of the cases.
>> >
>> > The following are the WordCount MR running times:
>> >
>> >   none    lzo     seq
>> >   248s    243s    1410s
>> >
>> > Test environment:
>> >
>> > OS : CentOS 5.6 (x64) (kernel = 2.6.18)
>> > # of cores : 8 (CPU = Intel(R) Xeon(R) E5504 @ 2.00GHz)
>> > RAM : 18GB
>> > Java version : 1.6.0_26
>> > Hadoop version : CDH3U2
>> > # of datanodes (tasktrackers) : 8
>> >
>> > According to these results, the running time with the SequenceFile input is much
>> > longer than with the others.
>> > Before testing, I had expected the results for SequenceFile and
>> > Lzo to be about the same.
>> >
>> > I want to know why the performance of the sequence file compressed with snappy
>> > is so bad.
>> >
>> > Am I missing anything in these tests?
>> >
>> > Regards,
>> > Park
>>
>> --
>> Best Regards,
>> Ruslan Al-Fakikh
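For context on the setup quoted above: a "sequence file compressed block-level by snappy" would typically be produced with Hadoop's SequenceFile writer API using CompressionType.BLOCK and the Snappy codec. The code below is only a minimal sketch of that preparation step; the class name, the paths taken from args, and the assumption that records are plain text lines keyed by byte offset are illustrative and not taken from the original test.

import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.util.ReflectionUtils;

// Converts a plain text file on HDFS into a SequenceFile with BLOCK-level
// Snappy compression, i.e. roughly the kind of input labelled "seq" above.
// The <LongWritable, Text> offset-keyed layout is an assumption for illustration.
public class TextToSnappySequenceFile {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path in = new Path(args[0]);   // source text file, e.g. the 10 GiB input
    Path out = new Path(args[1]);  // destination SequenceFile

    // instantiate the codec through ReflectionUtils so it gets the Configuration
    CompressionCodec snappy = ReflectionUtils.newInstance(SnappyCodec.class, conf);
    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, out, LongWritable.class, Text.class,
        SequenceFile.CompressionType.BLOCK,  // block-level compression
        snappy);
    BufferedReader reader =
        new BufferedReader(new InputStreamReader(fs.open(in)));
    try {
      String line;
      long offset = 0;
      while ((line = reader.readLine()) != null) {
        writer.append(new LongWritable(offset), new Text(line));
        offset += line.length() + 1;  // +1 for the newline
      }
    } finally {
      reader.close();
      writer.close();
    }
  }
}

The WordCount job would then read this file with SequenceFileInputFormat rather than TextInputFormat; that part is not shown here.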
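The mail also states that the map output was compressed for the lzo and seq runs. On CDH3 (Hadoop 0.20) that is normally a driver-side job setting rather than a property of the input format; the sketch below shows the usual old-API calls. The choice of SnappyCodec for the intermediate data is an assumption, since the mail does not say which codec was used for the map output.

import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;

// Illustrative driver-side settings (old 0.20/CDH3 mapred API).
public class MapOutputCompressionSettings {
  public static JobConf configure(JobConf conf) {
    // compress the intermediate map output (the data that is spilled and shuffled)
    conf.setCompressMapOutput(true);                 // mapred.compress.map.output
    // codec for the map output; Snappy shown only as an example
    conf.setMapOutputCompressorClass(SnappyCodec.class);
    // leave the final job output uncompressed, as in all three test cases
    FileOutputFormat.setCompressOutput(conf, false);
    return conf;
  }
}

With this enabled, the FILE_BYTES_READ/WRITTEN and Reduce shuffle bytes counters reflect compressed intermediate data, which is consistent with the much smaller values in the lzo run (case 2) compared with the uncompressed run (case 1) above.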
