Subject: Re: Problem found while using LZO compression in Hadoop 0.20.1
From: 李钰 (Carp) <carp84@gmail.com>
To: common-dev@hadoop.apache.org
Date: Wed, 9 Jun 2010 22:25:23 +0800

Hi Todd,

Thanks for your reply. I got the LZO libraries from exactly the same link on
github and built them successfully, so I don't think that is the cause.

Hi guys, any other comments? Thanks.

Best Regards,
Carp

2010/6/9 Todd Lipcon

> Hi,
>
> Where did you get the LZO libraries? The ones on Google Code are broken,
> please use the ones on github:
>
> http://github.com/toddlipcon/hadoop-lzo
>
> Thanks
> -Todd
>
>
> On Wed, Jun 9, 2010 at 2:59 AM, 李钰 wrote:
>
> > Hi,
> >
> > While using LZO compression to try to improve the performance of my
> > cluster, I found that compression didn't work. The job I ran is
> > "org.apache.hadoop.examples.Sort", with the input data generated by
> > "org.apache.hadoop.examples.RandomWriter".
> > I've made sure that I configured the LZO native library and jar files
> > correctly and set all compression-related parameters (such as
> > "mapred.compress.map.output", "mapred.output.compression.type",
> > "mapred.output.compression.codec", "mapred.output.compress" and
> > "map.output.compression.codec"), and from the information in the job logs
> > the tasktracker did compress the map/job output. But the output file is
> > not compressed at all!
> >
> > Then I searched the internet and found from
> > http://wiki.apache.org/hadoop/SequenceFile that the *SequenceFile Common
> > Header* contains two bytes that decide whether compression and block
> > compression are turned on for the file. I checked the sequence file
> > generated by RandomWriter, and the result is as follows:
> >
> > [hdpadmin@shihc008 rand-10mb]$ od -c part-00000 | head -n 15
> > 0000000 S E Q 006 " o r g . a p a c h e .
> > 0000020 h a d o o p . i o . B y t e s W
> > 0000040 r i t a b l e " o r g . a p a c
> > 0000060 h e . h a d o o p . i o . B y t
> > 0000100 e s W r i t a b l e *\0 \0* \0 \0 \0 \0
> > 0000120 244 n ! 177 L 316 030 q g 035 351 L ; 024 216 031
> > 0000140 \0 \0 \t 234 \0 \0 001 305 \0 \0 001 301 207 v 5 255
> > 0000160 220 ] 236 < \b 367 & 9 241 \b v 303 m 314 203 220
> > 0000200 335 \0 241 325 232 035 037 267 303 360 \n 025 u P 003 220
> > 0000220 ^ 235 247 036 S 265 271 035 S 247 O 5 337 + 020 q
> > 0000240 277 - 003 212 . 230 221 G 241 5 K K 031 273 036 206
> > 0000260 ( 317 303 367 351 214 364 262 340 S 211 230 \r 362 % 335
> > 0000300 } H w & 234 S F 324 321 274 F 377 [ 344 [ h
> > 0000320 204 001 265 ] 037 _ r , 020 370 246 327 231 017 205 252
> > 0000340 273 016 310 w 361 326 032 332 200 Y \a X 342 \r 016 364
> >
> > I found that the two marked bytes are set to zero, which means compression
> > is turned off for the file. Since the value of these two bytes is '\0', I
> > guess this may be a defect where we forgot to set these two bytes, which
> > makes the sequence file generated by RandomWriter impossible to compress.
> > I don't know whether the same thing happens in other places.
> >
> > Is my opinion right? If not, does anybody know what causes the compression
> > not to work? Looking forward to your reply!
> >
> > Thanks and Best Regards,
> > Carp
> >
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
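
The compression keys listed in the question map onto JobConf and output-format
setters in the 0.20-era mapred API. As a point of reference, here is a minimal
sketch of that mapping in Java; it is not taken from the thread, and it assumes
the hadoop-lzo bindings from the github link above are on the classpath with
their codec class named com.hadoop.compression.lzo.LzoCodec.

import org.apache.hadoop.io.SequenceFile.CompressionType;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;

public class LzoJobSettings {

  // Hypothetical helper: applies the compression settings discussed in the
  // thread through the corresponding API calls instead of raw config keys.
  public static void enableLzo(JobConf conf) throws ClassNotFoundException {
    // Assumed codec class name from the hadoop-lzo bindings linked above.
    Class<? extends CompressionCodec> lzo =
        Class.forName("com.hadoop.compression.lzo.LzoCodec")
             .asSubclass(CompressionCodec.class);

    // mapred.compress.map.output / mapred.map.output.compression.codec:
    // compress the intermediate map output.
    conf.setCompressMapOutput(true);
    conf.setMapOutputCompressorClass(lzo);

    // mapred.output.compress / mapred.output.compression.codec:
    // compress the final job output.
    FileOutputFormat.setCompressOutput(conf, true);
    FileOutputFormat.setOutputCompressorClass(conf, lzo);

    // mapred.output.compression.type: RECORD or BLOCK compression when the
    // job writes SequenceFile output.
    SequenceFileOutputFormat.setOutputCompressionType(conf, CompressionType.BLOCK);
  }
}

Calling enableLzo(conf) on the JobConf before submitting the job has the same
effect as setting the keys by hand in the configuration files.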
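
The two header flags examined in the od dump can also be read without counting
bytes: SequenceFile.Reader exposes them directly. Below is a minimal sketch,
assuming a 0.20-era classpath; the class name and argument handling are
illustrative only.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;

// Hypothetical utility: prints the compression-related header fields of a
// sequence file, e.g. the part-00000 produced by RandomWriter.
public class DumpSeqFileHeader {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path path = new Path(args[0]);
    FileSystem fs = path.getFileSystem(conf);
    SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
    try {
      System.out.println("key class:        " + reader.getKeyClassName());
      System.out.println("value class:      " + reader.getValueClassName());
      System.out.println("compressed:       " + reader.isCompressed());
      System.out.println("block compressed: " + reader.isBlockCompressed());
      System.out.println("codec:            " + reader.getCompressionCodec());
    } finally {
      reader.close();
    }
  }
}

For the part-00000 dumped above, both isCompressed() and isBlockCompressed()
would report false, matching the two zero bytes in the header.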