hadoop-common-dev mailing list archives

From 李钰 <car...@gmail.com>
Subject Re: Problem found while using LZO compression in Hadoop 0.20.1
Date Wed, 09 Jun 2010 14:25:23 GMT
Hi Todd,

Thanks for your reply. I got the LZO libraries from exactly that link on
github and built them successfully, so I don't think that is the cause.

Hi Guys,

Any other comments? Thanks.

Best Regards,
Carp

2010/6/9 Todd Lipcon <todd@cloudera.com>

> Hi,
>
> Where did you get the LZO libraries? The ones on Google Code are broken,
> please use the ones on github:
>
> http://github.com/toddlipcon/hadoop-lzo
>
> Thanks
> -Todd
>
>
> On Wed, Jun 9, 2010 at 2:59 AM, 李钰 <carp84@gmail.com> wrote:
>
> > Hi,
> >
> > While using LZO compression to try to improve the performance of my
> > cluster, I found that compression didn't work. The job I ran was
> > "org.apache.hadoop.examples.Sort", with the input data generated by
> > "org.apache.hadoop.examples.RandomWriter".
> > I've made sure that I configured the LZO native library/jar files
> > correctly and set all compression-related parameters (such as
> > "mapred.compress.map.output", "mapred.output.compression.type",
> > "mapred.output.compression.codec", "mapred.output.compress" and
> > "map.output.compression.codec"), and according to the information in the
> > job logs the tasktracker did compress the map/job output. But the output
> > file is not compressed at all!
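> >
> > For reference, a minimal sketch of how these parameters can be set via
> > the old JobConf API (the codec class name here is an assumption based on
> > the hadoop-lzo build being on the classpath):
> >
> >   import org.apache.hadoop.mapred.JobConf;
> >
> >   // Minimal sketch: enable LZO for both map output and final job output.
> >   JobConf conf = new JobConf(Sort.class);
> >   conf.setBoolean("mapred.compress.map.output", true);
> >   conf.set("map.output.compression.codec",
> >       "com.hadoop.compression.lzo.LzoCodec");
> >   conf.setBoolean("mapred.output.compress", true);
> >   conf.set("mapred.output.compression.type", "BLOCK");
> >   conf.set("mapred.output.compression.codec",
> >       "com.hadoop.compression.lzo.LzoCodec");
> >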
> > Then I searched the internet, and found from
> > http://wiki.apache.org/hadoop/SequenceFile that the *SequenceFile Common
> > Header* contains two bytes that decide whether compression and block
> > compression are turned on for the file. I checked the sequence file
> > generated by RandomWriter, and the result is as follows:
> >
> > [hdpadmin@shihc008 rand-10mb]$ od -c part-00000 | head -n 15
> > 0000000   S   E   Q 006   "   o   r   g   .   a   p   a   c   h   e   .
> > 0000020   h   a   d   o   o   p   .   i   o   .   B   y   t   e   s   W
> > 0000040   r   i   t   a   b   l   e   "   o   r   g   .   a   p   a   c
> > 0000060   h   e   .   h   a   d   o   o   p   .   i   o   .   B   y   t
> > 0000100   e   s   W   r   i   t   a   b   l   e  *\0  \0*  \0  \0  \0  \0
> > 0000120 244   n   ! 177   L 316 030   q   g 035 351   L   ; 024 216 031
> > 0000140  \0  \0  \t 234  \0  \0 001 305  \0  \0 001 301 207   v   5 255
> > 0000160 220   ] 236   <  \b 367   &   9 241  \b   v 303   m 314 203 220
> > 0000200 335  \0 241 325 232 035 037 267 303 360  \n 025   u   P 003 220
> > 0000220   ^ 235 247 036   S 265 271 035   S 247   O   5 337   + 020   q
> > 0000240 277   - 003 212   . 230 221   G 241   5   K   K 031 273 036 206
> > 0000260   ( 317 303 367 351 214 364 262 340   S 211 230  \r 362   % 335
> > 0000300   }   H   w   & 234   S   F 324 321 274   F 377   [ 344   [   h
> > 0000320 204 001 265   ] 037   _   r   , 020 370 246 327 231 017 205 252
> > 0000340 273 016 310   w 361 326 032 332 200   Y  \a   X 342  \r 016 364
> >
> > I found the marked two bytes are set to zero, which means compression is
> > turned off for the file. Since the value of both bytes is '\0', I guess
> > this may be a defect: we neglected to set these two bytes, and as a
> > result the sequence file generated by RandomWriter cannot be compressed.
> > I don't know whether this appears in other places as well.
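> >
> > To double-check those flags programmatically instead of reading the od
> > dump by eye, a sketch along these lines should work (SequenceFile.Reader
> > exposes the decoded header flags; the path is the part file from above):
> >
> >   import org.apache.hadoop.conf.Configuration;
> >   import org.apache.hadoop.fs.FileSystem;
> >   import org.apache.hadoop.fs.Path;
> >   import org.apache.hadoop.io.SequenceFile;
> >
> >   // Minimal sketch: print the compression flags stored in the header.
> >   Configuration conf = new Configuration();
> >   FileSystem fs = FileSystem.get(conf);
> >   SequenceFile.Reader reader =
> >       new SequenceFile.Reader(fs, new Path("rand-10mb/part-00000"), conf);
> >   try {
> >     System.out.println("compressed?       " + reader.isCompressed());
> >     System.out.println("block compressed? " + reader.isBlockCompressed());
> >     System.out.println("codec:            " + reader.getCompressionCodec());
> >   } finally {
> >     reader.close();
> >   }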
> >
> > Is my opinion right? If not, does anybody know what causes the
> > compression not to work? Looking forward to your reply!
> >
> > Thanks and Best Regards,
> > Carp
> >
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
