hadoop-hdfs-user mailing list archives

From Don Wallwork <don_wallw...@yahoo.com>
Subject bzip2 input decompression not using native library
Date Tue, 07 Oct 2014 19:52:15 GMT
Can someone tell me why native bzip2 de/compression works in Hadoop 2.4.1 for
map output compression, but the Java bzip2 implementation is used for input file
decompression?  Is this expected?

While profiling some Hadoop wordcount jobs using a bzip2-compressed input file, it
looks like bzip2 decompression is using the Java implementation rather than the native
library for input file decompression.  Output from the Linux perf tool (see below) shows
that the Java bzip2 implementation is used.

     1.83%           java  perf-12473.map      [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I
     1.42%           java  perf-11567.map      [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I
     1.16%           java  perf-12473.map      [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.getAndMoveToFrontDecode()V
     1.05%           java  perf-12174.map      [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I
     0.99%           java  perf-11770.map      [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I
     0.98%           java  perf-12826.map      [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I
     0.89%           java  perf-12174.map      [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.getAndMoveToFrontDecode()V
     0.79%           java  perf-12739.map      [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I
     0.79%           java  perf-12544.map      [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I
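
As a rough illustration (not part of the original profiling run), the per-process
samples above can be summed per method to see how much time the Java bzip2 decoder
accounts for overall. The sketch below is Python; the names PERF_LINES, LINE_RE and
summarize are hypothetical helpers made up for this example, and the regular expression
assumes the exact column layout shown in the perf output above.

```python
import re
from collections import defaultdict

# Sample lines copied verbatim from the perf output above.
PERF_LINES = """\
     1.83%           java  perf-12473.map      [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I
     1.42%           java  perf-11567.map      [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I
     1.16%           java  perf-12473.map      [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.getAndMoveToFrontDecode()V
     1.05%           java  perf-12174.map      [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I
     0.99%           java  perf-11770.map      [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I
     0.98%           java  perf-12826.map      [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I
     0.89%           java  perf-12174.map      [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.getAndMoveToFrontDecode()V
     0.79%           java  perf-12739.map      [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I
     0.79%           java  perf-12544.map      [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I
"""

# Assumed layout: <percent>% <command> <dso> [.] L<class>;.<method>(<signature>
LINE_RE = re.compile(
    r"^\s*(\d+(?:\.\d+)?)%\s+\S+\s+\S+\s+\[\.\]\s+L([^;]+);\.(\w+)\("
)


def summarize(text):
    """Sum the sample percentages per 'SimpleClassName.method' key."""
    totals = defaultdict(float)
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip anything that is not a JIT-compiled Java symbol line
        pct, cls, method = match.groups()
        key = cls.rsplit("/", 1)[-1] + "." + method
        totals[key] += float(pct)
    return dict(totals)


if __name__ == "__main__":
    ranked = sorted(summarize(PERF_LINES).items(), key=lambda kv: -kv[1])
    for name, pct in ranked:
        print(f"{pct:5.2f}%  {name}")
```

For the nine samples listed, this gives about 7.85% for CBZip2InputStream.read0 and
about 2.05% for CBZip2InputStream.getAndMoveToFrontDecode, roughly 9.9% in total.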

When using the perf tool to check map output compression, it shows that the native
library version is correctly used.

This cluster is running Apache Hadoop version 2.4.1, which has been compiled from source
on 64-bit Ubuntu 12.04 to include native compression libraries for bzip2 et al.
Checknative shows that the native compression libraries should be used:

hadoop checknative -a
14/10/07 15:15:57 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library system-native
14/10/07 15:15:57 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib
Native library checking:
hadoop: true /usr/local/hadoop-local-build/hadoop-2.4.1/lib/native/libhadoop.so.1.0.0
zlib:   true /lib/x86_64-linux-gnu/libz.so.1
snappy: true /usr/lib/libsnappy.so.1
lz4:    true revision:99
bzip2:  true /lib/x86_64-linux-gnu/libbz2.so.1

I have verified that the io.compression.codec.bzip2.library configuration uses the default


