Message-ID: <1412711535.60362.YahooMailBasic@web142504.mail.bf1.yahoo.com>
Date: Tue, 7 Oct 2014 12:52:15 -0700
From: Don Wallwork
Subject: bzip2 input decompression not using native library
To: user@hadoop.apache.org

Can someone tell me why native bzip2 compression and decompression work in Hadoop 2.4.1 for map output compression, but the Java bzip2 implementation is used for input file decompression? Is this expected?

While profiling some Hadoop wordcount jobs using a bzip2-compressed input file, it looks like bzip2 decompression is using the Java implementation rather than the native library for input file decompression. Output from the Linux perf tool (see below) shows that the Java bzip2 implementation is used.

  1.83% java perf-12473.map [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I
  1.42% java perf-11567.map [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I
  1.16% java perf-12473.map [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.getAndMoveToFrontDecode()V
  1.05% java perf-12174.map [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I
  0.99% java perf-11770.map [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I
  0.98% java perf-12826.map [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I
  0.89% java perf-12174.map [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.getAndMoveToFrontDecode()V
  0.79% java perf-12739.map [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I
  0.79% java perf-12544.map [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I

When using the perf tool to check map output compression, it shows that the native library is correctly used.

This cluster is running Apache Hadoop version 2.4.1, which has been compiled from source to include native compression libraries for bzip2 et al. on 64-bit Ubuntu 12.04.

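To put a single number on how much time the Java decompressor takes across the map tasks, the perf samples quoted above can be summed per method. This is only a rough sketch: the regular expression assumes the five-column layout shown in this message (percentage, command, perf map file, `[.]`, JIT symbol), and the helper name `sum_by_method` is made up for illustration.

```python
import re
from collections import defaultdict

# perf samples copied from the output quoted above
PERF_LINES = """\
1.83% java perf-12473.map [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I
1.42% java perf-11567.map [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I
1.16% java perf-12473.map [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.getAndMoveToFrontDecode()V
1.05% java perf-12174.map [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I
0.99% java perf-11770.map [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I
0.98% java perf-12826.map [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I
0.89% java perf-12174.map [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.getAndMoveToFrontDecode()V
0.79% java perf-12739.map [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I
0.79% java perf-12544.map [.] Lorg/apache/hadoop/io/compress/bzip2/CBZip2InputStream;.read0()I
"""

# Assumed layout: "<pct>% <command> <map file> [.] <symbol>"
LINE_RE = re.compile(
    r"^\s*(?P<pct>\d+(?:\.\d+)?)%\s+\S+\s+\S+\s+\[\.\]\s+(?P<symbol>\S+)\s*$"
)


def sum_by_method(text):
    """Sum the sample percentages per Java method name (hypothetical helper)."""
    totals = defaultdict(float)
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip anything that is not a sample line
        # A JIT symbol looks like "Lpkg/Class;.method(args)ret"; keep "method".
        method = match.group("symbol").split(";.", 1)[-1].split("(", 1)[0]
        totals[method] += float(match.group("pct"))
    return dict(totals)


totals = sum_by_method(PERF_LINES)
for method, pct in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{method}: {pct:.2f}%")
```

For the nine samples quoted here, this comes to roughly 7.85% in read0 and 2.05% in getAndMoveToFrontDecode, all inside the Java CBZip2InputStream class.
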
Checknative shows that the native compression libraries should be used:

  hadoop checknative -a
  14/10/07 15:15:57 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library system-native
  14/10/07 15:15:57 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
  Native library checking:
  hadoop: true /usr/local/hadoop-local-build/hadoop-2.4.1/lib/native/libhadoop.so.1.0.0
  zlib:   true /lib/x86_64-linux-gnu/libz.so.1
  snappy: true /usr/lib/libsnappy.so.1
  lz4:    true revision:99
  bzip2:  true /lib/x86_64-linux-gnu/libbz2.so.1

I have verified that the io.compression.codec.bzip2.library configuration uses the default system-native.

Thanks,
Don