Date: Thu, 21 Aug 2014 13:32:11 +0000 (UTC)
From: "Alejandro Abdelnur (JIRA)"
To: common-issues@hadoop.apache.org
Reply-To: common-issues@hadoop.apache.org
Subject: [jira] [Commented] (HADOOP-10986) hadoop tarball is twice as big as prev. version and 6 times as big unpacked


    [ https://issues.apache.org/jira/browse/HADOOP-10986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14105364#comment-14105364 ]

Alejandro Abdelnur commented on HADOOP-10986:
---------------------------------------------

It seems the culprit for the significant size increase is in the documentation, specifically protobuf javadocs:

{code}
$ cd hadoop-2.5.0/share/doc/hadoop
$ du -m -s *
55	api
119	common
1	css
1	dependency-analysis.html
1	hadoop-annotations
1	hadoop-archives
1	hadoop-assemblies
2	hadoop-auth
1	hadoop-auth-examples
1	hadoop-common-project
1	hadoop-datajoin
1	hadoop-dist
1	hadoop-distcp
1	hadoop-extras
1	hadoop-gridmix
1	hadoop-hdfs-bkjournal
11	hadoop-hdfs-httpfs
1	hadoop-hdfs-nfs
1	hadoop-hdfs-project
1	hadoop-mapreduce
3	hadoop-mapreduce-client
1	hadoop-mapreduce-examples
1	hadoop-maven-plugins
1	hadoop-minicluster
1	hadoop-minikdc
1	hadoop-nfs
1	hadoop-openstack
1	hadoop-pipes
725	hadoop-project-dist
1	hadoop-rumen
1	hadoop-sls
1	hadoop-streaming
1	hadoop-tools
5	hadoop-yarn
1	hadoop-yarn-project
618	hdfs
1	httpfs
1	images
1	index.html
1	mapreduce
1	project-reports.html
1	yarn
{code}

{code}
$ cd hadoop-2.5.0/share/doc/hadoop/
$ du -m -s hdfs/api/src-html/org/apache/hadoop/hdfs/server/namenode/
222	hdfs/api/src-html/org/apache/hadoop/hdfs/server/namenode/
{code}

Also it seems we have duplicate javadocs dirs:

{code}
$ cd hadoop-2.5.0/share/doc/hadoop/
$ find . -name api -type d
./api
./api/org/apache/hadoop/mapreduce/v2/api
./api/org/apache/hadoop/yarn/api
./api/org/apache/hadoop/yarn/client/api
./api/src-html/org/apache/hadoop/yarn/api
./api/src-html/org/apache/hadoop/yarn/client/api
./common/api
./hadoop-project-dist/hadoop-common/api
./hadoop-project-dist/hadoop-hdfs/api
./hdfs/api
{code}


> hadoop tarball is twice as big as prev. version and 6 times as big unpacked
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-10986
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10986
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 2.5.0
>            Reporter: André Kelpe
>            Assignee: Karthik Kambatla
>            Priority: Blocker
>
> I noticed that the binary tarball for 2.5.0 is almost 300MB, while 2.4.1 is only 132MB. Unpacking the latest tarball gives me 1.8 GB of stuff, with the majority in the "share" directory.
>
> {code}
> $ cd hadoop-2.4.1
> $ du -sh *
> 364K	bin
> 356K	etc
> 100K	include
> 2,3M	lib
> 128K	libexec
> 24K	LICENSE.txt
> 12K	NOTICE.txt
> 12K	README.txt
> 336K	sbin
> 280M	share
> {code}
> {code}
> $ cd hadoop-2.5.0
> $ du -sh *
> 512K	bin
> 332K	etc
> 100K	include
> 4,6M	lib
> 128K	libexec
> 336K	sbin
> 1,8G	share
> {code}
> I also saw some warnings from tar while unpacking:
> {code}
> $ tar xf hadoop-2.5.0.tar.gz
> tar: Ignoring unknown extended header keyword `SCHILY.dev'
> tar: Ignoring unknown extended header keyword `SCHILY.ino'
> tar: Ignoring unknown extended header keyword `SCHILY.nlink'
> tar: Ignoring unknown extended header keyword `SCHILY.dev'
> tar: Ignoring unknown extended header keyword `SCHILY.ino'
> tar: Ignoring unknown extended header keyword `SCHILY.nlink'
> tar: Ignoring unknown extended header keyword `SCHILY.dev'
> tar: Ignoring unknown extended header keyword `SCHILY.ino'
> tar: Ignoring unknown extended header keyword `SCHILY.nlink'
> tar: Ignoring unknown extended header keyword `SCHILY.dev'
> tar: Ignoring unknown extended header keyword `SCHILY.ino'
> tar: Ignoring unknown extended header keyword `SCHILY.nlink'
> tar: Ignoring unknown extended header keyword `SCHILY.dev'
> tar: Ignoring unknown extended header keyword `SCHILY.ino'
> tar: Ignoring unknown extended header keyword `SCHILY.nlink'
> tar: Ignoring unknown extended header keyword `SCHILY.dev'
> tar: Ignoring unknown extended header keyword `SCHILY.ino'
> tar: Ignoring unknown extended header keyword `SCHILY.nlink'
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
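
The size investigation in the comment above (ranking doc directories with `du -m -s` and listing `api` directories with `find`) could be wrapped in two small helpers for re-checking a later tarball. This is only a sketch and not part of the original message: the function names `largest_entries` and `api_dirs` are hypothetical, and it assumes a `du` that accepts `-m` for megabyte units.

```shell
# Hypothetical helper: print the N largest entries directly under a
# directory, sizes in MB, largest first (same du flags as in the comment).
largest_entries() {
    dir=$1
    count=${2:-5}
    # Subshell so the caller's working directory is left untouched.
    (cd "$dir" && du -m -s * | sort -rn | head -n "$count")
}

# Hypothetical helper: list every directory named "api" below a root,
# mirroring the find invocation used to spot duplicate javadoc trees.
api_dirs() {
    find "$1" -name api -type d | sort
}

# Example usage, assuming the 2.5.0 tarball is unpacked in the current directory:
#   largest_entries hadoop-2.5.0/share/doc/hadoop 5
#   api_dirs hadoop-2.5.0/share/doc/hadoop
```

Sorting the `du` output numerically surfaces the few directories that dominate the total without scanning the full listing by eye.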