hadoop-common-issues mailing list archives

From "Alejandro Abdelnur (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-10986) hadoop tarball is twice as big as prev. version and 6 times as big unpacked
Date Thu, 21 Aug 2014 13:32:11 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-10986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14105364#comment-14105364 ]

Alejandro Abdelnur commented on HADOOP-10986:
---------------------------------------------

It seems the culprit for the significant size increase is the documentation, specifically
the protobuf javadocs:

{code}
$ cd hadoop-2.5.0/share/doc/hadoop
$ du -m -s *
55	api
119	common
1	css
1	dependency-analysis.html
1	hadoop-annotations
1	hadoop-archives
1	hadoop-assemblies
2	hadoop-auth
1	hadoop-auth-examples
1	hadoop-common-project
1	hadoop-datajoin
1	hadoop-dist
1	hadoop-distcp
1	hadoop-extras
1	hadoop-gridmix
1	hadoop-hdfs-bkjournal
11	hadoop-hdfs-httpfs
1	hadoop-hdfs-nfs
1	hadoop-hdfs-project
1	hadoop-mapreduce
3	hadoop-mapreduce-client
1	hadoop-mapreduce-examples
1	hadoop-maven-plugins
1	hadoop-minicluster
1	hadoop-minikdc
1	hadoop-nfs
1	hadoop-openstack
1	hadoop-pipes
725	hadoop-project-dist
1	hadoop-rumen
1	hadoop-sls
1	hadoop-streaming
1	hadoop-tools
5	hadoop-yarn
1	hadoop-yarn-project
618	hdfs
1	httpfs
1	images
1	index.html
1	mapreduce
1	project-reports.html
1	yarn
{code}

{code}
$ cd hadoop-2.5.0/share/doc/hadoop/
$ du -m -s hdfs/api/src-html/org/apache/hadoop/hdfs/server/namenode/
222	hdfs/api/src-html/org/apache/hadoop/hdfs/server/namenode/
{code}
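
To rank the documentation subdirectories by size instead of scanning the listing by eye, a small helper along these lines may be useful. It is only a sketch: the function name `largest_dirs` is made up for illustration, and it assumes a `du` that accepts `-m -s` and a `sort` that accepts `-rn`.

```shell
# largest_dirs DIR [N]: print the N largest entries directly under DIR,
# sizes in MB, biggest first. (Hypothetical helper, not part of Hadoop.)
largest_dirs() {
  dir=${1:-.}
  n=${2:-5}
  # du -m -s prints one "size<TAB>name" line per entry;
  # sort numerically in descending order and keep the first N lines.
  ( cd "$dir" && du -m -s -- * ) | sort -rn | head -n "$n"
}
```

For example, `largest_dirs hadoop-2.5.0/share/doc/hadoop 3` should surface the same three heavy entries visible in the listing above (hadoop-project-dist, hdfs, common).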

It also seems we have duplicate javadoc directories:

{code}
$ cd hadoop-2.5.0/share/doc/hadoop/
$ find . -name api -type d
./api
./api/org/apache/hadoop/mapreduce/v2/api
./api/org/apache/hadoop/yarn/api
./api/org/apache/hadoop/yarn/client/api
./api/src-html/org/apache/hadoop/yarn/api
./api/src-html/org/apache/hadoop/yarn/client/api
./common/api
./hadoop-project-dist/hadoop-common/api
./hadoop-project-dist/hadoop-hdfs/api
./hdfs/api
{code}
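
Whether any two of these trees are byte-for-byte duplicates could be checked with a recursive diff. The sketch below is an assumption-laden illustration: the helper names `same_tree` and `report_dup` are hypothetical, and which directories pair up (for instance `common/api` with `hadoop-project-dist/hadoop-common/api`) is a guess based on the names, not something verified here.

```shell
# same_tree A B: succeed if directories A and B have identical contents.
# (Hypothetical helper; the pairing of directories to compare is a guess.)
same_tree() {
  # diff -r recurses into subdirectories; -q only reports whether files differ.
  diff -r -q -- "$1" "$2" >/dev/null 2>&1
}

# report_dup A B: print a one-line verdict for a pair of directories.
report_dup() {
  if same_tree "$1" "$2"; then
    echo "duplicate: $1 == $2"
  else
    echo "differs:   $1 != $2"
  fi
}
```

Run from `hadoop-2.5.0/share/doc/hadoop`, a call such as `report_dup common/api hadoop-project-dist/hadoop-common/api` would show whether that pair is a true duplicate or only similarly named.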


> hadoop tarball is twice as big as prev. version and 6 times as big unpacked
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-10986
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10986
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 2.5.0
>            Reporter: André Kelpe
>            Assignee: Karthik Kambatla
>            Priority: Blocker
>
> I noticed that the binary tarball for 2.5.0 is almost 300MB, while 2.4.1 is only 132MB.
> Unpacking the latest tarball gives me 1.8 GB of stuff, with the majority in the "share" directory.
>  
> {code}
> $ cd hadoop-2.4.1
> $ du -sh *
> 364K    bin
> 356K    etc
> 100K    include
> 2,3M    lib
> 128K    libexec
> 24K     LICENSE.txt
> 12K     NOTICE.txt
> 12K     README.txt
> 336K    sbin
> 280M    share
> {code}
> {code}
> $ cd hadoop-2.5.0
> $ du -sh *
> 512K    bin
> 332K    etc
> 100K    include
> 4,6M    lib
> 128K    libexec
> 336K    sbin
> 1,8G    share
> {code}
> I also saw some warnings from tar while unpacking:
> {code}
> $ tar xf hadoop-2.5.0.tar.gz 
> tar: Ignoring unknown extended header keyword `SCHILY.dev'
> tar: Ignoring unknown extended header keyword `SCHILY.ino'
> tar: Ignoring unknown extended header keyword `SCHILY.nlink'
> tar: Ignoring unknown extended header keyword `SCHILY.dev'
> tar: Ignoring unknown extended header keyword `SCHILY.ino'
> tar: Ignoring unknown extended header keyword `SCHILY.nlink'
> tar: Ignoring unknown extended header keyword `SCHILY.dev'
> tar: Ignoring unknown extended header keyword `SCHILY.ino'
> tar: Ignoring unknown extended header keyword `SCHILY.nlink'
> tar: Ignoring unknown extended header keyword `SCHILY.dev'
> tar: Ignoring unknown extended header keyword `SCHILY.ino'
> tar: Ignoring unknown extended header keyword `SCHILY.nlink'
> tar: Ignoring unknown extended header keyword `SCHILY.dev'
> tar: Ignoring unknown extended header keyword `SCHILY.ino'
> tar: Ignoring unknown extended header keyword `SCHILY.nlink'
> tar: Ignoring unknown extended header keyword `SCHILY.dev'
> tar: Ignoring unknown extended header keyword `SCHILY.ino'
> tar: Ignoring unknown extended header keyword `SCHILY.nlink'
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
