hadoop-common-issues mailing list archives

From "Alejandro Abdelnur (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-6671) To use maven for hadoop common builds
Date Mon, 18 Jul 2011 23:50:58 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-6671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13067399#comment-13067399 ]

Alejandro Abdelnur commented on HADOOP-6671:
--------------------------------------------

@Eric,

First of all, thanks for volunteering to tackle the RPM/DEB part of the Mavenization.

My initial approach to the patch was heavily based on profiles doing what you are suggesting.
The end result was a very large POM for 'common' with profiles relying heavily on the order
of the plugins to do the right thing (I had to declare all the plugins, even the unused ones,
in the main <build> to ensure the right order of execution when the profiles are active).
The resulting POM was difficult to follow and to update (I got bitten a few times while
improving it).

My second approach, the current one, is much cleaner in that regard. It fully leverages the
Maven reactor, and build times are not affected. The following table shows the time taken
by common build tasks:


|| Build task            || Ant command                                 || Maven command                                            || Ant time || Maven time ||
| *clean*                | ant clean                                    | mvn clean                                                 |  00:02    |  00:01  *   |
| *clean compile*        | ant clean compile                            | mvn clean compile                                         |  00:20    |  00:13  *   |
| *clean test-compile*   | ant clean test-compile                       | mvn clean test -DskipTests                                |  00:23    |  00:17  *   |
| *clean 1 test*         | ant clean test -Dtestcase=TestConfiguration  | mvn clean test -Dtest=TestConfiguration                   |  01:09    |  *00:27* *  |
| *<warm> 1 test*        | ant test -Dtestcase=TestConfiguration        | mvn test -Dtest=TestConfiguration                         |  00:52    |  *00:11* *  |
| *clean jar test-jar*   | ant clean jar jar-test                       | mvn clean package                                         |  00:28    |  00:23  *   |
| *clean binary-tar*     | ant clean binary                             | mvn clean package post-site -DskipTests                   |  00:59    |  00:46  *   |
| *clean tar w/docs*     | ant clean tar                                | mvn clean package post-site -DskipTests -Pdocs            |  N/A      |  04:10      |
| *clean tar w/docs/src* | ant clean tar                                | mvn clean package post-site -DskipTests -Pdocs -Psource   |  01:34 *  |  05:18      |

Of all these, IMO the *most* interesting improvement is running a single test (both from scratch
and with pre-compiled classes). This will be a huge time saver for development.

That said, we could merge {{hadoop-docs}} into {{hadoop-common}}, using the 'site' phase to
wire all documentation generation (I don't think this would complicate things too much).
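
As a sketch of that wiring (the plugin configuration and paths below are my assumptions, not the actual layout), the docs generation could be bound to the 'site' phase via the antrun plugin:

{code}
<!-- hypothetical fragment of hadoop-common/pom.xml -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-antrun-plugin</artifactId>
  <executions>
    <execution>
      <id>docs</id>
      <phase>site</phase>
      <goals>
        <goal>run</goal>
      </goals>
      <configuration>
        <target>
          <!-- delegate to the existing Forrest-based docs build; paths are assumptions -->
          <exec executable="${env.FORREST_HOME}/bin/forrest" dir="${basedir}/src/docs" failonerror="true"/>
        </target>
      </configuration>
    </execution>
  </executions>
</plugin>
{code}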

However, for TAR/RPM/DEB I would like to keep a separate module that kicks in the assembly
plugin to generate the TAR/RPM/DEB. And there we could have profiles that build a TAR, an
RPM and/or a DEB.
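
For illustration only (the descriptor file names here are made up), the distro module could look something like:

{code}
<!-- hypothetical fragment of hadoop-common-distro/pom.xml -->
<profiles>
  <profile>
    <id>tar</id>
    <build>
      <plugins>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-assembly-plugin</artifactId>
          <configuration>
            <descriptors>
              <descriptor>src/main/assemblies/tar.xml</descriptor>
            </descriptors>
          </configuration>
        </plugin>
      </plugins>
    </build>
  </profile>
  <!-- analogous 'rpm' and 'deb' profiles pointing at their own descriptors -->
</profiles>
{code}

Then 'mvn assembly:single -Ptar' (or -Prpm, -Pdeb) would produce the corresponding artifact.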

Another benefit of this is that all scripts and related files would end up in the TAR/RPM/DEB
module; the {{hadoop-common}} module would only produce a JAR file.

The layout would then be:

{code}
trunk/pom.xml
|
|-- hadoop-annotations/pom.xml (javadoc annotations and doclet)
|
|-- hadoop-project/pom.xml (dependency management, extended by all other modules)
|
|-- common/pom.xml
|      |
|      |-- hadoop-common/pom.xml [clean, compile,package,install,deploy,site] (-Pnative)
|      |
|      |-- hadoop-common-distro/pom.xml [clean, assembly:single] (-Ptar -Prpm -Pdeb)
|
|-- hdfs
|
|-- mapreduce
{code}

The [...] are the meaningful lifecycle phases.

The (-P...) are the profiles each module would support.

The only thing we have to sort out is how to wire the maven-antrun-plugin to run after the
'assembly:single' invocation. This is required so we can invoke the Unix tar command to create
the TAR, in order to preserve the symlinks.
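
One possible way to do it (an untested sketch; it relies on same-phase executions running in POM declaration order, which holds in recent Maven 3 releases) is to bind both the assembly and the antrun tar step to the 'package' phase, declaring antrun last:

{code}
<!-- hypothetical fragment of hadoop-common-distro/pom.xml -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-antrun-plugin</artifactId>
  <executions>
    <execution>
      <id>tar-preserving-symlinks</id>
      <phase>package</phase>
      <goals>
        <goal>run</goal>
      </goals>
      <configuration>
        <target>
          <!-- GNU tar keeps symlinks as symlinks; the directory name is an assumption -->
          <exec executable="tar" dir="${project.build.directory}" failonerror="true">
            <arg value="czf"/>
            <arg value="${project.artifactId}-${project.version}.tar.gz"/>
            <arg value="${project.artifactId}-${project.version}"/>
          </exec>
        </target>
      </configuration>
    </execution>
  </executions>
</plugin>
{code}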

Would you be OK with this approach? 

Thoughts?

PS: I'm somewhat familiar with HBase packaging, and there the current overloading of Maven
phases and profiles makes things too slow (until not long ago, not sure if still the case,
running 'mvn install' was generating the TAR).


> To use maven for hadoop common builds
> -------------------------------------
>
>                 Key: HADOOP-6671
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6671
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: build
>    Affects Versions: 0.22.0
>            Reporter: Giridharan Kesavan
>            Assignee: Alejandro Abdelnur
>         Attachments: HADOOP-6671-cross-project-HDFS.patch, HADOOP-6671-e.patch, HADOOP-6671-f.patch,
HADOOP-6671-g.patch, HADOOP-6671-h.patch, HADOOP-6671-i.patch, HADOOP-6671-j.patch, HADOOP-6671-k.sh,
HADOOP-6671-l.patch, HADOOP-6671-m.patch, HADOOP-6671-n.patch, HADOOP-6671-o.patch, HADOOP-6671-p.patch,
HADOOP-6671-q.patch, HADOOP-6671.patch, HADOOP-6671b.patch, HADOOP-6671c.patch, HADOOP-6671d.patch,
build.png, common-mvn-layout-i.sh, hadoop-commons-maven.patch, mvn-layout-e.sh, mvn-layout-f.sh,
mvn-layout-k.sh, mvn-layout-l.sh, mvn-layout-m.sh, mvn-layout-n.sh, mvn-layout-o.sh, mvn-layout-p.sh,
mvn-layout-q.sh, mvn-layout.sh, mvn-layout.sh, mvn-layout2.sh, mvn-layout2.sh
>
>
> We are now able to publish hadoop artifacts to the maven repo successfully [ Hadoop-6382]
> Drawbacks with the current approach:
> * Use ivy for dependency management with ivy.xml
> * Use maven-ant-task for artifact publishing to the maven repository
> * pom files are not generated dynamically 
> To address this I propose we use maven to build hadoop-common, which would help us manage
dependencies, publish artifacts and have one single XML file (POM) for dependency management
and artifact publishing.
> I would like to have a branch created to work on mavenizing hadoop common.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
