hadoop-hdfs-issues mailing list archives

From "Anu Engineer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDDS-1516) Move ozone-build container definition from dev-support and publish dedicated image
Date Thu, 16 May 2019 15:04:00 GMT

    [ https://issues.apache.org/jira/browse/HDDS-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16841434#comment-16841434 ]

Anu Engineer commented on HDDS-1516:
------------------------------------

+1, Please commit at will. Thanks for getting this done.

> Move ozone-build container definition from dev-support and publish dedicated image
> ----------------------------------------------------------------------------------
>
>                 Key: HDDS-1516
>                 URL: https://issues.apache.org/jira/browse/HDDS-1516
>             Project: Hadoop Distributed Data Store
>          Issue Type: Improvement
>          Components: build
>            Reporter: Elek, Marton
>            Assignee: Elek, Marton
>            Priority: Major
>
> hadoop-ozone/dev-support/docker/Dockerfile contains a docker container definition
that provides a generic build environment for ozone builds.
> This container (or more precisely the improved version of this container) is used to
run all the build commands inside the container on Jenkins.
> As of now it's uploaded as elek/ozone-build and works well (all github PR check builds
are executed in this container).
> I propose to move it to the hadoop-docker-ozone repo and publish an apache/ozone-buildenv
docker image from it.
> Note: there are two interesting tricks in the Dockerfile:
> 1.) a lot of users are created (from id=1 to id=4000)
> Reason: the kerberized unit tests require a real user. Jenkins uses the same numeric uid
inside the container as outside (e.g. the -u 1000 flag) even if no real user has been
created. And we can't predict what the uid of the build process will be (in my Jenkins it's
1000(elek); on builds.apache.org it's something between 400 and 500 (as I remember)).
> ./start-build-env.sh follows an approach of creating a docker image on demand (with only
the required user). While it works well, I realized that the image creation is not cached
very well on Jenkins and it may take >10 minutes for each build.
> 2.) The other question is which maven repository is used. We prefer to separate the local
maven repositories for parallel builds to avoid any conflict (if one build runs mvn install,
the other build may use that jar from the local maven repository). Docker can guarantee a
strong separation but it also means that we need to download about 1GB of files for each build
(which is also very time consuming).
> Earlier we started to use an approach of caching all the 3rd party jar files in the docker
image itself.
> As a result we will have a huge buildenv image (1-2G) but the image download is faster.
A docker image can be downloaded as a few huge files, so we don't need to download thousands
of jar files one by one. The huge docker image can also be cached on the build machine without
any risk.
> With this approach we reduced the 10-20 minutes of build time to 2-3 minutes.
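
For readers who want to picture trick 1.), here is a minimal sketch of how one passwd entry per uid from 1 to 4000 could be generated, so that any numeric uid passed with -u maps to a real user. It is not taken from the actual Dockerfile: the function name passwd_entries, the userN naming scheme, the home directory layout and the shell are all assumptions made for illustration. A real image would also have to deal with uids that the base image already defines.

```python
# Hypothetical sketch: generate /etc/passwd-style lines for uid 1..4000.
# The naming scheme (user<uid>), home root and shell are assumptions,
# not details from the ozone-build Dockerfile.

def passwd_entries(first_uid=1, last_uid=4000, home_root="/home", shell="/bin/bash"):
    """Yield one passwd line per uid in the inclusive range."""
    for uid in range(first_uid, last_uid + 1):
        name = f"user{uid}"
        # passwd format: name:password:UID:GID:GECOS:home:shell
        yield f"{name}:x:{uid}:{uid}::{home_root}/{name}:{shell}"


if __name__ == "__main__":
    entries = list(passwd_entries())
    print(len(entries))   # number of generated users
    print(entries[999])   # the entry that a "-u 1000" build would resolve to
```

The output could be appended to /etc/passwd during the image build. Because every uid in the range exists in advance, the image does not need to be rebuilt per user, which is the caching problem described for ./start-build-env.sh.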


