impala-dev mailing list archives

From "Michael Ho (Code Review)" <ger...@cloudera.org>
Subject [Impala-CR](cdh5-trunk) IMPALA-3223: Supports download of CDH components from S3.
Date Wed, 08 Jun 2016 05:52:22 GMT
Michael Ho has posted comments on this change.

Change subject: IMPALA-3223: Supports download of CDH components from S3.
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3333/2/bin/bootstrap_toolchain.py
File bin/bootstrap_toolchain.py:

Line 288: def download_cdh_components(toolchain_root, cdh_components):
> bootstrap_toolchain can be run every time buildall.sh is run (for clean or 
I share your concern, and I also tried to come up with a way to avoid downloading the md5sum file
for every incremental build.

The major difference between the CDH components and the other pre-existing binaries in the toolchain
directory is that we don't really have a good versioning system for the CDH components. Up
to this point, the integration Jenkins job has checked the latest approved version of the CDH
components into our git repos, and a git fetch picks up that latest version. With this change,
however, the CDH components keep the same version string, so it's hard to tell whether the
locally cached copy is stale.

It's unclear to me whether we really need the latest CDH components for our day-to-day development.
Maybe it's already sufficient to have our Jenkins jobs do that, since they always bootstrap the
toolchain from scratch for each run. It would be great if others could chime in on this point.

If we can agree that it's unnecessary to always download the latest version of the CDH components,
this function can simply skip downloading a component if it already exists locally. On the
other hand, if we want to preserve the existing behavior, we may consider storing a version
file in the CDH components directory and downloading only if the local copy is behind.
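
To make the two options concrete, here is a rough sketch, not the code in the patch. The helper
name needs_download, the marker file name "VERSION" and the wanted_version parameter are all
hypothetical; the patch does not define them. With wanted_version=None it behaves like the first
option (download only if missing); with a version given it behaves like the second.

```python
import os


def needs_download(toolchain_root, component, wanted_version=None,
                   version_file="VERSION"):
    """Return True if `component` should be (re)downloaded.

    Hypothetical helper: the "VERSION" marker file and `wanted_version`
    are assumptions made for illustration only.
    """
    component_dir = os.path.join(toolchain_root, component)
    # Option 1: download only if the component is missing locally.
    if not os.path.isdir(component_dir):
        return True
    if wanted_version is None:
        return False
    # Option 2: compare a locally stored version marker with the wanted one.
    marker = os.path.join(component_dir, version_file)
    try:
        with open(marker) as f:
            return f.read().strip() != wanted_version
    except (IOError, OSError):
        # No readable marker: treat the local copy as stale.
        return True
```

download_cdh_components could call such a check once per component before fetching anything,
so an incremental build with everything already in place would touch no network resource.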

Another way to work around the repeated downloading problem is to set SKIP_TOOLCHAIN_BOOTSTRAP
to true. That may be particularly useful in a disconnected environment.
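
For illustration, a guard on that variable could look like the sketch below. Only the variable
name SKIP_TOOLCHAIN_BOOTSTRAP comes from this discussion; the function name bootstrap_skipped and
the exact parsing (case-insensitive comparison with "true", default of "false") are assumptions,
and the real script may read the variable differently.

```python
import os


def bootstrap_skipped(environ=None):
    """Return True if SKIP_TOOLCHAIN_BOOTSTRAP is set to "true".

    Hypothetical helper: the parsing rules here are an assumption,
    not necessarily what bootstrap_toolchain.py does.
    """
    env = os.environ if environ is None else environ
    value = env.get("SKIP_TOOLCHAIN_BOOTSTRAP", "false")
    return value.strip().lower() == "true"
```

A caller would check this first and return before doing any download work.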

With all of the above said, I prefer the first option, which is to download a component only if it's missing.


-- 
To view, visit http://gerrit.cloudera.org:8080/3333
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I16fa79db0005554cc0a116e74775647ba99f8dda
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Michael Ho <kwho@cloudera.com>
Gerrit-Reviewer: Michael Ho <kwho@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes
