impala-dev mailing list archives

From Marcel Kornacker <mar...@cloudera.com>
Subject Re: Toolchain - versioning dependencies with the same version number
Date Tue, 28 Feb 2017 20:57:06 GMT
Yes, I too am particularly concerned about maintaining the ability to
build offline, and about not downloading the same things over and over
again.

I don't quite understand the case against versioning - if gc'ing
obsolete versions in order to reduce storage space is a concern, then
it's probably fine to a) blow away and re-download everything, or b)
throw away old versions manually, if you happen to be in a situation
where a) isn't possible.
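
A minimal sketch of what versioned toolchain directories with manual
cleanup (option b) could look like. The "toolchain" subdirectory, the
build ID strings and both function names are assumptions made for
illustration; none of this is taken from bin/bootstrap_toolchain.py.

```python
import os
import shutil


def toolchain_dir(impala_home, build_id):
    """Return a per-build-ID toolchain directory (hypothetical layout)."""
    return os.path.join(impala_home, "toolchain", build_id)


def prune_toolchains(impala_home, keep_build_ids):
    """Delete every cached toolchain whose build ID is not in keep_build_ids.

    Returns the sorted list of build IDs that were removed.
    """
    root = os.path.join(impala_home, "toolchain")
    removed = []
    if not os.path.isdir(root):
        return removed
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        if os.path.isdir(path) and name not in keep_build_ids:
            shutil.rmtree(path)
            removed.append(name)
    return removed
```

With a layout like this, switching branches only selects a different
directory, so nothing is deleted unless a developer explicitly prunes.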

On Tue, Feb 28, 2017 at 12:20 PM, Tim Armstrong <tarmstrong@cloudera.com> wrote:
> I agree it's not too bad if you have a fat pipe to S3, but it's a pretty
> bad regression in usability to make it the default and, in particular, to
> provide no way to opt out.
>
> The toolchain is almost 1GB though, which is pretty problematic to download
> if a developer is on coffee-shop wifi, cellular wireless, airplane wifi,
> etc. It'd also be pretty easy for a developer working offline to switch
> branches, run buildall.sh, have gcc, etc, automatically deleted and then be
> stuck unable to build anything.
>
>
> On Tue, Feb 28, 2017 at 9:07 AM, Henry Robinson <henry@apache.org> wrote:
>
>> I'd prefer not to do that because it's something of a hack and generates
>> too many artifacts if we make incremental build changes, not to mention the
>> extra complexity required to make such a change because new tarballs might
>> need to be uploaded.
>>
>>
>>
>>
>> On Tue, Feb 28, 2017 at 8:55 AM Lars Volker <lv@cloudera.com> wrote:
>>
>> > Can we add another version string component like -1 or -impala1, or add a
>> > dummy patch to the affected packages to allow for new versions with the
>> > same upstream version? I think this is what Linux distributions commonly
>> > do to have several versions of the same upstream version.
>> >
>> > On Feb 27, 2017 21:15, "Henry Robinson" <henry@cloudera.com> wrote:
>> >
>> > Yes, it would force re-downloading. At my office, downloading a toolchain
>> > takes a matter of a few seconds, so I'm not sure the cost is that great.
>> > And if it turned out to be problematic, one could always change the
>> > toolchain directory for different branches. Having something locally that
>> > set IMPALA_TOOLCHAIN_DIR=${IMPALA_HOME}/${IMPALA_TOOLCHAIN_BUILD_ID}/
>> > would work.
>> >
>> > However, I wouldn't want to force that behaviour into the toolchain
>> > scripts because of the need for garbage collection it would raise - it
>> > wouldn't be clear when to delete old toolchains programmatically.
>> >
>> > On 27 February 2017 at 20:51, Tim Armstrong <tarmstrong@cloudera.com>
>> > wrote:
>> >
>> > > Maybe I'm misunderstanding, but wouldn't that force re-downloading of
>> > > the entire toolchain every time a developer switches between branches
>> > > with different build IDs?
>> > >
>> > > I know some developers do that frequently, e.g. to try and reproduce
>> > > bugs on older versions or backport patches.
>> > >
>> > > I agree it would be good to fix this, since I've run into this problem
>> > > before, I'm just not quite sure what the best solution is. In the other
>> > > case where I had this issue with LLVM I changed the version number (by
>> > > appending noasserts-) to it, but that's really just a hack.
>> > >
>> > > -Tim
>> > >
>> > > On Mon, Feb 27, 2017 at 4:35 PM, Henry Robinson <henry@cloudera.com>
>> > > wrote:
>> > >
>> > > > As Matt said, I have a patch that implements build ID-based
>> > > > versioning at https://gerrit.cloudera.org/#/c/6166/2.
>> > > >
>> > > > Does anyone want to take a look? If we could get this in soon it
>> > > > would help smooth over the LZ4 change which is going in shortly.
>> > > >
>> > > > On 27 February 2017 at 14:21, Henry Robinson <henry@cloudera.com>
>> > wrote:
>> > > >
>> > > > > I agree that that might be useful, and that it's a separately
>> > > > > addressable problem.
>> > > > >
>> > > > > On 27 February 2017 at 14:18, Matthew Jacobs <mj@cloudera.com>
>> > wrote:
>> > > > >
>> > > > >> Just catching up to this e-mail, though I had seen your code
>> > > > >> reviews and I think this approach makes sense. An additional concern
>> > > > >> would be how to identify how a toolchain package was built, and AFAIK
>> > > > >> this is tricky now if only the 'toolchain ID' is known. Before I saw
>> > > > >> this e-mail I was thinking about this problem (which I think we can
>> > > > >> address separately), and that we might want to write the
>> > > > >> native-toolchain git hash with every toolchain build so that the
>> > > > >> exact build scripts are associated with those build artifacts. I
>> > > > >> filed https://issues.cloudera.org/browse/IMPALA-5002 for this related
>> > > > >> problem.
>> > > > >>
>> > > > >> On Sat, Feb 25, 2017 at 10:22 PM, Henry Robinson <
>> henry@apache.org>
>> > > > >> wrote:
>> > > > >> > As written, the toolchain apparently can't deal with the
>> > > > >> > possibility of build flags changing while a dependency version
>> > > > >> > remains the same.
>> > > > >> >
>> > > > >> > LZ4 has never (afaict) been built with optimization enabled. I
>> > > > >> > have a commit that enables -O3, but that continues to produce
>> > > > >> > artifacts for lz4-1.7.5 with no version change. This is a problem
>> > > > >> > because bootstrapping the toolchain will fail to pick up the new
>> > > > >> > binaries: the previously downloaded version is still in the local
>> > > > >> > cache, and won't be overwritten because the version hasn't changed.
>> > > > >> >
>> > > > >> > I think the simplest way to fix this is to write the toolchain
>> > > > >> > build ID to the dependency version file (that's in the local cache
>> > > > >> > only) when it's downloaded. If that ID changes, the dependency will
>> > > > >> > be re-downloaded.
>> > > > >> >
>> > > > >> > This has the disadvantage that any bump in
>> > > > >> > IMPALA_TOOLCHAIN_BUILD_ID will invalidate all dependencies, and
>> > > > >> > bin/bootstrap_toolchain.py will re-download all of them. My feeling
>> > > > >> > is that that cost is better than trying to individually determine
>> > > > >> > whether a dependency has changed between toolchain builds.
>> > > > >> >
>> > > > >> > Any thoughts on whether this is the right way to go?
>> > > > >> >
>> > > > >> > Henry
>> > > > >>
>> > > > >
>> > > > >
>> > > > >
>> > > > > --
>> > > > > Henry Robinson
>> > > > > Software Engineer
>> > > > > Cloudera
>> > > > > 415-994-6679
>> > > > >
>> > > >
>> > > >
>> > > >
>> > > > --
>> > > > Henry Robinson
>> > > > Software Engineer
>> > > > Cloudera
>> > > > 415-994-6679
>> > > >
>> > >
>> >
>> >
>> >
>> > --
>> > Henry Robinson
>> > Software Engineer
>> > Cloudera
>> > 415-994-6679
>> >
>>
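
For reference, the build-ID check proposed in the original message at the
bottom of this thread could be sketched roughly as follows. The one-line
"<version> <build id>" file format and both function names are assumptions
for illustration only, not the actual contents of
bin/bootstrap_toolchain.py or of the patch under review.

```python
import os


def needs_download(version_file, version, build_id):
    """True if the cached dependency is missing or was fetched for another build."""
    if not os.path.exists(version_file):
        return True
    with open(version_file) as f:
        return f.read().strip() != "%s %s" % (version, build_id)


def record_download(version_file, version, build_id):
    """Stamp the local cache after a successful download."""
    with open(version_file, "w") as f:
        f.write("%s %s\n" % (version, build_id))
```

Under this scheme an lz4-1.7.5 package rebuilt with new flags is fetched
again once the build ID is bumped, at the cost of re-fetching every other
dependency too.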
