hadoop-common-dev mailing list archives

From Colin McCabe <cmcc...@alumni.cmu.edu>
Subject Re: about CHANGES.txt
Date Wed, 18 Mar 2015 21:41:14 GMT
Thanks for that summary, Andrew.  In general, I think you're right
that generating it from JIRA would be easier than generating it from
git.  We've been pretty good about setting JIRA fix versions.
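
A minimal sketch of what the JIRA-driven approach could look like, assuming the issues for a release have already been fetched and reduced to plain dicts. The field names ("key", "type", "summary", "fix_versions"), the section order, and the output layout are all assumptions made for illustration; they are not the real CHANGES.txt format or any JIRA client API.

```python
from collections import defaultdict

# Hypothetical section order; the real CHANGES.txt headings may differ.
SECTION_ORDER = ["New Feature", "Improvement", "Bug"]


def _sort_key(issue):
    # Sort "HADOOP-9" before "HADOOP-10" by comparing the numeric part.
    project, _, number = issue["key"].rpartition("-")
    return (project, int(number))


def render_changes(issues, version):
    """Render a plain-text change list for one fix version.

    `issues` is an iterable of dicts with the assumed keys
    "key", "type", "summary", and "fix_versions".
    """
    by_type = defaultdict(list)
    for issue in issues:
        if version in issue["fix_versions"]:
            by_type[issue["type"]].append(issue)

    # Known sections first, then any other issue types alphabetically.
    types = [t for t in SECTION_ORDER if t in by_type]
    types += sorted(t for t in by_type if t not in SECTION_ORDER)

    lines = ["Release %s" % version, ""]
    for issue_type in types:
        lines.append("  %s" % issue_type.upper())
        lines.append("")
        for issue in sorted(by_type[issue_type], key=_sort_key):
            lines.append("    %s. %s" % (issue["key"], issue["summary"]))
        lines.append("")
    return "\n".join(lines)
```

Because the fix version is the only filter, correcting a wrong fix version in JIRA and regenerating is enough to repair the output.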

I do think we could generate it from git if we wanted.  We'd have to
keep some kind of whitelist or blacklist for commits that look like
they are in a release but aren't (reverts, typos in the JIRA number,
etc.).  But since we have JIRA, we might as well use it.
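
For comparison, a rough sketch of the git-based route with a blacklist. It assumes the input is a list of "<sha> <subject>" lines, oldest first, such as the output of `git log --reverse --format='%H %s' <branch>`. The project prefixes, the function name, and the blacklist-of-SHAs mechanism are assumptions for illustration only.

```python
import re

# Assumed JIRA project prefixes; adjust to the projects actually in use.
JIRA_ID = re.compile(r"\b(?:HADOOP|HDFS|YARN|MAPREDUCE)-\d+\b")
# `git revert` produces a subject of the form: Revert "<original subject>"
REVERT = re.compile(r'^Revert "(.*)"')


def jira_ids_from_log(log_lines, blacklist=frozenset()):
    """Return the sorted JIRA ids that are still "live" in a branch.

    log_lines: "<sha> <subject>" strings, oldest commit first.
    blacklist: SHAs to ignore, e.g. commits with a mistyped JIRA number.
    """
    counts = {}
    for line in log_lines:
        sha, _, subject = line.strip().partition(" ")
        if not sha or sha in blacklist:
            continue
        reverted = REVERT.match(subject)
        # A revert cancels one earlier commit that mentioned the same id.
        # Limitation: a revert of a revert is not handled by this sketch.
        delta = -1 if reverted else 1
        text = reverted.group(1) if reverted else subject
        for jira_id in JIRA_ID.findall(text):
            counts[jira_id] = counts.get(jira_id, 0) + delta
    return sorted(j for j, n in counts.items() if n > 0)
```

The blacklist has to be maintained by hand, which is the extra bookkeeping that going through JIRA avoids.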

I hope we can move away from doing feature backports as "squashed
commits."  I've been guilty of this in the past, but I think git
should make this a lot easier to avoid, as you suggested.

best,
Colin


On Wed, Mar 18, 2015 at 2:32 PM, Andrew Wang <andrew.wang@cloudera.com> wrote:
> I don't think we should try and build CHANGES.txt out of git log output.
> There are a number of issues:
>
> - We've all fatfingered JIRA #s when writing the commit message, leading to
> false positives
> - If something is committed then reverted, it's a false positive
> - Both of the above are a consequence of not being able to edit git log,
> without rebase and force push
> - Doesn't work with multiple release branches, unless we want to add
> awareness of release branch lines
> - Merges are hard to track properly since we've previously done the
> branch-2 backport as a single squashed commit. Maybe this will improve with
> the rebase workflow, but that remains to be seen.
>
> All of this is solved by going through JIRA instead of git log. JIRA is the
> single-source-of-truth for what's fixed in a release line or not, and we
> already use it to generate release notes. I don't see why CHANGES.txt
> should be generated from a different source.
>
> The bigger question for me is why we have CHANGES.txt at all when we have
> release notes, since the information is almost identical.
>
> Best,
> Andrew
>
> On Wed, Mar 18, 2015 at 1:20 PM, Sean Busbey <busbey@cloudera.com> wrote:
>
>> On the matter of handling merges in the history, this comes up over in
>> Apache Accumulo where development follows a merge-forward model (commits go
>> oldest first and merge into newer branches). This means that every commit
>> on an older-but-still-active development branch eventually ends up merged
>> into the history of newer branches even when the issue was only relevant to
>> the older branch. The most obvious problem with relying on just the git history
>> for changes then is that there's no way to programmatically know which of
>> the commits that show up in the log for a given release tag are relevant to
>> that release and which ones were only relevant to the older development
>> line.
>>
>> -Sean
>>
>> On Wed, Mar 18, 2015 at 2:59 PM, Colin P. McCabe <cmccabe@apache.org>
>> wrote:
>>
>> > Allen, can you forward those private conversations (or some excerpt
>> > thereof) to the list to explain the problem that you see?
>> >
>> > I have been using "git log" to track change history for years and
>> > never had a problem.  In fact, we don't even maintain CHANGES.txt in
>> > Cloudera's distribution including Hadoop.  It causes too many spurious
>> > conflicts during cherry picks so we just discard the CHANGES.txt part
>> > of the change when backporting things to our branches.  When you are
>> > backporting hundreds of patches, and each one has a conflict on
>> > CHANGES.txt (and generally, ALL of them do), it's just not worth it to
>> > hand-resolve those conflicts.
>> >
>> > I also wrote a script to compare which JIRAs were in which branches by
>> > doing a delta of the git commits.  It works pretty well.  You can even
>> > visualize merges in git if you want, with tools like gitk (or even
>> > plain old git log with the right options)
>> >
>> > Colin
>> >
>> >
>> > On Tue, Mar 17, 2015 at 11:21 AM, Allen Wittenauer <aw@altiscale.com>
>> > wrote:
>> > >
>> > >         Nope.  I’m not particularly in the mood to write a book about a
>> > > topic that I’ve beat to death in private conversations over the past 6
>> > > months other than highlighting that any solution needs to be able to work
>> > > against scenarios like we had 3 years ago with four active release branches
>> > > + trunk.
>> > >
>> > > On Mar 17, 2015, at 10:56 AM, Yongjun Zhang <yzhang@cloudera.com> wrote:
>> > >
>> > >> Thanks Ravi and Colin for the feedback.
>> > >>
>> > >> Hi Allen,
>> > >>
>> > >> You pointed out that "git log" has problems when dealing with branches
>> > >> that have merges. Would you please elaborate on the problem?
>> > >>
>> > >> Thanks.
>> > >>
>> > >> --Yongjun
>> > >>
>> > >> On Mon, Mar 16, 2015 at 7:08 PM, Colin McCabe <cmccabe@alumni.cmu.edu> wrote:
>> > >>
>> > >>> Branch merges made it hard to access change history on subversion
>> > >>> sometimes.
>> > >>>
>> > >>> You can read the tale of woe here:
>> > >>>
>> > >>> http://programmers.stackexchange.com/questions/206016/maintaining-svn-history-for-a-file-when-merge-is-done-from-the-dev-branch-to-tru
>> > >>>
>> > >>> Excerpt:
>> > >>> "....prior to Subversion 1.8. The files in the branch and the files in
>> > >>> trunk are copies and Subversion keeps track with svn log only for
>> > >>> specific files, not across branches."
>> > >>>
>> > >>> I think that's how the custom of CHANGES.txt started, and it was
>> > >>> cargo-culted forward into the git era despite not serving much purpose
>> > >>> any more these days (in my opinion.)
>> > >>>
>> > >>> best,
>> > >>> Colin
>> > >>>
>> > >>> On Mon, Mar 16, 2015 at 4:49 PM, Ravi Prakash <ravihoo@ymail.com> wrote:
>> > >>>> +1 for automating the information contained in CHANGES.txt. There are
>> > >>>> some changes which go in without JIRAs sometimes (CVEs, e.g.). I like git
>> > >>>> log because it's the absolute source of truth (cryptographically secure,
>> > >>>> audited, distributed, yadadada). We could always use git hooks to force a
>> > >>>> commit message format.
>> > >>>>
>> > >>>> a) Cherry-picks have the same message (by default) as the original.
>> > >>>> b) I'm not sure why branch mergers would be a problem?
>> > >>>> c) "Whoops I missed something in the previous commit" wouldn't happen if
>> > >>>> our hooks were smartish.
>> > >>>> d) "no identification of what type of commit it was without hooking into
>> > >>>> JIRA anyway." This would be in the format of the commit message.
>> > >>>>
>> > >>>> Either way I think would be an improvement.
>> > >>>> Thanks for your ideas folks
>> > >>>>
>> > >>>>
>> > >>>>
>> > >>>>     On Monday, March 16, 2015 11:51 AM, Colin P. McCabe <cmccabe@apache.org> wrote:
>> > >>>>
>> > >>>>
>> > >>>> +1 for generating CHANGES.txt from JIRA and/or git as part of making a
>> > >>>> release.  Or just dropping it altogether.  Keeping it under version
>> > >>>> control creates a lot of false conflicts whenever submitting a patch and
>> > >>>> generally makes committing minor changes unpleasant.
>> > >>>>
>> > >>>> Colin
>> > >>>>
>> > >>>> On Sat, Mar 14, 2015 at 8:36 PM, Yongjun Zhang <yzhang@cloudera.com> wrote:
>> > >>>>> Hi Allen,
>> > >>>>>
>> > >>>>> Thanks a lot for your input!
>> > >>>>>
>> > >>>>> Looks like problems a, c, and d you listed are not too bad, assuming we can
>> > >>>>> solve d by pulling this info from jira as Sean pointed out.
>> > >>>>>
>> > >>>>> Problem b (branch mergers) seems to be a real one, and your approach of
>> > >>>>> using the JIRA system to build changes.txt is a reasonably good way. This
>> > >>>>> does count on us updating jira accurately. Since this update is a manual
>> > >>>>> process, it's possible to have inconsistencies, but that may not be too
>> > >>>>> bad, since any mistake found here can be remedied by fixing the jira side
>> > >>>>> and refreshing the result.
>> > >>>>>
>> > >>>>> I wonder if we as a community should switch to using your way, and save
>> > >>>>> committers the effort of taking care of CHANGES.txt (quite a saving IMO).
>> > >>>>> Hope more people can share their thoughts.
>> > >>>>>
>> > >>>>> Thanks.
>> > >>>>>
>> > >>>>> --Yongjun
>> > >>>>>
>> > >>>>> On Fri, Mar 13, 2015 at 4:45 PM, Allen Wittenauer <aw@altiscale.com> wrote:
>> > >>>>>
>> > >>>>>>
>> > >>>>>> I think the general consensus is don’t include the changes.txt file in
>> > >>>>>> your commit. It won’t be correct for both branches if such a commit is
>> > >>>>>> destined for both. (No, the two branches aren’t the same.)
>> > >>>>>>
>> > >>>>>> No, git log isn’t more accurate.  The problems are:
>> > >>>>>>
>> > >>>>>> a) cherry picks
>> > >>>>>> b) branch mergers
>> > >>>>>> c) “whoops i missed something in that previous commit”
>> > >>>>>> d) no identification of what type of commit it was without hooking into
>> > >>>>>> JIRA anyway.
>> > >>>>>>
>> > >>>>>> This is why I prefer building the change log from JIRA.  We already build
>> > >>>>>> release notes from JIRA, BTW.  (Not that anyone appears to read them given
>> > >>>>>> the low quality of our notes…)  Anyway, here’s what I’ve been
>> > >>>>>> building/using as changes.txt and release notes:
>> > >>>>>>
>> > >>>>>> https://github.com/aw-altiscale/hadoop-release-metadata
>> > >>>>>>
>> > >>>>>> I try to update these every day. :)
>> > >>>>>>
>> > >>>>>> On Mar 13, 2015, at 4:07 PM, Yongjun Zhang <yzhang@cloudera.com> wrote:
>> > >>>>>>
>> > >>>>>>> Thanks Esteban, I assume this report gets info purely from the jira
>> > >>>>>>> database, but not "git log" of a branch, right?
>> > >>>>>>>
>> > >>>>>>> I hope we get the info from "git log" of a release branch because that'd
>> > >>>>>>> be more accurate.
>> > >>>>>>>
>> > >>>>>>> --Yongjun
>> > >>>>>>>
>> > >>>>>>> On Fri, Mar 13, 2015 at 3:11 PM, Esteban Gutierrez <esteban@cloudera.com> wrote:
>> > >>>>>>>
>> > >>>>>>>> JIRA already provides a report:
>> > >>>>>>>>
>> > >>>>>>>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12327179&styleName=Html&projectId=12310240
>> > >>>>>>>>
>> > >>>>>>>>
>> > >>>>>>>> cheers,
>> > >>>>>>>> esteban.
>> > >>>>>>>>
>> > >>>>>>>>
>> > >>>>>>>>
>> > >>>>>>>>
>> > >>>>>>>> --
>> > >>>>>>>> Cloudera, Inc.
>> > >>>>>>>>
>> > >>>>>>>>
>> > >>>>>>>> On Fri, Mar 13, 2015 at 3:01 PM, Sean Busbey <busbey@cloudera.com> wrote:
>> > >>>>>>>>
>> > >>>>>>>>> So long as you include the issue number, you can automate pulling the
>> > >>>>>>>>> type from jira directly instead of putting it in the message.
>> > >>>>>>>>>
>> > >>>>>>>>> On Fri, Mar 13, 2015 at 4:49 PM, Yongjun Zhang <yzhang@cloudera.com> wrote:
>> > >>>>>>>>>
>> > >>>>>>>>>> Hi,
>> > >>>>>>>>>>
>> > >>>>>>>>>> I found that changing CHANGES.txt when committing a jira is error prone
>> > >>>>>>>>>> because of the different sections in the file, and sometimes we forget
>> > >>>>>>>>>> about changing this file.
>> > >>>>>>>>>>
>> > >>>>>>>>>> After all, git log would indicate the history of a branch. I wonder if we
>> > >>>>>>>>>> could switch to a new method:
>> > >>>>>>>>>>
>> > >>>>>>>>>> 1. When committing, ensure the message includes the type of the jira,
>> > >>>>>>>>>> "New Feature", "Bug Fixes", "Improvement" etc.
>> > >>>>>>>>>>
>> > >>>>>>>>>> 2. No longer need to make changes to CHANGES.txt for each commit
>> > >>>>>>>>>>
>> > >>>>>>>>>> 3. Before releasing a branch, create the CHANGES.txt by using the "git log"
>> > >>>>>>>>>> command for the given branch.
>> > >>>>>>>>>>
>> > >>>>>>>>>> Thanks.
>> > >>>>>>>>>>
>> > >>>>>>>>>> --Yongjun
>> > >>>>>>>>>>
>> > >>>>>>>>>
>> > >>>>>>>>>
>> > >>>>>>>>>
>> > >>>>>>>>> --
>> > >>>>>>>>> Sean
>> > >>>>>>>>>
>> > >>>>>>>>
>> > >>>>>>
>> > >>>>>>
>> > >>>>
>> > >>>>
>> > >>>
>> > >
>> >
>>
>>
>>
>> --
>> Sean
>>
