hadoop-common-dev mailing list archives

From Allen Wittenauer ...@altiscale.com>
Subject Re: about CHANGES.txt
Date Wed, 18 Mar 2015 15:03:27 GMT

If you want to write the code, knock yourself out.  It’ll be interesting to see what sits
in trunk[1].  There is no question that the git log is more accurate, but collating the information
to convey the same message that even our current changes.txt file does is fraught with danger and
complexity.

As you work on it, here’s some data to compare. On my laptop with the crappy hacks[2] I
made to the relnotes.py script over a not-high-speed connection, generating all 4 changes
files and all 4 release notes files:

$ time ./releasemdmaker.py --version 3.0.0 --previousVer 2.8.0
WARNING: incompatible change HADOOP-11356 lacks release notes.
WARNING: incompatible change HADOOP-10474 lacks release notes.
WARNING: incompatible change HDFS-5079 lacks release notes.
WARNING: incompatible change MAPREDUCE-6234 lacks release notes.
WARNING: incompatible change MAPREDUCE-6223 lacks release notes.
WARNING: incompatible change MAPREDUCE-5785 lacks release notes.

real	0m21.311s
user	0m0.214s
sys	0m0.207s

$ time ./releasemdmaker.py --version 2.7.0 --previousVer 2.6.1
WARNING: incompatible change HADOOP-11498 lacks release notes.
WARNING: incompatible change HADOOP-11385 lacks release notes.
WARNING: incompatible change HADOOP-10530 lacks release notes.
WARNING: incompatible change HDFS-6651 lacks release notes.
WARNING: incompatible change HDFS-6252 lacks release notes.

real	1m4.360s
user	0m0.563s
sys	0m0.159s

[1] Alas, the current changes.txt has quite a few problems: HADOOP-11718.

[2] - which means that someone who actually likes and uses Python could almost certainly optimize
this code.
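
For anyone taking up the git-log approach, here is a minimal sketch of the collation step. It is not taken from releasemdmaker.py or relnotes.py. It assumes every commit subject carries a JIRA key such as HDFS-5079, and the names collate_by_issue and read_subjects are hypothetical. It shows how cherry-picks and follow-up commits collapse onto one issue, and how commits without a key fall out for manual review.

```python
import re
import subprocess
from collections import OrderedDict

# Assumption: commit subjects mention a JIRA key from one of these projects.
ISSUE_KEY = re.compile(r"\b(HADOOP|HDFS|YARN|MAPREDUCE)-(\d+)\b")


def read_subjects(rev_range):
    """Return commit subject lines for a revision range, newest first."""
    result = subprocess.run(
        ["git", "log", "--format=%s", rev_range],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


def collate_by_issue(subjects):
    """Group commit subjects by the first JIRA key found in each one.

    Cherry-picks and follow-up commits share a key, so they end up in the
    same bucket. Subjects with no key are returned separately.
    """
    by_issue = OrderedDict()
    unkeyed = []
    for subject in subjects:
        match = ISSUE_KEY.search(subject)
        if match is None:
            unkeyed.append(subject)
            continue
        by_issue.setdefault(match.group(0), []).append(subject)
    return by_issue, unkeyed


# Made-up subjects, for illustration only.
sample = [
    "HDFS-5079. Example change.",
    "HDFS-5079. Addendum to the example change.",
    "Fix typo",
    "HADOOP-11356. Another example change.",
]
by_issue, unkeyed = collate_by_issue(sample)
```

Even with this grouping, the type of each change (bug, improvement, incompatible change) still has to come from JIRA or from the commit message, which is problem d in the list quoted below.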

On Mar 17, 2015, at 4:13 PM, Yongjun Zhang <yzhang@cloudera.com> wrote:

> Hi Allen,
> 
> To make it simpler for you to address my question, does "git log" miss
> commits or report redundant commits on a branch with merged
> ancestor branches?
> 
> Thanks.
> 
> --Yongjun
> 
> 
> On Tue, Mar 17, 2015 at 11:21 AM, Allen Wittenauer <aw@altiscale.com> wrote:
> 
>> 
>>        Nope.  I’m not particularly in the mood to write a book about a
>> topic that I’ve beat to death in private conversations over the past 6
>> months other than highlighting that any solution needs to be able to work
>> against scenarios like we had 3 years ago with four active release branches
>> + trunk.
>> 
>> On Mar 17, 2015, at 10:56 AM, Yongjun Zhang <yzhang@cloudera.com> wrote:
>> 
>>> Thanks Ravi and Colin for the feedback.
>>> 
>>> Hi Allen,
>>> 
>>> You pointed out that "git log" has a problem when dealing with a branch that
>>> has merges. Would you please elaborate on the problem?
>>> 
>>> Thanks.
>>> 
>>> --Yongjun
>>> 
>>> On Mon, Mar 16, 2015 at 7:08 PM, Colin McCabe <cmccabe@alumni.cmu.edu>
>>> wrote:
>>> 
>>>> Branch merges made it hard to access change history on subversion
>>>> sometimes.
>>>> 
>>>> You can read the tale of woe here:
>>>> 
>>>> 
>>>> http://programmers.stackexchange.com/questions/206016/maintaining-svn-history-for-a-file-when-merge-is-done-from-the-dev-branch-to-tru
>>>> 
>>>> Excerpt:
>>>> "....prior to Subversion 1.8. The files in the branch and the files in
>>>> trunk are copies and Subversion keeps track with svn log only for
>>>> specific files, not across branches."
>>>> 
>>>> I think that's how the custom of CHANGES.txt started, and it was
>>>> cargo-culted forward into the git era despite not serving much purpose
>>>> any more these days (in my opinion.)
>>>> 
>>>> best,
>>>> Colin
>>>> 
>>>> On Mon, Mar 16, 2015 at 4:49 PM, Ravi Prakash <ravihoo@ymail.com> wrote:
>>>>> +1 for automating the information contained in CHANGES.txt. There are
>>>>> some changes which go in without JIRAs sometimes (e.g. CVEs). I like git
>>>>> log because it's the absolute source of truth (cryptographically secure,
>>>>> audited, distributed, yadadada). We could always use git hooks to force a
>>>>> commit message format.
>>>>>
>>>>> a) Cherry-picks have the same message (by default) as the original.
>>>>> b) I'm not sure why branch mergers would be a problem?
>>>>> c) "Whoops I missed something in the previous commit" wouldn't happen if
>>>>> our hooks were smartish.
>>>>> d) "no identification of what type of commit it was without hooking into
>>>>> JIRA anyway." This would be in the format of the commit message.
>>>>> 
>>>>> Either way I think would be an improvement.
>>>>> Thanks for your ideas folks
>>>>> 
>>>>> 
>>>>> 
>>>>> On Monday, March 16, 2015 11:51 AM, Colin P. McCabe <cmccabe@apache.org> wrote:
>>>>> 
>>>>> 
>>>>> +1 for generating CHANGES.txt from JIRA and/or git as part of making a
>>>>> release.  Or just dropping it altogether.  Keeping it under version
>>>>> control creates a lot of false conflicts whenever submitting a patch and
>>>>> generally makes committing minor changes unpleasant.
>>>>> 
>>>>> Colin
>>>>> 
>>>>> On Sat, Mar 14, 2015 at 8:36 PM, Yongjun Zhang <yzhang@cloudera.com> wrote:
>>>>>> Hi Allen,
>>>>>> 
>>>>>> Thanks a lot for your input!
>>>>>> 
>>>>>> Looks like problems a, c, and d you listed are not too bad, assuming we
>>>>>> can solve d by pulling this info from jira as Sean pointed out.
>>>>>>
>>>>>> Problem b (branch mergers) seems to be a real one, and your approach of
>>>>>> using the JIRA system to build changes.txt is a reasonably good way. This
>>>>>> does count on us updating jira accurately. Since this update is a manual
>>>>>> process, it's possible to have inconsistency, but that may not be too
>>>>>> bad, since any mistake found here can be remedied by fixing the jira
>>>>>> side and refreshing the result.
>>>>>>
>>>>>> I wonder if we as a community should switch to using your way, and save
>>>>>> committers the effort of taking care of CHANGES.txt (quite a saving IMO).
>>>>>> Hope more people can share their thoughts.
>>>>>> 
>>>>>> Thanks.
>>>>>> 
>>>>>> --Yongjun
>>>>>> 
>>>>>> On Fri, Mar 13, 2015 at 4:45 PM, Allen Wittenauer <aw@altiscale.com> wrote:
>>>>>> 
>>>>>>> 
>>>>>>> I think the general consensus is don’t include the changes.txt file in
>>>>>>> your commit. It won’t be correct for both branches if such a commit is
>>>>>>> destined for both. (No, the two branches aren’t the same.)
>>>>>>> 
>>>>>>> No, git log isn’t more accurate.  The problems are:
>>>>>>> 
>>>>>>> a) cherry picks
>>>>>>> b) branch mergers
>>>>>>> c) “whoops i missed something in that previous commit”
>>>>>>> d) no identification of what type of commit it was without hooking into
>>>>>>> JIRA anyway.
>>>>>>> 
>>>>>>> This is why I prefer building the change log from JIRA.  We already build
>>>>>>> release notes from JIRA, BTW.  (Not that anyone appears to read them given
>>>>>>> the low quality of our notes…)  Anyway, here’s what I’ve been
>>>>>>> building/using as changes.txt and release notes:
>>>>>>> 
>>>>>>> https://github.com/aw-altiscale/hadoop-release-metadata
>>>>>>> 
>>>>>>> I try to update these every day. :)
>>>>>>> 
>>>>>>> On Mar 13, 2015, at 4:07 PM, Yongjun Zhang <yzhang@cloudera.com> wrote:
>>>>>>> 
>>>>>>>> Thanks Esteban, I assume this report gets info purely from the jira
>>>>>>>> database, but not "git log" of a branch, right?
>>>>>>>>
>>>>>>>> I hope we get the info from "git log" of a release branch because that'd
>>>>>>>> be more accurate.
>>>>>>>> 
>>>>>>>> --Yongjun
>>>>>>>> 
>>>>>>>> On Fri, Mar 13, 2015 at 3:11 PM, Esteban Gutierrez <esteban@cloudera.com>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> JIRA already provides a report:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12327179&styleName=Html&projectId=12310240
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> cheers,
>>>>>>>>> esteban.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> Cloudera, Inc.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Fri, Mar 13, 2015 at 3:01 PM, Sean Busbey <busbey@cloudera.com> wrote:
>>>>>>>>> 
>>>>>>>>>> So long as you include the issue number, you can automate pulling the
>>>>>>>>>> type from jira directly instead of putting it in the message.
>>>>>>>>>> 
>>>>>>>>>> On Fri, Mar 13, 2015 at 4:49 PM, Yongjun Zhang <yzhang@cloudera.com>
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Hi,
>>>>>>>>>>> 
>>>>>>>>>>> I found that changing CHANGES.txt when committing a jira is error
>>>>>>>>>>> prone because of the different sections in the file, and sometimes we
>>>>>>>>>>> forget about changing this file.
>>>>>>>>>>>
>>>>>>>>>>> After all, git log would indicate the history of a branch. I wonder if
>>>>>>>>>>> we could switch to a new method:
>>>>>>>>>>>
>>>>>>>>>>> 1. When committing, ensure the message includes the type of the jira,
>>>>>>>>>>> "New Feature", "Bug Fixes", "Improvement" etc.
>>>>>>>>>>>
>>>>>>>>>>> 2. No longer need to make changes to CHANGES.txt for each commit.
>>>>>>>>>>>
>>>>>>>>>>> 3. Before releasing a branch, create the CHANGES.txt by using the "git
>>>>>>>>>>> log" command for the given branch.
>>>>>>>>>>> 
>>>>>>>>>>> Thanks.
>>>>>>>>>>> 
>>>>>>>>>>> --Yongjun
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> --
>>>>>>>>>> Sean
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>> 
>>>>> 
>>>> 
>> 
>> 
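
The idea raised in the thread of using git hooks to enforce a commit message format could look roughly like the sketch below. The subject format "<PROJECT>-<number>. [<Type>] <summary>" is a hypothetical one made up for illustration, not an existing Hadoop rule, and check_subject is likewise a hypothetical name. The type names are the ones from the original proposal. Git runs a commit-msg hook with the path of the message file as its first argument and aborts the commit if the hook exits non-zero.

```python
import re
import sys

# Hypothetical format for illustration: "HDFS-1234. [Bug Fixes] summary".
ALLOWED_TYPES = ("New Feature", "Bug Fixes", "Improvement")
SUBJECT = re.compile(
    r"^(HADOOP|HDFS|YARN|MAPREDUCE)-\d+\. \[(?P<type>[^\]]+)\] \S.*$"
)


def check_subject(subject):
    """Return an error string for a bad subject line, or None if it passes."""
    match = SUBJECT.match(subject)
    if match is None:
        return "subject must look like 'HADOOP-1234. [Type] summary'"
    if match.group("type") not in ALLOWED_TYPES:
        return "unknown type %r; expected one of: %s" % (
            match.group("type"), ", ".join(ALLOWED_TYPES))
    return None


def main(argv):
    # argv[1] is the commit message file that Git hands to the hook.
    with open(argv[1], encoding="utf-8") as handle:
        first_line = handle.readline().rstrip("\n")
    error = check_subject(first_line)
    if error:
        sys.stderr.write("commit rejected: %s\n" % error)
        return 1
    return 0


# Only act when a message file path was supplied on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

Saved as .git/hooks/commit-msg and made executable, this runs on every local commit. A commit-msg hook is client-side, so each committer has to install it; it cannot by itself guarantee the format on the shared repository.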

