hbase-dev mailing list archives

From Igor Wiese <igor.wi...@gmail.com>
Subject Re: Can you help us Hbase Community
Date Tue, 15 Dec 2015 12:42:49 GMT
Thanks for the answer, Ram.

Yes, we are building a tool for the community. We use the issue
description as a feature in our model, but we need developers to
make the first commit; after that, we can run our tool and find
other files that may be impacted.

As a first step, we are providing a web service that monitors many Apache
projects. For each commit made for an issue, we will provide a report
listing the files most likely to change together with it. So the
prediction does not happen at "compile time" but after the "first file
committed" (which can be a single file).
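
To make the "report after the first committed file" idea concrete, here is a minimal sketch. It is not the prediction model described in this thread (that model uses issue text and social network metrics with machine learning); it swaps in a much simpler baseline that ranks files purely by how often they were committed together in the past. The function names and the toy commit history are hypothetical and made up for illustration.

```python
from collections import Counter
from itertools import combinations


def build_cochange_counts(commits):
    """Count how often each file, and each unordered pair of files, was committed."""
    file_counts = Counter()
    pair_counts = Counter()
    for files in commits:
        unique = sorted(set(files))  # ignore duplicates, fix the pair order
        file_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return file_counts, pair_counts


def likely_cochanges(changed_file, file_counts, pair_counts, top_n=5):
    """Rank other files by the fraction of changed_file's commits they shared."""
    total = file_counts[changed_file]
    if total == 0:
        return []  # no history for this file, so nothing to recommend
    scores = {}
    for (a, b), n in pair_counts.items():
        if a == changed_file:
            scores[b] = n / total
        elif b == changed_file:
            scores[a] = n / total
    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# Toy history (made-up data): each inner list is the set of files in one commit.
history = [
    ["HBaseFsck.java", "HBaseFsckRepair.java"],
    ["HBaseFsck.java"],
    ["HBaseFsck.java", "HBaseFsckRepair.java", "HRegion.java"],
    ["HRegion.java"],
]
file_counts, pair_counts = build_cochange_counts(history)

# Report produced once "HBaseFsck.java" is the first file committed.
report = likely_cochanges("HBaseFsck.java", file_counts, pair_counts)
print(report)
```

A frequency baseline like this only sees past co-occurrence, which is why contextual signals such as the issue discussion are interesting as additional features.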

What do you think? For which tasks do you think the approach could be
useful (tests, code review, newcomers, etc.)?

All the best,
Igor Wiese

2015-12-15 10:17 GMT-02:00 ramkrishna vasudevan
> I had a look at the reports. The prediction model looks good.
> A few questions: what is the idea behind the tool that you plan to build
> for the community? Are you planning to provide a tool that says, for a
> given issue description, which files it may impact?
> For example, if an interface is changed, all the implementations of that
> interface automatically have to change (I am just taking a very simple
> example). Does your tool also take this type of compile-time rule into
> account?
> The reason I ask is that I would like to understand which parameters you
> derive from the discussion, comments and social activity to ascertain the
> related changes.
> Regards
> Ram
> On Tue, Dec 15, 2015 at 5:36 PM, Igor Wiese <igor.wiese@gmail.com> wrote:
>> Hi, HBase Community.
>> My name is Igor Wiese, a PhD student from Brazil. I sent an email a week
>> ago about my research. We received some visits to the results page,
>> but no feedback was provided.
>> I am investigating two important questions: What makes two files
>> change together? Can we predict when they are going to co-change
>> again?
>> I've tried to investigate these questions on the HBase project. I've
>> collected data from issue reports, discussions and commits, and used
>> machine learning techniques to build a prediction model.
>> I collected a total of 8492 commits in which a pair of files changed
>> together and could correctly predict 71% of those commits. The following
>> were the most useful pieces of information for predicting co-changes of
>> files:
>> - sum of the number of lines of code added, modified and removed,
>> - number of words used to describe and discuss the issues,
>> - median value of closeness, a social network measure obtained from
>> issue comments,
>> - median value of effective size, a social network measure obtained
>> from issue comments, and
>> - median value of hierarchy, a social network measure obtained from
>> issue comments.
>> To illustrate, consider the following example from our analysis. For
>> release 1.1, the files "util/HBaseFsck.java" and
>> "hbase/util/HBaseFsckRepair.java" changed together in 13 commits. In
>> another 40 commits, only the first file changed, but not the second.
>> By collecting contextual information for each commit made to the first
>> file in the previous release, we were able to predict 9 commits in which
>> both files changed together in release 1.1, and we issued only two false
>> positives and two wrong predictions. For this pair of files, the most
>> important contextual information was the number of developers that
>> commented on each issue and the social network metric (efficiency)
>> obtained from issue comments.
>> - Do these results surprise you? Can you think of any explanation for
>> the results?
>> - Do you think that our rate of prediction is good enough to be used
>> for building tool support for the software community?
>> - Do you have any suggestions on what can be done to improve the change
>> recommendation?
>> You can visit a webpage to inspect the results in detail:
>> http://flosscoach.com/index.php/17-cochanges/71-hbase
>> All the best,
>> Igor Wiese
>> PhD Candidate
>> --
>> =================================
>> Igor Scaliante Wiese
>> PhD Candidate - Computer Science @ IME/USP
>> Faculty in Dept. of Computing at Universidade Tecnológica Federal do Paraná
