cassandra-dev mailing list archives

From Igor Wiese <igor.wi...@gmail.com>
Subject Re: Feedback of my Phd work in Cassandra Project
Date Thu, 10 Dec 2015 01:57:14 GMT
Hi Dave, thanks for your suggestion.

In fact, we tested many other cases in which two files changed
together. The results are available at
http://flosscoach.com/index.php/17-cochanges/66-cassandra

We also found cases where, for example, a file from the "db" package
changed together with a file in the "io" package. Some pairs may be
"easier" to identify because the files are in the same package or are
highly correlated (coupled), but there are many commits in which files
changed together for reasons other than delegation, instantiation,
implementation or inheritance.
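
To make the idea of cross-package co-changes concrete, here is a minimal sketch of how such pairs could be counted from a list of commits. This is not our actual pipeline; the file paths and the helper names (package_of, co_change_counts, cross_package_pairs) are made up for illustration, and "package" is simply taken to be the parent directory.

```python
from collections import Counter
from itertools import combinations


def package_of(path):
    # Assumption: the package is the parent directory of the file.
    return path.rsplit("/", 1)[0] if "/" in path else ""


def co_change_counts(commits):
    """Count how often each unordered pair of files appears in the same commit."""
    counts = Counter()
    for files in commits:
        # sorted() gives every pair a canonical order, set() drops duplicates.
        for a, b in combinations(sorted(set(files)), 2):
            counts[(a, b)] += 1
    return counts


def cross_package_pairs(counts):
    """Keep only the pairs whose two files live in different packages."""
    return {
        pair: n
        for pair, n in counts.items()
        if package_of(pair[0]) != package_of(pair[1])
    }


# Hypothetical commit history: each entry is the list of files in one commit.
commits = [
    ["db/A.java", "io/B.java"],
    ["db/A.java", "io/B.java", "db/C.java"],
    ["tools/X.java", "tools/Y.java"],
]

counts = co_change_counts(commits)
cross = cross_package_pairs(counts)
for (a, b), n in sorted(cross.items()):
    print(f"{a} <-> {b}: {n} co-change(s)")
```

In a real setting the commit lists would come from the version control history rather than being written by hand.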

Our approach is based on contextual information (issue metadata, commit
metadata and social organization) that can be easily collected from JIRA
and SVN/Git, instead of parsing the code to find coupling among files.

Do you think a tool that alerts developers about "possible mistakes",
as you suggest, would be useful?
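
As a rough sketch of what such an alert could look like, the snippet below flags a commit that touches one file but not a file it usually changes with. It uses plain co-change frequency, not our prediction model, and the function names and the threshold are assumptions for illustration only. The history is built from the NodeCmd/NodeProbe counts in my original message quoted below (16 commits with both files, 6 with only NodeCmd).

```python
def confidence(history, a, b):
    """Fraction of past commits touching `a` that also touched `b`."""
    with_a = [set(files) for files in history if a in files]
    if not with_a:
        return 0.0
    return sum(1 for files in with_a if b in files) / len(with_a)


def missing_partners(history, commit, candidates, threshold):
    """List (changed_file, absent_file, confidence) for likely omissions."""
    changed = set(commit)
    warnings = []
    for a in sorted(changed):
        for b in candidates:
            if b == a or b in changed:
                continue
            c = confidence(history, a, b)
            if c >= threshold:
                warnings.append((a, b, c))
    return warnings


CMD = "cassandra/tools/NodeCmd.java"
PROBE = "cassandra/tools/NodeProbe.java"

# Synthetic history reproducing only the counts given in the thread.
history = [[CMD, PROBE]] * 16 + [[CMD]] * 6

# A new commit that changes NodeCmd alone.
for a, b, c in missing_partners(history, [CMD], [CMD, PROBE], threshold=0.7):
    print(f"{a} changed without {b} (co-change confidence {c:.2f})")
```

With these counts the confidence is 16/22, so whether the commit is flagged depends entirely on the threshold, which is one reason such a warning should be treated as a hint and not as an error.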

All the best,
Igor Wiese



On Wed, Dec 9, 2015 at 23:42, Dave Brosius <dbrosius@mebigfatguy.com>
wrote:

> >> Do these results surprise you? Can you think of any explanation for
> the results?
>
>      That NodeCmd and NodeProbe are highly correlated is expected,
> NodeCmd delegates its functionality to NodeProbe.
>
> >> Do you think that our rate of prediction is good enough to be used for
> building tool support for the software community?
>
>      Curious to know what your conjecture is about the correlation of
> files and how it can help you develop better. I suppose if two files are
> perfectly or near perfectly correlated, and you find a commit with only
> one file, you might be led to the conclusion that a mistake was made
> perhaps. Even that is probably dubious. Other than that, not sure what
> the benefit is.
>
> >> Do you have any suggestion on what can be done to improve the change
> recommendation?
>
> Might be worth looking at package correlations. I could see a case where
> a particular package holds, say, Spring beans, and any file in this
> directory is likely to be committed alongside a Spring XML file (if those
> are still used).
>
>
> On 12/09/2015 06:25 PM, Igor Wiese wrote:
> > Hi, Cassandra Community.
> >
> > My name is Igor Wiese, a PhD student from Brazil. I am investigating two
> > important questions: What makes two files change together? Can we predict
> > when they are going to co-change again?
> >
> > I've tried to investigate these questions on the Cassandra project. I
> > collected data from issue reports, discussions and commits, and used some
> > machine learning techniques to build a prediction model.
> >
> > I collected a total of 1197 commits in which a pair of files changed
> > together and could correctly predict 48% of those commits. The most
> > useful pieces of information for predicting co-changes of files were:
> >
> > - number of lines of code added,
> >
> > - number of lines of code removed,
> >
> > - sum of number of lines of code added, modified and removed,
> >
> > - number of words used to describe and discuss the issues, and
> >
> > - median value of closeness, a social network measure obtained from issue
> > comments.
> >
> > To illustrate, consider the following example from our analysis. For
> > release 1.0, the files "cassandra/tools/NodeCmd.java" and
> > "cassandra/tools/NodeProbe.java" changed together in 16 commits. In
> > another 6 commits, only the first file changed, but not the second.
> > Collecting contextual information for each commit made to the first file
> > in the previous release, we were able to predict all 13 commits in which
> > both files changed together in release 1.0, and we only issued 2 false
> > positives. For this pair of files, the most important contextual
> > information was the number of lines of code added, removed and modified
> > in each commit, the number of words used to describe and discuss the
> > issues and the number of comments in the issues.
> >
> > - Do these results surprise you? Can you think of any explanation for the
> > results?
> >
> > - Do you think that our rate of prediction is good enough to be used for
> > building tool support for the software community?
> >
> > - Do you have any suggestion on what can be done to improve the change
> > recommendation?
> >
> > You can visit our webpage to inspect the results in detail:
> > http://flosscoach.com/index.php/17-cochanges/66-cassandra
> >
> > All the best,
> > Igor Wiese
> > PhD Candidate
> >
>
>
