db-derby-dev mailing list archives

From Igor Wiese <igor.wi...@gmail.com>
Subject Feedback of my Phd work in Derby
Date Thu, 10 Dec 2015 00:04:05 GMT
Hi, Derby Community.

My name is Igor Wiese, and I am a PhD student from Brazil. In my research I am
investigating two important questions: What makes two files change
together? Can we predict when they are going to co-change again?

I have tried to investigate these questions on the Derby project. I collected
data from issue reports, discussions and commits, and used machine learning
techniques to build a prediction model.
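
As a rough illustration only (not the pipeline used in this study), co-changing
file pairs can be counted from per-commit file lists along the lines below. The
function name count_cochanges and the toy commit data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_cochanges(commits):
    """Count how often each unordered pair of files appears in the same commit."""
    pair_counts = Counter()
    for files in commits:
        # set() drops duplicate paths; sorted() makes (a, b) and (b, a) one key.
        for pair in combinations(sorted(set(files)), 2):
            pair_counts[pair] += 1
    return pair_counts


# Hypothetical toy history: each entry lists the files touched by one commit.
commits = [
    ["A.java", "B.java"],
    ["A.java"],
    ["B.java", "A.java", "C.java"],
]
counts = count_cochanges(commits)
print(counts[("A.java", "B.java")])
```

A pair's count is the number of commits in which both files changed, which is
the kind of quantity a co-change prediction model would try to anticipate.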

I collected a total of 5266 commits in which a pair of files changed
together and could correctly predict 86% of them. The most useful pieces of
information for predicting co-changes of files were:

- number of lines of code added,

- number of lines of code removed,

- sum of number of lines of code added, modified and removed,

- number of words used to describe and discuss the issues, and

- median value of closeness, a social network measure obtained from issue

To illustrate, consider the following example from our analysis. For
release 10.10, the files "sql/catalog/DataDictionaryImpl.java" and
"impl/storeless/EmptyDictionary.java" changed together in 7 commits. In
another 4 commits, only the first file changed, but not the second.
Collecting contextual information for each commit made to the first file in the
previous release, we were able to predict all 7 commits in which both files
changed together in release 10.10, and we only issued 2 wrong predictions.
For this pair of files, the most important contextual information was the
number of lines of code added, removed and modified in each commit, and a
social network measure (constraint) obtained from issue comments.
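
To make the numbers in this example concrete, here is a small sketch that
turns them into precision and recall. It assumes the 2 wrong predictions were
false alarms (co-change predicted but only the first file changed), which
follows if all 7 real co-changes were found. The helper precision_recall is
hypothetical and not part of the study's tooling.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Counts taken from the example: 7 co-changes, all predicted; 2 wrong predictions.
tp = 7  # co-change commits correctly predicted
fp = 2  # assumption: wrong predictions were false alarms
fn = 0  # no co-change commit was missed

precision, recall = precision_recall(tp, fp, fn)
print(f"precision={precision:.2f} recall={recall:.2f}")
# prints: precision=0.78 recall=1.00
```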

- Do these results surprise you? Can you think of any explanation for the

- Do you think that our rate of prediction is good enough to be used for
building tool support for the software community?

- Do you have any suggestions on what can be done to improve the change

You can visit our webpage to inspect the results in detail:

All the best,
Igor Wiese
PhD Candidate
