cloudstack-dev mailing list archives

From sebgoa <>
Subject Re: Feedback of my Phd work in Cloudstack Project
Date Thu, 10 Dec 2015 13:22:13 GMT

On Dec 10, 2015, at 12:31 AM, Igor Wiese <> wrote:

> Hi, Cloudstack Community.
> My name is Igor Wiese, a PhD student from Brazil. In my research, I am
> investigating two important questions: What makes two files change
> together? Can we predict when they are going to co-change again?
> I've tried to investigate these questions on the Cloudstack project. I've
> collected data from issue reports, discussions and commits and used some
> machine learning techniques to build a prediction model.
> I collected a total of 141 commits in which a pair of files changed
> together and could correctly predict 60% of those commits.

Hi Igor, why 141 commits? Are those the only commits you found in which just a pair of files changed?

My gut feeling is that you could check the entire history of the CloudStack repo (~5 years'
worth of data) and work on different types of tuples.

141 commits seems like a really small dataset.
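A minimal sketch of what mining the full history for co-changing tuples could look like, assuming each commit has already been reduced to the set of files it touched (for example, parsed from `git log --name-only` output). The function `count_cochanges` and the sample history are hypothetical, not part of the study:

```python
from collections import Counter
from itertools import combinations


def count_cochanges(commits, size=2):
    """Count how often each group of `size` files changes together.

    `commits` is an iterable of collections of file paths, one per commit.
    """
    counts = Counter()
    for files in commits:
        # Sort so that ("a", "b") and ("b", "a") count as the same tuple.
        for group in combinations(sorted(set(files)), size):
            counts[group] += 1
    return counts


# Hypothetical history: each entry is the set of files one commit touched.
history = [
    {"A.java", "B.java"},
    {"A.java", "B.java", "C.java"},
    {"A.java"},
    {"B.java", "C.java"},
]

pairs = count_cochanges(history, size=2)    # pairs of files
triples = count_cochanges(history, size=3)  # larger tuples use the same code
print(pairs[("A.java", "B.java")])  # 2
print(triples[("A.java", "B.java", "C.java")])  # 1
```

Because `size` is a parameter, the same pass over the history yields pairs, triples, or larger tuples, which is one way to grow the dataset beyond commits touching exactly two files.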


> The following information was the most
> useful for predicting co-changes of files:
> - sum of the number of lines of code added, modified and removed,
> - number of words used to describe and discuss the issues,
> - number of comments in each issue,
> - median value of closeness, a social network measure obtained from issue
> comments, and
> - median value of constraint, a social network measure obtained from issue
> comments.
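A rough sketch of the two social network measures, assuming an undirected, unweighted, connected network in which an edge joins two people when one replied to the other in an issue's comment thread. The message does not say how the network is built, so this construction, the helper names `closeness` and `constraint`, and the sample names are all hypothetical. Closeness is (n - 1) divided by the sum of shortest-path distances; constraint is Burt's measure with each tie weighted p_ij = 1 / deg(i):

```python
from collections import deque
from statistics import median


def closeness(graph, node):
    """(n - 1) / sum of hop distances; assumes a connected graph."""
    dist = {node: 0}
    queue = deque([node])
    while queue:  # breadth-first search for shortest hop distances
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    return (len(graph) - 1) / total if total else 0.0


def constraint(graph, node):
    """Burt's constraint with unweighted ties, p_ij = 1 / deg(i)."""
    def p(i, j):
        return 1 / len(graph[i]) if j in graph[i] else 0.0

    score = 0.0
    for j in graph[node]:
        # Direct tie to j plus indirect ties through shared contacts q.
        indirect = sum(p(node, q) * p(q, j) for q in graph[node] if q != j)
        score += (p(node, j) + indirect) ** 2
    return score


# Hypothetical reply network for one issue's comments.
edges = [("alice", "bob"), ("alice", "carol"), ("bob", "carol"), ("carol", "dave")]
graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

print(median(closeness(graph, n) for n in graph))  # 0.75
print(round(median(constraint(graph, n) for n in graph), 4))  # 1.0035
```

For larger networks, NetworkX provides `closeness_centrality` and `constraint`, which should agree with this sketch on connected, unweighted graphs.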
> To illustrate, consider the following example from our analysis. For
> release 4.4, the files "cloud/hypervisor/" and
> "cloud/hypervisor/guru/ " changed together in 3 commits. In
> another 2 commits, only the first file changed.
> By collecting contextual information for each commit made to the first file
> in the previous release (4.3), we were able to predict all 3 commits in which
> both files changed together in release 4.4, with 0 false
> positives. For this pair of files, the most important contextual
> information was the number of lines of code added, removed and modified in
> each commit, the number of comments in each issue, and social network
> measures (closeness, density, constraint, hierarchy) obtained from issue
> comments.
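The example's numbers can be checked with precision and recall, assuming each of the five release-4.4 commits to the first file counts as one prediction: 3 true co-changes all predicted (no false negatives), 0 false positives, and the 2 single-file commits correctly left out. The helper `precision_recall` and the commit ordering are illustrative only:

```python
def precision_recall(actual, predicted):
    """Precision and recall for binary co-change predictions.

    actual[k] is True if both files changed in commit k;
    predicted[k] is True if the model predicted a co-change for it.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(a and p for a, p in pairs)          # predicted and real
    fp = sum(p and not a for a, p in pairs)      # predicted but not real
    fn = sum(a and not p for a, p in pairs)      # real but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Five commits to the first file: three co-changes, two single-file changes.
actual = [True, True, True, False, False]
predicted = [True, True, True, False, False]
print(precision_recall(actual, predicted))  # (1.0, 1.0)
```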
> - Do these results surprise you? Can you think of any explanation for the
> results?
> - Do you think that our rate of prediction is good enough to be used for
> building tool support for the software community?
> - Do you have any suggestion on what can be done to improve the change
> recommendation?
> You can visit our webpage to inspect the results in detail:
> All the best,
> Igor Wiese
> PhD Candidate
