cloudstack-dev mailing list archives

From Igor Wiese <igor.wi...@gmail.com>
Subject Re: Feedback of my Phd work in Cloudstack Project
Date Thu, 10 Dec 2015 20:01:31 GMT
Hi Patrick

The problem with new files is that they have no history from which to build
the prediction models. I need at least a few commits (10, for example).
Yes, the link between files is what we are predicting. We can predict
changes involving commands.properties, XML files in general, .txt files, or
files with any source code extension :-)
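
As a rough illustration of the history requirement above, here is a minimal Python sketch that counts co-changes only between files with enough prior commits. It is not the actual model-building code; the function name `cochange_counts`, the `min_history` parameter, and the input shape (one set of file paths per commit) are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def cochange_counts(commits, min_history=10):
    """Count how often each pair of files changed in the same commit.

    commits: list of sets of file paths, one set per commit (assumed shape).
    min_history: hypothetical threshold; files touched by fewer commits
    are skipped because they have too little history to learn from.
    """
    # Number of commits that touched each file.
    history = Counter(path for files in commits for path in files)
    # Files with enough history to be considered at all.
    eligible = {path for path, n in history.items() if n >= min_history}

    pairs = Counter()
    for files in commits:
        # Sort so that each pair is always stored in the same order.
        kept = sorted(files & eligible)
        pairs.update(combinations(kept, 2))
    return pairs
```

A brand-new file never reaches the threshold, so it produces no pairs until it has accumulated some commits, which is the limitation described above.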

Thanks for the feedback.


2015-12-10 17:40 GMT-02:00 Patrick Dube <patrickdube91@gmail.com>:

> Are you handling new files as well, or the links between sets of files (or
> packages)? As an example, if a user creates a new API cmd, then he will
> update the "commands.properties" file. Another example, if a VO file is
> updated, then there will be a db migration file added as well.
> Cool work,
>
> On Thu, Dec 10, 2015 at 9:21 AM, Igor Wiese <igor.wiese@gmail.com> wrote:
>
> > Hi Sebastien.
> >
> > We used only 141 commits because we needed data from the issues. Since my
> > assumption concerns the contextual information from issues and social
> > aspects, we need to aggregate commits and issues.
> >
> > First, I collected the issues from JIRA, and then I tried to aggregate the
> > commits that explicitly mention one of the collected issues. I also used
> > only closed issues, to be confident that the code used to build my
> > models has been merged and checked by the community.
> >
> > That is the weak point of my approach: I need past data from the
> > issues, and sometimes it is not available for earlier periods.
> > I also plan to use data from GitHub to make the dataset more
> > complete.
> >
> > All the best,
> >
> > 2015-12-10 11:22 GMT-02:00 sebgoa <runseb@gmail.com>:
> >
> > >
> > > On Dec 10, 2015, at 12:31 AM, Igor Wiese <igor.wiese@gmail.com> wrote:
> > >
> > > > Hi, Cloudstack Community.
> > > >
> > > > My name is Igor Wiese, a PhD student from Brazil. In my research, I am
> > > > investigating two important questions: What makes two files change
> > > > together? Can we predict when they are going to co-change again?
> > > >
> > > > I've tried to investigate these questions on the CloudStack project. I've
> > > > collected data from issue reports, discussions, and commits, and used some
> > > > machine learning techniques to build a prediction model.
> > > >
> > > > I collected a total of 141 commits in which a pair of files changed
> > > > together and could correctly predict 60% of those commits.
> > >
> > >
> > > Hi Igor, why 141 commits? Are those the only commits you found in which
> > > only a pair of files changed?
> > >
> > > My gut feeling is that you could check the entire history of the
> > > CloudStack repo (~5 years' worth of data) and work on different types of
> > > tuples.
> > >
> > > 141 commits seems like a really small dataset.
> > >
> > > -Sebastien
> > >
> > > > The following information was the most
> > > > useful for predicting co-changes of files:
> > > >
> > > > - sum of the number of lines of code added, modified, and removed,
> > > >
> > > > - number of words used to describe and discuss the issues,
> > > >
> > > > - number of comments in each issue,
> > > >
> > > > - median value of closeness, a social network measure obtained from
> > > > issue comments, and
> > > >
> > > > - median value of constraint, a social network measure obtained from
> > > > issue comments.
> > > >
> > > > To illustrate, consider the following example from our analysis. For
> > > > release 4.4, the files "cloud/hypervisor/XenServerGuru.java" and
> > > > "cloud/hypervisor/guru/VMwareGuru.java" changed together in 3 commits. In
> > > > another 2 commits, only the first file changed, but not the second.
> > > > Collecting contextual information for each commit made to the first file in
> > > > the previous release (4.3), we were able to predict all 3 commits in which
> > > > both files changed together in release 4.4, and we issued no false
> > > > positives. For this pair of files, the most important contextual
> > > > information was the number of lines of code added, removed, and modified in
> > > > each commit, the number of comments in each issue, and social network
> > > > measures (closeness, density, constraint, hierarchy) obtained from issue
> > > > comments.
> > > >
> > > > - Do these results surprise you? Can you think of any explanation for the
> > > > results?
> > > >
> > > > - Do you think that our rate of prediction is good enough to be used for
> > > > building tool support for the software community?
> > > >
> > > > - Do you have any suggestions on what can be done to improve the change
> > > > recommendation?
> > > >
> > > > You can visit our webpage to inspect the results in detail:
> > > > http://flosscoach.com/index.php/17-cochanges/67-cloudstack
> > > >
> > > > All the best,
> > > > Igor Wiese
> > > > PhD Candidate
> > >
> > >
> >
> >
> > --
> > =================================
> > Igor Scaliante Wiese
> > PhD Candidate - Computer Science @ IME/USP
> > Faculty in Dept. of Computing at Universidade Tecnológica Federal do Paraná
> >
>
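
The commit-to-issue aggregation described in the quoted thread (keep only commits that explicitly mention a collected, closed JIRA issue) could be sketched roughly as follows. This is an illustration under stated assumptions, not the script used in the study: the `CLOUDSTACK-<number>` key pattern, the function name, and the input shapes are assumed for the example.

```python
import re

# Assumption: issue keys appear in commit messages as "CLOUDSTACK-<number>".
ISSUE_KEY = re.compile(r"\bCLOUDSTACK-\d+\b")


def link_commits_to_closed_issues(commits, closed_issue_keys):
    """Map each closed issue key to the commits that mention it.

    commits: iterable of (sha, message) tuples (assumed shape).
    closed_issue_keys: set of keys for issues already closed in JIRA.
    """
    linked = {}
    for sha, message in commits:
        # A message may mention several issues; count each key once.
        for key in set(ISSUE_KEY.findall(message)):
            if key in closed_issue_keys:
                linked.setdefault(key, []).append(sha)
    return linked
```

Commits with no issue key, or that mention only open issues, drop out of the result, which is one way a long history can shrink to a small dataset.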



-- 
=================================
Igor Scaliante Wiese
PhD Candidate - Computer Science @ IME/USP
Faculty in Dept. of Computing at Universidade Tecnológica Federal do Paraná
