cxf-dev mailing list archives

From igorwiese <>
Subject Feedback of my Phd work in CXF project
Date Wed, 09 Dec 2015 23:29:03 GMT
Hi, CXF Community. 

My name is Igor Wiese, and I am a PhD student from Brazil. I am investigating
two important questions: What makes two files change together? Can we predict
when they are going to co-change again?

I have investigated these questions on the CXF project. I collected data
from issue reports, discussions, and commits, and used machine learning
techniques to build a prediction model.

I collected a total of 6384 commits in which a pair of files changed
together, and the model correctly predicted 86% of these commits. The
following pieces of information were the most useful for predicting
co-changes of files:
- number of lines of code added, 
- number of lines of code removed, 
- sum of number of lines of code added, modified and removed, 
- number of words used to describe and discuss the issues, and 
- number of comments in each issue.

To illustrate, consider the following example from our analysis. For release
2.7, the files "cxf/jaxrs/provider/" and
"cxf/jaxrs/provider/" changed together in 11
commits. In another 11 commits, only the first file changed, but not the
second. By collecting contextual information for each commit made to the
first file in release 2.6, we were able to predict 9 commits in which both
files changed together in release 2.7, with only one false positive and one
wrong prediction. For this pair of files, the most important contextual
information was the number of lines of code added in each commit, the number
of lines of code removed in each commit, the sum of lines of code removed,
added, and modified in each commit, and the number of words used to describe
and discuss the issues.
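
As a rough illustration of how counts like "changed together in 11 commits"
and "only the first file changed" can be derived from a commit history, here
is a minimal sketch. It is not the pipeline used in the study: the function
name count_cochanges, the toy commit list, and the file names A.java, B.java
and C.java are all hypothetical, and each commit is assumed to be available
as a set of changed file paths.

```python
from typing import Iterable, Set, Tuple


def count_cochanges(
    commits: Iterable[Set[str]], first: str, second: str
) -> Tuple[int, int]:
    """Count commits touching `first`, split by whether `second` changed too.

    Returns (together, first_only):
      together   -- commits in which both files changed
      first_only -- commits in which `first` changed but `second` did not
    """
    together = 0
    first_only = 0
    for changed_files in commits:
        if first not in changed_files:
            continue  # commits that do not touch `first` are not candidates
        if second in changed_files:
            together += 1
        else:
            first_only += 1
    return together, first_only


# Toy history (hypothetical data, not taken from the CXF repository).
commits = [
    {"A.java", "B.java"},
    {"A.java"},
    {"A.java", "B.java", "C.java"},
    {"B.java"},
]

together, first_only = count_cochanges(commits, "A.java", "B.java")
print(together, first_only)
```

In this framing, every commit to the first file becomes one example for the
prediction model, labeled by whether the second file changed in the same
commit.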

- Do these results surprise you? Can you think of any explanation for the
- Do you think that our rate of prediction is good enough to be used for
building tool support for the software community?
- Do you have any suggestions on what can be done to improve the change

You can visit a webpage to inspect the results in detail:

All the best, 
Igor Wiese
PhD Candidate
