hive-dev mailing list archives

From Igor Wiese <igor.wi...@gmail.com>
Subject Can you help us Hive Community?
Date Tue, 15 Dec 2015 12:04:10 GMT
Hi, Hive Community.

My name is Igor Wiese, and I am a PhD student from Brazil. I sent an
email a week ago about my research. Some of you visited the page to
inspect the results, but we have not received any feedback.

I am investigating two important questions: What makes two files
change together? Can we predict when they are going to co-change
again?


I've tried to investigate these questions on the Hive project. I
collected data from issue reports, discussions and commits, and used
machine learning techniques to build a prediction model.


I collected a total of 721 commits in which a pair of files changed
together, and the model correctly predicted 53% of them. The most
useful information for predicting co-changes of files was:

- sum of number of lines of code added, modified and removed,

- number of words used to describe and discuss the issues,

- number of comments in each issue,

- median value of closeness, a social network measure obtained from
issue comments, and

- median value of effective size, a social network measure obtained
from issue comments.


To illustrate, consider the following example from our analysis. For
release 0.14, the files "metastore/MetaStoreDirectSql.java" and
"metastore/ObjectStore.java" changed together in 4 commits. In another
2 commits, only the first file changed, but not the second. Using
contextual information collected for each commit made to the first
file in the previous release, we were able to predict the 4 commits in
which both files changed together in release 0.14, and we issued 0
false positives and two wrong predictions. For this pair of files, the
most important contextual information was the number of lines of code
added, the sum of lines of code added, removed and modified, and two
social network metrics (constraint, ties) obtained from issue comments.
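
The co-change counts in the example above can be reproduced with a
short sketch. This is only an illustration of the counting step, not
the tooling used in the study: the function name count_co_changes and
the commit list are hypothetical, built to mirror the numbers quoted
above (4 commits touching both files, 2 touching only the first).

```python
from collections import Counter
from itertools import combinations


def count_co_changes(commits):
    """Count, for every unordered pair of files, the commits touching both.

    `commits` is an iterable of file-path collections, one per commit.
    """
    pair_counts = Counter()
    for files in commits:
        # Sort so each pair is always stored in the same order.
        for pair in combinations(sorted(set(files)), 2):
            pair_counts[pair] += 1
    return pair_counts


# Hypothetical commit history mirroring the example for release 0.14.
FIRST = "metastore/MetaStoreDirectSql.java"
SECOND = "metastore/ObjectStore.java"
commits = [[FIRST, SECOND]] * 4 + [[FIRST]] * 2

counts = count_co_changes(commits)
together = counts[(FIRST, SECOND)]
only_first = sum(1 for files in commits if FIRST in files and SECOND not in files)

print(together, only_first)
```

A prediction model would then be evaluated per commit to the first
file: did the second file change in that commit or not?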


- Do these results surprise you? Can you think of any explanation for
the results?

- Do you think that our rate of prediction is good enough to be used
for building tool support for the software community?

- Do you have any suggestions on what can be done to improve the change
recommendation?


You can visit a webpage to inspect the results in detail:
http://flosscoach.com/index.php/17-cochanges/72-hive


All the best,
Igor Wiese

PhD Candidate


-- 
=================================
Igor Scaliante Wiese
PhD Candidate - Computer Science @ IME/USP
Faculty in Dept. of Computing at Universidade Tecnológica Federal do Paraná
