cassandra-commits mailing list archives

From "Russ Hatch (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-8954) risk analysis of patches based on past defects
Date Fri, 08 May 2015 19:38:00 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-8954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Russ Hatch updated CASSANDRA-8954:
----------------------------------
    Description: 
Some changes to source are much riskier than others, and we can analyze data from JIRA + git
to make educated guesses about risk level. This is a backward-looking technique with
limitations, but it may still be useful (yes, the past does not equal the future!).

(disclaimer: I did not come up with this technique).

The executive summary: 1) correlate changes with defects by code unit, such as file name;
2) quantify the risk of new patches by combining that correlation with a measure of change
"size", as (correlation * change_size).

The basic idea is to build a tool which correlates past Defect tickets with the files that
were changed to fix them. If fixing a Defect required changes to specific files, then in some
sense past changes to those files (or their original implementations) were problematic. Therefore,
future changes to those files carry some potential risk as well.

This requires getting an occasional dump of Defect-type issues, and an occasional dump of
commit messages. Defects would have to be associated with commits based on a text search of
commit messages. From there we build a weighted model of which source files get touched the
most to fix defects (say, giving each file name a ranking of 1 to 10, where 10 carries the most
risk).
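
A minimal sketch of the weighting step described above, not a definitive implementation. It
assumes commit messages mention ticket keys of the form PROJ-123, and that the Defect keys and
the (message, files) pairs have already been exported from JIRA and git. The function name
defect_weights, the input shapes, and the linear 1-to-10 scaling are all hypothetical choices
made for illustration.

```python
import re
from collections import Counter

# Assumed ticket-key format (e.g. "PROJ-123"); adjust to the project's convention.
TICKET_RE = re.compile(r"[A-Z]+-\d+")


def defect_weights(defect_keys, commits):
    """Rank files from 1 to 10 by how often defect-fixing commits touched them.

    defect_keys: set of ticket keys whose issue type is Defect (hypothetical input).
    commits: iterable of (commit_message, [file_paths]) pairs (hypothetical input).
    """
    touches = Counter()
    for message, files in commits:
        # Text search of the commit message for ticket keys.
        keys = set(TICKET_RE.findall(message))
        if keys & defect_keys:
            # Count each file once per defect-fixing commit.
            touches.update(set(files))
    if not touches:
        return {}
    top = max(touches.values())
    # Linear scaling onto 1..10: the most-touched file gets 10, and no file gets below 1.
    return {path: max(1, round(10 * n / top)) for path, n in touches.items()}
```

Files that never appear in a defect-fixing commit are simply absent from the result, so a
caller has to decide what weight an unseen file should get.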

To analyze specific patches going forward, we look at the defect weight for each source file
the patch touches, and factor in a metric for the patch's changes in that file (maybe
(lines changed/total lines), OR (change in cyclomatic complexity/total complexity)). Out of
this we get a number representing a theoretical risk.
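
As a rough sketch of that scoring step, using the (lines changed/total lines) variant: each
file contributes (defect weight * change size). Summing the per-file values into one number
for the whole patch, capping change size at 1.0, and falling back to a default weight for
files with no defect history are assumptions made here, not part of the proposal. The name
patch_risk and the input shapes are hypothetical.

```python
def patch_risk(weights, patch, default_weight=1):
    """Return a theoretical risk number for a patch.

    weights: dict of file path -> defect weight (e.g. a 1..10 ranking).
    patch: dict of file path -> (lines_changed, total_lines) for that file.
    default_weight: assumed weight for files with no recorded defect history.
    """
    risk = 0.0
    for path, (lines_changed, total_lines) in patch.items():
        if total_lines <= 0:
            # Skip empty files to avoid dividing by zero.
            continue
        # Change "size" as the fraction of the file touched, capped at 1.0.
        change_size = min(lines_changed / total_lines, 1.0)
        # Per-file risk is (correlation * change_size); the patch total is their sum.
        risk += weights.get(path, default_weight) * change_size
    return risk
```

With this shape, a small edit to a heavily weighted file and a large edit to a lightly
weighted file can come out with similar scores, which is the trade-off the
(correlation * change_size) product is meant to express.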

  was:
Some changes to source are much more risky than others, and we can analyze data from JIRA
+ git to make educated guesses about risk level. This is a backwards looking technique with
limitations but still may be useful (yes, the past does not equal the future!).

(disclaimer: I did not come up with this technique).

The basic idea is to build a tool which correlates past Defect tickets to the files which
were changed to fix them. If a Defect required changes to specific files to fix, then in some
sense past changes to those files (or their original implementations) were problematic. Therefore,
future changes to those files carry some potential risk as well.

This requires getting an occasional dump of Defect type issues, and an occasional dump of
commit messages. Defects would have to be associated to commits based on a text search of
commit messages. From there we build a weighted model of which source files get touched the
most to fix defects (say giving each file name a ranking of 1 to 10 where 10 carries the most
risk).

To analyze specific patches going forward we look at the defect weight for that source file,
and factor in a metric for a patch's changes in that file (maybe (lines changed/total lines),
OR (change in cyclomatic complexity/total complexity)). Out of this we get a number representing
a theoretical risk.


> risk analysis of patches based on past defects
> ----------------------------------------------
>
>                 Key: CASSANDRA-8954
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8954
>             Project: Cassandra
>          Issue Type: Test
>            Reporter: Russ Hatch
>            Assignee: Russ Hatch



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
