allura-dev mailing list archives

From "Dave Brondsema"
Subject [allura:tickets] #5733 Improve performance of Commit._diffs_copied
Date Tue, 11 Aug 2015 20:33:06 GMT


**[tickets:#5733] Improve performance of Commit._diffs_copied**

**Status:** closed
**Milestone:** unreleased
**Labels:** performance scm 
**Created:** Fri Feb 01, 2013 07:29 PM UTC by Cory Johns
**Last Updated:** Mon Dec 29, 2014 08:15 AM UTC
**Owner:** nobody

`Commit._diffs_copied()` is used to determine whether a removed blob was actually moved or renamed,
possibly with some changes.  However, it is called every time a commit is viewed and runs against
every file removed in that commit, and it is slow enough to be a problem.

Some ideas for optimizing it:

* Short-circuit identical blob comparisons by comparing the blob hash first, as is done w/
* Use `SequenceMatcher.real_quick_ratio()` to get an upper bound on the ratio and exclude
obvious non-matches quickly, probably followed by `quick_ratio()` and/or `ratio()` to
confirm a match
* Raise the `DIFF_SIMILARITY_THRESHOLD` and break after a single match instead of continuing
to test all files (though this could give false matches, so maybe not do this one)
* Exclude binary or particularly large blobs
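
As a rough illustration of how the hash check, the size/binary exclusion, and the tiered `SequenceMatcher` calls could fit together, here is a minimal sketch. It is not Allura's code: the `Blob` tuple, the `similarity()` helper, the `MAX_BLOB_SIZE` cutoff, the NUL-byte binary test, and the threshold value are all assumptions made up for the example. Only the three `difflib.SequenceMatcher` methods are real library calls.

```python
from collections import namedtuple
from difflib import SequenceMatcher

# Hypothetical stand-in for a repository blob: its hash plus raw contents.
Blob = namedtuple("Blob", "sha data")

DIFF_SIMILARITY_THRESHOLD = 0.5  # assumed value, for illustration only
MAX_BLOB_SIZE = 1024 * 1024      # hypothetical cutoff for "particularly large"


def looks_binary(data):
    """Crude heuristic (an assumption): a NUL byte near the start means binary."""
    return b"\0" in data[:8000]


def similarity(old, new, threshold=DIFF_SIMILARITY_THRESHOLD):
    """Return the similarity ratio, or 0.0 if the pair is rejected."""
    # 1. Identical hashes mean identical content; no diffing needed.
    if old.sha == new.sha:
        return 1.0
    # 2. Skip blobs that are too large or that look binary.
    if max(len(old.data), len(new.data)) > MAX_BLOB_SIZE:
        return 0.0
    if looks_binary(old.data) or looks_binary(new.data):
        return 0.0
    matcher = SequenceMatcher(None, old.data, new.data)
    # 3. Cheap upper bounds first: if even the upper bound is below the
    #    threshold, the real ratio cannot reach it.
    if matcher.real_quick_ratio() < threshold:
        return 0.0
    if matcher.quick_ratio() < threshold:
        return 0.0
    # 4. Only now pay for the full comparison.
    ratio = matcher.ratio()
    return ratio if ratio >= threshold else 0.0
```

Because `real_quick_ratio()` and `quick_ratio()` are both upper bounds on `ratio()`, rejecting on either one never discards a pair that the full comparison would have accepted.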

Finally, we should almost certainly move this computation to `compute_diffs()` instead of
doing it every time the commit's diffs are used.

Also, children of removed trees (or of the removed side of moved/renamed trees) are currently not
included in the diff, to avoid hitting this performance issue too often.  As a result, the added
portion of a moved/renamed tree looks like brand new files.  Once `_diffs_copied()` performs
reasonably and/or is pre-computed, the removed-trees short-circuit in `compute_diffs()`
should be removed.

