subversion-dev mailing list archives

From Stefan Fuhrmann <stefan.fuhrm...@wandisco.com>
Subject Efficient revprop access in lib_repos
Date Sun, 11 Oct 2015 12:25:47 GMT
[This is the rationale and additional documentation to an upcoming
set of commits.]

When changing a revprop, we provide the following visibility guarantees:

1. A request that ends before the set_revprop started sees the old value
   (duh!).
2. A request that starts after the set_revprop completes sees the new
   value.
3. A request that starts before the set_revprop completes may see the old
   or the new value and may not be consistent about it.

The reporter in lib_repos exploits 3. by keeping a hash of the dates
and authors for all revisions it has encountered so far.  That not only
saves 50+% of revprop lookups but also guarantees consistent properties
for all reported nodes, at the expense of ~80 bytes per reported revision.
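
To make the reporter-side idea concrete, here is a minimal sketch in C of
memoizing date and author per revision for the duration of one report.
It is not the lib_repos code: rev_info_t, report_cache_t, read_from_fs,
get_rev_info and demo_report are hypothetical names, the fixed-size table
stands in for the hash, and the date and author values are placeholders.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative names only; none of this is Subversion API. */
typedef struct rev_info_t {
  long rev;
  char date[28];     /* room for "2015-10-11T12:25:47.000000Z" */
  char author[32];
  int valid;         /* 0 = empty slot */
} rev_info_t;

#define CACHE_SLOTS 64

typedef struct report_cache_t {
  rev_info_t slots[CACHE_SLOTS];
} report_cache_t;

int fs_reads = 0;    /* counts simulated revprop lookups in the FS */

/* Stand-in for the expensive FS lookup of date and author. */
void read_from_fs(long rev, rev_info_t *out)
{
  ++fs_reads;
  out->rev = rev;
  strcpy(out->date, "2015-10-11T12:25:47.000000Z");  /* placeholder */
  strcpy(out->author, "jrandom");                    /* placeholder */
  out->valid = 1;
}

/* Return the memoized info for REV, reading it from the FS only on
   first use.  Returns NULL if the fixed-size table is full. */
const rev_info_t *get_rev_info(report_cache_t *cache, long rev)
{
  size_t start, probe;

  assert(rev >= 0);
  start = (size_t)rev % CACHE_SLOTS;
  for (probe = 0; probe < CACHE_SLOTS; ++probe) {
    rev_info_t *slot = &cache->slots[(start + probe) % CACHE_SLOTS];
    if (slot->valid && slot->rev == rev)
      return slot;                 /* seen before: no FS access */
    if (!slot->valid) {
      read_from_fs(rev, slot);     /* first encounter: read once */
      return slot;
    }
  }
  return NULL;
}

/* Six reported nodes that share three revisions. */
int demo_report(void)
{
  static const long node_revs[] = { 7, 7, 9, 7, 9, 12 };
  report_cache_t cache;
  size_t i;

  memset(&cache, 0, sizeof(cache));
  fs_reads = 0;
  for (i = 0; i < sizeof(node_revs) / sizeof(node_revs[0]); ++i)
    if (get_rev_info(&cache, node_revs[i]) == NULL)
      return -1;
  return fs_reads;   /* number of simulated FS reads */
}
```

In this toy run, six nodes trigger three simulated FS reads, and every
node of a given revision reports the same date and author even if the
revprop changes on disk while the report is running.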

I'd like to expand on that by giving the FS API users more control on
whether they need to get the latest revprop state (e.g. at the beginning
of a lib_repos report) or not (e.g. follow-up requests during the same
lib_repos report).  That will allow the FS layer to read whole revprop
packs and simply deliver their contents during 'svn log' instead of
re-opening the same files over and over again.

I'll commit a patch set that

* introduces the notion of a barrier in svn_fs_revision_prop2 and
  svn_fs_revision_proplist2 (default: read latest from disk) plus
  a new explicit barrier function svn_fs_refresh_rev_props,
* updates the lib_repos queries to only use one barrier, and
* implements them in the FS backends.
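
The intended calling pattern can be sketched with a toy model in C.
Everything here is hypothetical (toy_fs_t, toy_refresh_revprops,
toy_revision_prop, demo_one_barrier); it does not use the signatures of
the functions named above.  It only shows the contract: a barrier read
must reflect the on-disk state, while a non-barrier read may reuse what
the session has seen since the last barrier.  Reusing the value is just
one behavior the contract permits.

```c
#include <assert.h>
#include <string.h>

/* Toy model, not the Subversion API: one revprop value "on disk" and a
   per-session snapshot that non-barrier reads are allowed to reuse. */
typedef struct toy_fs_t {
  char on_disk[32];     /* another process may change this at any time */
  char snapshot[32];    /* last value this session read from disk */
  int have_snapshot;
  int disk_reads;
} toy_fs_t;

/* Hypothetical explicit barrier: forget what the session knows. */
void toy_refresh_revprops(toy_fs_t *fs)
{
  assert(fs != NULL);
  fs->have_snapshot = 0;
}

/* Hypothetical lookup with a barrier flag.
   refresh != 0: the result reflects the latest on-disk state.
   refresh == 0: the result may be the value seen since the last barrier. */
const char *toy_revision_prop(toy_fs_t *fs, int refresh)
{
  if (refresh)
    toy_refresh_revprops(fs);
  if (!fs->have_snapshot) {
    strcpy(fs->snapshot, fs->on_disk);
    fs->have_snapshot = 1;
    ++fs->disk_reads;
  }
  return fs->snapshot;
}

/* One barrier per report; returns the number of disk reads, or -1 if
   a read returned an unexpected value. */
int demo_one_barrier(void)
{
  toy_fs_t fs;

  memset(&fs, 0, sizeof(fs));
  strcpy(fs.on_disk, "old log message");

  /* Start of the report: one barrier read. */
  if (strcmp(toy_revision_prop(&fs, 1), "old log message") != 0)
    return -1;

  /* A concurrent set_revprop lands while the report is running. */
  strcpy(fs.on_disk, "new log message");

  /* Follow-up read in the same report: no barrier, no disk access. */
  if (strcmp(toy_revision_prop(&fs, 0), "old log message") != 0)
    return -1;

  /* The next report starts with a barrier and sees the new value. */
  if (strcmp(toy_revision_prop(&fs, 1), "new log message") != 0)
    return -1;

  return fs.disk_reads;
}
```

Here only the two barrier reads touch the simulated disk; the follow-up
read is served from the session's snapshot.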

Note that it is perfectly legal for an FS to ignore the extra flag
and always fetch the data from disk.  So, this is how our backends will
implement it:

* BDB will ignore the flag and always deliver data as today.

* FSFS will use a *temporary*, svn_fs_t-local revprop cache, keyed
  by a UUID. Whenever there is a barrier, the UUID gets cleared and
  any cache lookup would miss (it should not even try to use the cache).
  Upon the first non-barrier request, a new UUID gets created and the
  cache will be populated whenever we read revprops from disk.

* FSX will only check for a new revprop generation at a barrier.
  That eliminates the requirement to keep the revprop gen file open,
  IOW, revprop caching works with the "normal" file open/read/close
  pattern.
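
The FSFS scheme above can be modelled in a few lines of C.  This is a
sketch under simplifying assumptions, not the FSFS code: a counter stands
in for the UUID, key 0 means "no key, do not use the cache", the table is
direct-mapped, and session_cache_t, cache_barrier, cache_get and
demo_key_cache are hypothetical names.  The point is that a barrier
invalidates everything by dropping the key, without touching any entry.

```c
#include <assert.h>
#include <string.h>

#define SLOTS 16

typedef struct entry_t {
  unsigned long key;   /* cache key that was current when stored */
  long rev;
  char value[32];
} entry_t;

typedef struct session_cache_t {
  unsigned long current_key;  /* 0 = no key (right after a barrier) */
  unsigned long next_key;     /* counter standing in for UUID creation */
  entry_t entries[SLOTS];
  int disk_reads;
} session_cache_t;

/* A barrier drops the key; entries stored under it can never hit again. */
void cache_barrier(session_cache_t *c)
{
  assert(c != NULL);
  c->current_key = 0;
}

/* Look up REV.  DISK_VALUE simulates what a read from disk would return. */
const char *cache_get(session_cache_t *c, long rev, int barrier,
                      const char *disk_value)
{
  entry_t *e;

  assert(rev >= 0);
  e = &c->entries[(unsigned long)rev % SLOTS];
  if (barrier)
    cache_barrier(c);
  else if (c->current_key == 0)
    c->current_key = ++c->next_key;  /* first non-barrier request */

  if (c->current_key != 0 && e->key == c->current_key && e->rev == rev)
    return e->value;                 /* hit under the current key */

  /* Miss or barrier: read "from disk".  The entry is tagged with the
     current key, so data read at a barrier (key 0) is never reused. */
  ++c->disk_reads;
  strncpy(e->value, disk_value, sizeof(e->value) - 1);
  e->value[sizeof(e->value) - 1] = '\0';
  e->rev = rev;
  e->key = c->current_key;
  return e->value;
}

/* Returns the number of disk reads, or -1 on an unexpected value. */
int demo_key_cache(void)
{
  session_cache_t c;

  memset(&c, 0, sizeof(c));
  if (strcmp(cache_get(&c, 5, 1, "v1"), "v1") != 0) return -1; /* barrier */
  if (strcmp(cache_get(&c, 5, 0, "v1"), "v1") != 0) return -1; /* new key */
  if (strcmp(cache_get(&c, 5, 0, "v2"), "v1") != 0) return -1; /* hit */
  if (strcmp(cache_get(&c, 5, 1, "v2"), "v2") != 0) return -1; /* barrier */
  return c.disk_reads;
}
```

In this run the third lookup is the only cache hit: it still returns
"v1" although the simulated disk already holds "v2", which guarantee 3.
allows.  The final barrier lookup reads from disk again and sees "v2".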

As a result, in FSFS we get most of the benefits of revprop caching
(fewer file operations and lower CPU load) without introducing complex
cache invalidation schemes.  This is significant in packed repos
but will also benefit non-packed revs in some operations.

I'll commit the changes in roughly the following order:

* minor cleanup of the FSFS code to keep the relevant change small
* update FS vtable, defaulting to current behavior
* implement the new behavior in FSFS
* bump FS API
* update lib_repos queries one-by-one for review and testability
* implement the new behavior in FSX

-- Stefan^2.
