subversion-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From stef...@apache.org
Subject svn commit: r1510990 - in /subversion/branches/log-addressing/subversion/libsvn_fs_fs: structure structure-indexes
Date Tue, 06 Aug 2013 15:14:15 GMT
Author: stefan2
Date: Tue Aug  6 15:14:14 2013
New Revision: 1510990

URL: http://svn.apache.org/r1510990
Log:
On the log-addressing branch:  Document the new addressing scheme
of format7 in our structure doc.  Add the description of the index
file format as as separate document.  No functional change.

* subversion/libsvn_fs_fs/structure
  (Layout of the FS directory): add index files
  (Filesystem formats): add f7
  (Filesystem format options): document "addressing" option
  (Addressing modes,
   Index files): new sections
  (Packing revisions,
   Node-revision IDs,
   Revision file format): update
  (Transaction layout): update; add proto index files 

* subversion/libsvn_fs_fs/structure-indexes
  (): new file describing the binary index files

Added:
    subversion/branches/log-addressing/subversion/libsvn_fs_fs/structure-indexes
Modified:
    subversion/branches/log-addressing/subversion/libsvn_fs_fs/structure

Modified: subversion/branches/log-addressing/subversion/libsvn_fs_fs/structure
URL: http://svn.apache.org/viewvc/subversion/branches/log-addressing/subversion/libsvn_fs_fs/structure?rev=1510990&r1=1510989&r2=1510990&view=diff
==============================================================================
--- subversion/branches/log-addressing/subversion/libsvn_fs_fs/structure (original)
+++ subversion/branches/log-addressing/subversion/libsvn_fs_fs/structure Tue Aug  6 15:14:14
2013
@@ -37,6 +37,8 @@ repository) is:
     <shard>.pack/     Pack directory, if the repo has been packed (see below)
       pack            Pack file, if the repository has been packed (see below)
       manifest        Pack manifest file, if a pack file exists (see below)
+      pack.l2p        Log-to-phys index file (format 7+, see below)
+      pack.p2l        Phys-to-log index file (format 7+, see below)
   revprops/           Subdirectory containing rev-props
     <shard>/          Shard directory, if sharding is in use (see below)
       <revnum>        File containing rev-props for <revnum>
@@ -138,6 +140,7 @@ The formats are:
   Format 4, understood by Subversion 1.6+
   Format 5, understood by Subversion 1.7-dev, never released
   Format 6, understood by Subversion 1.8
+  Format 7, understood by Subversion 1.9
 
 The differences between the formats are:
 
@@ -148,6 +151,7 @@ Delta representation in revision files
 Format options
   Formats 1-2: none permitted
   Format 3+:   "layout" option
+  Format 7+:   "addressing" option
 
 Transaction name reuse
   Formats 1-2: transaction names may be reused
@@ -183,15 +187,21 @@ Shard packing:
   Format 6+:  Applied equally to revision data and revprop data
     (i.e. same min packed revision)
 
+Addressing:
+  Format 1-6: Physical addressing; uses fixed positions within a rev file
+  Format 7+:  Logical addressing; uses item index that will be translated
+    on-the-fly to the actual rev / pack file location
+
 # Incomplete list.  See SVN_FS_FS__MIN_*_FORMAT
 
 
 Filesystem format options
 -------------------------
 
-Currently, the only recognised format option is "layout", which
-specifies the paths that will be used to store the revision files and
-revision property files.
+Currently, the only recognised format options are "layout" and "addressing".
+The first specifies the paths that will be used to store the revision
+files and revision property files.  The second specifies for which
+revisions address translation is required.
 
 The "layout" option is followed by the name of the filesystem layout
 and any required parameters.  The default layout, if no "layout"
@@ -219,6 +229,73 @@ The known layouts, and the parameters th
   revs/0/ directory will contain revisions 0-999, revs/1/ will contain
   1000-1999, and so on.
 
+The "addressing" option is followed by the name of the addressing mode
+and any required parameters.  The default addressing, if no "addressing"
+keyword is specified, is the 'physical' addressing.
+
+The supported modes, and the parameters they require, are as follows:
+
+"physical"
+  All existing and future revision files will use the traditional
+  physical addressing scheme.  All references are given as rev/offset
+  pairs with "offset" being the file position offset relative to the
+  begin of the revision in the respective rev or pack file.
+
+"logical <first-revision-to-use-it>"
+  'first-revision-to-use-it' specifies the first revision to use logical
+  addressing, must coincide the begin of a shard and may be a future
+  revision.  All earlier revisions use physical addressing.  It illegal
+  to use logical addressing on non-shared repositories.
+
+
+Addressing modes
+----------------
+
+Two addressing modes are supported in format 7: physical and logical
+addressing.  Both use the same address format but apply a different
+interpretation to it.  Older formats only support physical addressing.
+
+All items are addressed using <rev> <item_index> pairs.  In physical
+addressing mode, item_index is the (ASCII decimal) number of bytes from
+the start of the revision file to the start respective item.  For non-
+packed files that is also the absolute file offset.  Revision pack files
+are simply concatenate multiple rev files, i.e. the absolute file offset
+is determined as
+
+  absolute offset = rev offset taken from manifest + item_index
+  
+This simple addressing scheme makes it hard to change the location of
+any item since that may break references from later revisions.
+  
+Logical addressing uses an index file to translate the rev / item_index
+pairs into absolute file offsets.  There is one such index for every rev /
+pack file using logical addressing and both are created in sync.  That
+makes it possible to reorder items during pack file creation, particularly
+to mix items from different revisions.
+
+Some item_index values are pre-defined and apply to every revision:
+
+  0 ... not used / invalid
+  1 ... changed path list
+  2 ... root node revision
+
+A reverse index (phys-to-log) is being created as well that allows for
+translating arbitrary file locations into item descriptions (type, rev,
+item_index, on-disk length).  Known item types
+
+  0 ... unused / empty section
+  1 ... file representation
+  2 ... directory representation
+  3 ... file property representation
+  4 ... directory property representation
+  5 ... node revision
+  6 ... changed paths list
+
+The various representation types all share the same morphology.  The
+distinction is only made to allow for more effective reordering heuristics.
+Zero-length items are allowed.
+
+
 Packing revisions
 -----------------
 
@@ -229,9 +306,14 @@ records the indexes of the corresponding
 In addition, the original shard is removed, and reads are redirected to the
 pack file.
 
-The manifest file consists of a list of offsets, one for each revision in the
-pack file.  The offsets are stored as ASCII decimal, and separated by a newline
-character.
+The manifest file consists of a list of offsets, one for each revision in
+the pack file.  The offsets are stored as ASCII decimal, and separated by
+a newline character.
+
+Revision pack files using logical addressing don't use manifest files but
+index files instead.  The revisions inside a pack file will also get
+interleaved to reduce I/O for typical access patterns.
+
 
 Packing revision properties (format 5: SQLite)
 ---------------------------
@@ -341,13 +423,12 @@ Within a new transaction:
 Within a revision:
 
   Within a revision file, node-revs have a txn-id field of the form
-  "r<rev>/<offset>", to support easy lookup. The <offset> is the (ASCII
-  decimal) number of bytes from the start of the revision file to the
-  start of the node-rev.
+  "r<rev>/<item_index>", to support easy lookup.  See addressing modes
+  for details.
 
   During the final phase of a commit, node-revision IDs are rewritten
   to have repository-wide unique node-ID and copy-ID fields, and to have
-  "r<rev>/<offset>" txn-id fields.
+  "r<rev>/<item_index>" txn-id fields.
 
   In Format 3 and above, this uniqueness is done by changing a temporary
   id of "_<base36>" to "<base36>-<rev>".  Note that this means that the
@@ -429,13 +510,13 @@ A revision file contains a concatenation
   * Text and property representations
   * Node-revisions
   * The changed-path data
-  * Two offsets at the very end
+  * Two offsets at the very end (physical addressing mode only)
 
 A representation begins with a line containing either "PLAIN\n" or
-"DELTA\n" or "DELTA <rev> <offset> <length>\n", where <rev>, <offset>,
-and <length> give the location of the delta base of the representation
-and the amount of data it contains (not counting the header or
-trailer).  If no base location is given for a delta, the base is the
+"DELTA\n" or "DELTA <rev> <item_index> <length>\n", where <rev>,
+<item_index>, and <length> give the location of the delta base of the
+representation and the amount of data it contains (not counting the header
+or trailer).  If no base location is given for a delta, the base is the
 empty stream.  After the initial line comes raw svndiff data, followed
 by a cosmetic trailer "ENDREP\n".
 
@@ -459,9 +540,9 @@ defined:
   type      "file" or "dir"
   pred      The ID of the predecessor node-rev
   count     Count of node-revs since the base of the node
-  text      "<rev> <offset> <length> <size> <digest>" for text
rep
-  props     "<rev> <offset> <length> <size> <digest>" for props
rep
-            <rev> and <offset> give location of rep
+  text      "<rev> <item_index> <length> <size> <digest>" for
text rep
+  props     "<rev> <item_index> <length> <size> <digest>" for
props rep
+            <rev> and <item_index> give location of rep
             <length> gives length of rep, sans header and trailer
             <size> gives size of expanded rep; may be 0 if equal
              to the length
@@ -489,7 +570,7 @@ of the copy; it may be omitted if the no
 of revision 0).  Copy roots are identified by revision and
 created-path, not by node-rev ID, because a copy root may be a
 node-rev which exists later on within the same revision file, meaning
-its offset is not yet known.
+its location is not yet known.
 
 The changed-path data is represented as a series of changed-path
 items, each consisting of two lines.  The first line has the format
@@ -507,10 +588,10 @@ Starting with FS format 4, <action> may 
 "dir") of the node, after a hyphen; for example, an added directory
 may be represented as "add-dir".
 
-At the very end of a rev file is a pair of lines containing
-"\n<root-offset> <cp-offset>\n", where <root-offset> is the offset of
-the root directory node revision and <cp-offset> is the offset of the
-changed-path data.
+In physical addressing mode, at the very end of a rev file is a pair of
+lines containing "\n<root-offset> <cp-offset>\n", where <root-offset> is
+the offset of the root directory node revision and <cp-offset> is the
+offset of the changed-path data.
 
 All numbers in the rev file format are unsigned and are represented as
 ASCII decimal.
@@ -533,6 +614,12 @@ In FS formats 1 and 2, it also contains:
   rev                        Prototype rev file with new text reps
   rev-lock                   Lockfile for writing to the above
 
+In format 7+ logical addressing mode, it contains two more files
+(see structure-indexes for a detailed description):
+
+  index.l2p                  Log-to-phys proto-index
+  index.p2l                  Phys-to-log proto-index
+
 In newer formats, these files are in the txn-protorevs/ directory.
 
 The prototype rev file is used to store the text representations as
@@ -545,7 +632,7 @@ file will always be present.  The "node.
 only be present if the node-rev properties have been changed.
 
 The <sha1> files have been introduced in FS format 6. Their content
-is that of text rep references: "<rev> <offset> <length> <size> <digest>"
+is that of text rep references: "<rev> <item_offset> <length> <size>
<digest>"
 They will be written for text reps in the current transaction and be
 used to eliminate duplicate reps within that transaction.
 
@@ -619,3 +706,14 @@ reference the same path as above, but lo
 that file (instead of lock information).  Children are listed as MD5
 digests, too, so you would simply iterate over those digests and
 consult the files they reference for lock information.
+
+
+Index files
+-----------
+
+Format 7 introduces logical addressing that requires item indexes
+to be translated / mapped to physical rev / pack file offsets.
+
+Details of the binary format used by these index files can be
+found in structure-indexes.
+

Added: subversion/branches/log-addressing/subversion/libsvn_fs_fs/structure-indexes
URL: http://svn.apache.org/viewvc/subversion/branches/log-addressing/subversion/libsvn_fs_fs/structure-indexes?rev=1510990&view=auto
==============================================================================
--- subversion/branches/log-addressing/subversion/libsvn_fs_fs/structure-indexes (added)
+++ subversion/branches/log-addressing/subversion/libsvn_fs_fs/structure-indexes Tue Aug 
6 15:14:14 2013
@@ -0,0 +1,301 @@
+This file describes the design, data model, and file formats of FSFS
+index files.
+
+
+Design
+======
+
+For each pack and each rev file using logical addressing, there is exactly
+two index files.  One, the log-to-phys index, maps the (rev, item_index)
+pairs to absolute file offsets.  The other, phys-to-log, is a reverse
+index that gives basic information on any file location.  This is enough
+to read and cache any data without traversing DAGs.
+
+Rev and pack files are immutable, so the same is true for index files.
+During a transaction or while packing a file, a proto index file gets
+written (actually, one log-to-phys and one phys-to-log).  Its format is
+a simple concatenation of runtime structs and as such, an implementation
+detail subject to change.  A proto index basically aggregates all the
+information that must later be transformed into the final index.
+
+
+General design concerns
+-----------------------
+
+In Subversion, there is no limit to the size of a revision; even practical
+limits are in the order of millions of changes at least.  Index data for
+these would be multiple megabytes in size with pack file indexes possibly
+approaching 1 GB.  To ensure we still get roughly O(1) access time, we
+need a hierarchical data structure.
+
+Therefore, the indexes will start with a header containing an array of
+references to sub-sections or pages.  The length of these pages varies
+but is limited to a size configurable in fsfs.conf.
+
+Finally, it is assumed that whole pages can be cached efficiently and
+with a high cache hit rate.  So, although a page may have a thousand or
+more entries, the access time is still amortized O(1) in many scenarios.
+
+
+Items and item types
+--------------------
+
+The index implementation treats item_index and item type as simple ints,
+except for SVN_FS_FS__ITEM_INDEX_UNUSED and SVN_FS_FS__ITEM_TYPE_UNUSED.
+Since they have been defined as 0, the code may test for "used" etc.
+by simply comparing with 0.
+
+See section "addressing modes" in structure to a list of item types
+and pre-defined item_index values.
+
+
+Encoding
+--------
+
+The final index file format is tuned for space and decoding efficiency.
+Indexes are stored as a sequence of variable integers.  The encoding is
+as follows:
+
+* Unsigned integers are stored in little endian order with a variable
+  length 7b/8b encoding.  If most significant bit a byte has been set,
+  the next byte has also belongs to the same value.
+
+  0x00 .. 0x7f    -> 0x00 .. 0x7f               ( 7 bits stored in  8 bits)
+  0x80 .. 0xff    -> 0x80 0x01 .. 0xff 0x01     (14 bits stored in 16 bits)
+  0x100 .. 0x3fff -> 0x80 0x02 .. 0xff 0x7f     (14 bits stored in 16 bits)
+  0x100000000     -> 0x80 0x80 0x80 0x80 0x10   (35 bits stored in 40 bits)
+
+  Technically, we can represent integers of arbitrary lengths.  Currently,
+  we only generate and parse up to 64 bits only.
+
+* Signed integers are mapped onto the unsigned value space as follows:
+
+  x >= 0 ->  2 * x
+  x < 0  -> -2 * x - 1
+
+  Again, we can represent arbitrary length numbers that way but the code
+  is currently restricted to 64 bits.
+
+Most data is unsigned by nature but will be stored differentially using
+signed integers.
+
+
+Log-to-phys index
+=================
+
+This index has to map (rev, item_index) -> offset.  It assumes that the
+item_index values per revision are dense and start at 0.  There may be
+unused item_index values, though; the data structure simply gets less
+space-efficient when the more sparse the value space gets.
+
+
+Index data model
+----------------
+
+hierarchy:
+
+  header -> per-revision info -> page -> offset
+
+  There is one entry per revision in the header.  Per revision there are
+  one or more pages (exclusive to that revision) containing up to a known,
+  fixed limit of entries (= page size).  The total access path is:
+
+  pages = header->pages[revision];
+  offsets = page = pages[item_index / page_size];
+  offset = offsets[item_index % page_size];
+
+  Different log-to-phys indexes in the same repository may have different
+  page sizes but within any given index file, the page size is the same
+  and immutable.
+
+header:
+
+  <first revision>   ... first revision covered by this index file
+  <revision count>   ... number of revision covered by this index file
+  <page size>        ... maximum number of entries per page
+  <page table index> ... array, for each revision containing the index in
+                         <page table> of the first of page that belongs to
+                         this revision.  This has <revision count>+1
+                         entries to terminate the last revision.
+  <page table>       ... array of page headers.  It has
+                         <page table index>[<revision count>] entries.
+
+page table:
+
+  <offset>           ... absolute position of the page contents within the
+                         index file
+  <entry count>      ... number of offset entries in the page.
+                         Must match <header>.<page size> unless this is
+                         the last page for the respective revision.
+  <size>             ... length in bytes of the on-disk page description.
+                         Note that this is redundant with the <offset>.
+
+page:
+
+  <entry count>      ... number of offset entries in the page.
+                         Must match <header>.<page size> unless this is
+                         the last page for the respective revision.
+                         Redundant with <page table>.<entry count>
+  <offsets>          ... array of absolute file positions within the rev /
+                         pack file.  This has <entry count> entries.
+
+                         
+Index file format
+-----------------
+
+  file := header revisions pages offsets
+
+  header := u(<header>.<first revision>) \
+            u(<header>.<page size>) \
+            u(s(<header>.<page table index>)) \
+            u(s(<header>.<page table>))
+
+  revisions := u(  <header>.<page table index>[k+1]
+                 - <header>.<page table index>[k]),
+               for k in 0 .. s(<header>.<page table index>)-1
+
+  pages := u(<header>.<page table>[k].<size>) \
+           u(<header>.<page table>[k].<entry count>),
+           for k in 0 .. s(<header>.<page table>)-1
+
+  offsets := i(<header>.<page table>[k].<offsets>[0]) \
+             i(  <header>.<page table>[k].<offsets>[l] \
+               - <header>.<page table>[k].<offsets>[l - 1]),
+             for l in 1 .. s(<header>.<page table>[k].<entry count>)-1,
+             for k in 0 .. s(<header>.<page table>)-1
+
+  u(x) ... unsigned int x in 7b/8b encoding
+  i(x) ... signed int x in 7b/8b encoding
+  s(x) ... number of entries in array x
+
+
+Proto index file format
+-----------------------
+
+The index will be created from a "revision-less" proto index file
+containing (<offset><item_index>) pairs only.
+
+All mappings belonging to the same revision must be written in one go but
+there is no restriction on the order of those entries.  To signify that
+a new revision begins, a (0, 0) mapping must be written.  A (0, 0) entry
+at the beginning of the file is optional and will be ignored.
+
+  <bof>         /* begin of proto index file for revision r and following */
+  (0, 0)        /* mark start of revision r, optional for first rev */
+  (off, item)*  /* zero to many mappings in random order */
+  (0, 0)        /* mark start of revision r + 1 */
+  (off, item)*  /* zero to many mappings in random order */
+  (0, 0)        /* mark start of revision r + 2 */
+  (off, item)*  /* zero to many mappings in random order */
+  ...
+  <eof>         /* end of file. */
+
+All entries are pairs of 64 bit unsigned integers in machine endianess.
+
+
+Phys-to-log index
+=================
+
+This index has to map offset -> (rev, item_index, type, len).
+
+
+Index data model
+----------------
+
+hierarchy:
+
+  header -> page -> item info
+
+  Logically, the index splits up index rev / pack file into pages of a
+  fixed size.  That page size may differ from the FS's block size.  The
+  index will describe each rev / pack file page with one index page.
+
+  page = header->pages[offset % page_size];
+  item info = binary_search(page.data, offset)
+
+  Note that the current implementation will not return any data if the
+  offset is does not match any actual item start.  To simplify the lookup,
+  the last index page will have an "unused item" entry for the section
+  behind EOF.  Holes aren't allowed as well, i.e. every byte of the rev /
+  pack is expected to be covered by the index file.
+
+  Also, there may be items stretching across page borders or even over
+  multiple pages.  The data model solves this issue by storing the item
+  descriptions as a "primary array" and then representing the pages as
+  ranges within that array.  Thus multiple pages may reference the same
+  item description.
+
+header:
+
+  <first revision>   ... first revision covered by this index file
+  <page size>        ... number of bytes in the rev / pack file covered by
+                         each index page
+  <page count>       ... number of pages
+  <offsets>          ... array of page offsets, i.e. locations the page
+                         data within the index.
+                         This array has <page count> + 1 entries.
+
+page:
+
+  <entries>          ... array of item descriptions, ordered by offset.
+                         First and last entry may cross page boundaries.
+
+entry:
+
+  <offset>           ... absolute position in the pack / rev file
+  <size>             ... on-disk size of the item in the pack / rev file
+  <type>             ... item type
+  <revision>         ... revision that this item belongs to
+  <item_index>       ... item_index within that revision
+
+
+Index file format
+-----------------
+
+  file := header pages items
+
+  header := u(<header>.<first revision>) \
+            u(<header>.<page size>) \
+            u(<header>.<page count>)
+
+  pages := u(<header>.<offsets>[k+1] - <header>.<offsets>[k]),
+           for k in 0 .. <header>.<page count> -1
+
+  items := u(<items in page k>[0].<offset>)
+           u(<items in page k>[l].<size>) \
+           i(c(<items in page k>[l]) - c(<items of page k>[l-1])) \
+           i(  <items in page k>[l].<revision>
+             - <items in page k>[l-1].<revision>),
+           for l in 0 .. s(<items in page k>)-1,
+           for k in 0 .. <header>.<page count>-1
+
+  u(x) ... unsigned int x in 7b/8b encoding
+  i(x) ... signed int x in 7b/8b encoding
+  s(x) ... number of entries in collection x
+  c(x) ... compound function := x.<item_index> * 8 + x.<type>
+
+  Access to negative indexes gives a 0 value.
+
+  <Items in page k> are in strict ascending offset order.  Items that
+  started after the begin of a given page and overlap with the next page
+  will not be stored in the start page.  The runtime representation will
+  duplicate items overlapping page boundaries; the on-disk representation
+  will not.
+
+
+Proto index file format
+-----------------------
+
+The index will be created from a proto index file containing simple
+instances of the in-memory representation of svn_fs_fs__p2l_entry_t.
+Page table and header information, except start revision and page size,
+can easily be derived from that information.
+
+All entries must be written in strict offset order.  Overlapping entries
+are not allowed; zero-length items are.
+
+In transactions, the final revision number may not be known when writing
+the proto index file (e.g. while still writing the proto rev file).  Items
+with revision set to SVN_INVALID_REVNUM will therefore be automatically
+updated when creating the index file.  This is not possible in conjunction
+with rev files but not for pack files.



Mime
View raw message