zookeeper-commits mailing list archives

From f..@apache.org
Subject svn commit: r1335947 - in /zookeeper/bookkeeper/trunk: CHANGES.txt doc/bookieConfigParams.textile doc/bookkeeperOverview.textile
Date Wed, 09 May 2012 07:16:05 GMT
Author: fpj
Date: Wed May  9 07:16:05 2012
New Revision: 1335947

URL: http://svn.apache.org/viewvc?rev=1335947&view=rev
Log:
BOOKKEEPER-241: Add documentation for bookie entry log compaction (sijie via fpj)


Modified:
    zookeeper/bookkeeper/trunk/CHANGES.txt
    zookeeper/bookkeeper/trunk/doc/bookieConfigParams.textile
    zookeeper/bookkeeper/trunk/doc/bookkeeperOverview.textile

Modified: zookeeper/bookkeeper/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/zookeeper/bookkeeper/trunk/CHANGES.txt?rev=1335947&r1=1335946&r2=1335947&view=diff
==============================================================================
--- zookeeper/bookkeeper/trunk/CHANGES.txt (original)
+++ zookeeper/bookkeeper/trunk/CHANGES.txt Wed May  9 07:16:05 2012
@@ -150,6 +150,8 @@ Trunk (unreleased changes)
 
 	BOOKKEEPER-173: Uncontrolled number of threads in bookkeeper (sijie via fpj)
 
+	BOOKKEEPER-241: Add documentation for bookie entry log compaction (sijie via fpj)
+
       hedwig-server/
 
         BOOKKEEPER-77: Add a console client for hedwig (Sijie Guo via ivank)

Modified: zookeeper/bookkeeper/trunk/doc/bookieConfigParams.textile
URL: http://svn.apache.org/viewvc/zookeeper/bookkeeper/trunk/doc/bookieConfigParams.textile?rev=1335947&r1=1335946&r2=1335947&view=diff
==============================================================================
--- zookeeper/bookkeeper/trunk/doc/bookieConfigParams.textile (original)
+++ zookeeper/bookkeeper/trunk/doc/bookieConfigParams.textile Wed May  9 07:16:05 2012
@@ -43,4 +43,9 @@ h3. Ledger manager settings
 | @ledgerManagerType@ | What kind of ledger manager is used to manage how ledgers are stored, managed and garbage collected. See "BookKeeper Internals":./bookkeeperInternals.html for detailed info. Default is flat. |
 | @zkLedgersRootPath@ | Root zookeeper path to store ledger metadata. Default is /ledgers. |
 
+h3. Entry Log compaction settings
 
+| @minorCompactionInterval@ | Interval between minor compaction runs, in seconds. If it is set to less than or equal to zero, minor compaction is disabled. Default is 1 hour (3600 seconds). |
+| @minorCompactionThreshold@ | Entry log files whose remaining size, as a fraction of the file size, is below this threshold will be compacted in a minor compaction. If it is set to less than or equal to zero, minor compaction is disabled. Default is 0.2. |
+| @majorCompactionInterval@ | Interval between major compaction runs, in seconds. If it is set to less than or equal to zero, major compaction is disabled. Default is 1 day (86400 seconds). |
+| @majorCompactionThreshold@ | Entry log files whose remaining size, as a fraction of the file size, is below this threshold will be compacted in a major compaction. Entry log files whose remaining fraction is still higher than the threshold will never be compacted. If it is set to less than or equal to zero, major compaction is disabled. Default is 0.8. |
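
The way these four settings interact can be illustrated with a minimal Python sketch. This is not BookKeeper code: the @CompactionSettings@ class and its methods are hypothetical names chosen for illustration, and the rule it encodes (a value less than or equal to zero disables that kind of compaction; a file qualifies when its remaining fraction is below the threshold) is taken directly from the table above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CompactionSettings:
    """Hypothetical holder for the four documented settings (not a BookKeeper class)."""
    minor_interval_s: int = 3600     # documented default: 1 hour
    minor_threshold: float = 0.2     # documented default
    major_interval_s: int = 86400    # documented default: 1 day
    major_threshold: float = 0.8     # documented default

    def minor_enabled(self) -> bool:
        # An interval or threshold <= 0 disables minor compaction.
        return self.minor_interval_s > 0 and self.minor_threshold > 0

    def major_enabled(self) -> bool:
        # An interval or threshold <= 0 disables major compaction.
        return self.major_interval_s > 0 and self.major_threshold > 0

    def eligible_for(self, remaining_ratio: float) -> List[str]:
        """Return the compaction kinds that would pick up an entry log whose
        live data occupies `remaining_ratio` (0.0 to 1.0) of the file."""
        kinds = []
        if self.minor_enabled() and remaining_ratio < self.minor_threshold:
            kinds.append("minor")
        if self.major_enabled() and remaining_ratio < self.major_threshold:
            kinds.append("major")
        return kinds
```

With the defaults, a file that is 10% live qualifies for both kinds, a file that is 50% live qualifies only for major compaction, and a file that is 90% live is never compacted.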

Modified: zookeeper/bookkeeper/trunk/doc/bookkeeperOverview.textile
URL: http://svn.apache.org/viewvc/zookeeper/bookkeeper/trunk/doc/bookkeeperOverview.textile?rev=1335947&r1=1335946&r2=1335947&view=diff
==============================================================================
--- zookeeper/bookkeeper/trunk/doc/bookkeeperOverview.textile (original)
+++ zookeeper/bookkeeper/trunk/doc/bookkeeperOverview.textile Wed May  9 07:16:05 2012
@@ -165,3 +165,21 @@ p. Using the above data flush mechanism,
 p. As described above, _EntryLogger#flush_ is invoked in the following two cases:
 * in _Sync Thread_ : used to ensure entries added before _LastLogMark_ are persisted to disk.
 * in _ShutDown_ : used to ensure its buffered data persisted to disk to avoid data corruption with partial entries.
+
+h2. Data Compaction
+
+p. In a bookie server, entries of different ledgers are interleaved in entry log files. A bookie server runs a _Garbage Collector_ thread that deletes entry log files no longer associated with any ledger, in order to reclaim disk space. If an entry log file contains entries from even one ledger that has not been deleted, the file is never removed and the disk space it occupies is never reclaimed. To avoid this, the bookie server compacts entry log files in the _Garbage Collector_ thread.
+
+p. There are two kinds of compaction, _Minor Compaction_ and _Major Compaction_, which run at different frequencies. They differ only in their threshold value and compaction interval.
+
+# _Threshold_ : Fraction of an entry log file's size occupied by entries of undeleted ledgers. The default minor compaction threshold is 0.2, while the default major compaction threshold is 0.8.
+# _Interval_ : How often the compaction runs. The default minor compaction interval is 1 hour, while the default major compaction interval is 1 day.
+
+p. NOTE: if either _Threshold_ or _Interval_ is set to less than or equal to zero, then compaction is disabled.
+
+p. The data compaction flow in _Garbage Collector Thread_ is as follows:
+
+# The _Garbage Collector_ thread scans entry log files to get their entry log metadata, which records the list of ledgers that make up each entry log and the percentage of the log each ledger occupies.
+# During the normal garbage collection flow, once the bookie determines that a ledger has been deleted, the ledger is removed from the entry log metadata and the recorded remaining size of the entry log is reduced.
+# If the remaining size of an entry log file falls below the specified threshold, the entries of active ledgers in the entry log are copied to a new entry log file.
+# Once all valid entries have been copied, the old entry log file is deleted.
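
The four steps above can be sketched as a small in-memory simulation in Python. This is an illustration under stated assumptions, not the bookie implementation: entry logs are modeled as lists of (ledger id, size) pairs, and the names @remaining_ratio@ and @compact@ are hypothetical. It also assumes all copied entries land in a single new log, which the documentation does not specify.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical model: an entry is (ledger_id, size_in_bytes),
# and an entry log file is a list of entries keyed by a log id.
Entry = Tuple[int, int]


def remaining_ratio(entries: List[Entry], active: Set[int]) -> float:
    """Fraction of the log's bytes that belong to ledgers not yet deleted."""
    total = sum(size for _, size in entries)
    live = sum(size for ledger_id, size in entries if ledger_id in active)
    return live / total if total else 0.0


def compact(logs: Dict[int, List[Entry]], active: Set[int],
            threshold: float) -> Dict[int, List[Entry]]:
    """Return the entry logs that exist after one garbage collection pass."""
    if threshold <= 0:
        return dict(logs)  # a threshold <= 0 disables compaction
    result: Dict[int, List[Entry]] = {}
    new_log: List[Entry] = []
    for log_id, entries in sorted(logs.items()):
        live = [e for e in entries if e[0] in active]
        if not live:
            continue  # normal GC: no active ledger left, delete the file
        if remaining_ratio(entries, active) < threshold:
            new_log.extend(live)  # copy live entries, then drop the old file
        else:
            result[log_id] = entries  # above the threshold: left untouched
    if new_log:
        # Assumption: copied entries go into one new log with the next free id.
        result[max(logs, default=-1) + 1] = new_log
    return result
```

For example, with logs @{0: [(1, 80), (2, 20)], 1: [(2, 50), (3, 50)], 2: [(1, 100)]}@, active ledgers @{2, 3}@ and a threshold of 0.5, log 2 is deleted outright, log 0 (20% live) is compacted into a new log 3, and log 1 (100% live) is kept as-is.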


