hbase-commits mailing list archives

From dm...@apache.org
Subject svn commit: r1181091 - in /hbase/trunk/src/docbkx: book.xml ops_mgt.xml
Date Mon, 10 Oct 2011 17:41:54 GMT
Author: dmeil
Date: Mon Oct 10 17:41:53 2011
New Revision: 1181091

URL: http://svn.apache.org/viewvc?rev=1181091&view=rev
HBASE-4566 book.xml,ops_mgt.xml - KeyValue documentation


Modified: hbase/trunk/src/docbkx/book.xml
URL: http://svn.apache.org/viewvc/hbase/trunk/src/docbkx/book.xml?rev=1181091&r1=1181090&r2=1181091&view=diff
--- hbase/trunk/src/docbkx/book.xml (original)
+++ hbase/trunk/src/docbkx/book.xml Mon Oct 10 17:41:53 2011
@@ -312,7 +312,7 @@ public static class MyReducer extends Ta
       <para>A good general introduction on the strengths and weaknesses of modelling on
           the various non-RDBMS datastores is Ian Varley's Master's thesis,
           <link xlink:href="http://ianvarley.com/UT/MR/Varley_MastersReport_Full_2009-08-07.pdf">No
Relation: The Mixed Blessings of Non-Relational Databases</link>.
-          Recommended.
+          Recommended.  Also, read <xref linkend="keyvalue"/> for how HBase stores
data internally.
   <section xml:id="schema.creation">
@@ -400,7 +400,7 @@ admin.enableTable(table);               
        <para>Most of the time small inefficiencies don't matter all that much.  Unfortunately,
          this is a case where they do.  Whatever patterns are selected for ColumnFamilies,
attributes, and rowkeys they could be repeated
-       several billion times in your data</para>
+       several billion times in your data.  See <xref linkend="keyvalue"/> for more
information on how HBase stores data internally.</para>
        <section xml:id="keysize.cf"><title>Column Families</title>
          <para>Try to keep the ColumnFamily names as small as possible, preferably
one character (e.g. "d" for data/default).
@@ -1615,6 +1615,8 @@ scan.setFilter(filter);
               Schubert Zhang's blog post on <link xlink:href="http://cloudepr.blogspot.com/2009/09/hfile-block-indexed-file-format-to.html">HFile:
A Block-Indexed File Format to Store Sorted Key-Value Pairs</link> makes for a thorough
introduction to HBase's hfile.  Matteo Bertozzi has also put up a
               helpful description, <link xlink:href="http://th30z.blogspot.com/2011/02/hbase-io-hfile.html?spref=tw">HBase
I/O: HFile</link>.
+          <para>For more information, see the <link xlink:href="http://hbase.apache.org/xref/org/apache/hadoop/hbase/io/hfile/HFile.html">HFile
source code</link>.
+          </para>
       <section xml:id="hfile_tool">
@@ -1631,6 +1633,40 @@ scan.setFilter(filter);
+      <section xml:id="hfile.blocks">
+        <title>Blocks</title>
+        <para>StoreFiles are composed of blocks.  The blocksize is configured on a
per-ColumnFamily basis.
+        </para>
+        <para>For more information, see the <link xlink:href="http://hbase.apache.org/xref/org/apache/hadoop/hbase/io/hfile/HFileBlock.html">HFileBlock
source code</link>.
+        </para>
+      </section>
+      <section xml:id="keyvalue">
+        <title>KeyValue</title>
+        <para>The KeyValue class is the heart of data storage in HBase.  KeyValue wraps
a byte array and takes an offset and length into the passed array
+         that mark where to start interpreting the content as a KeyValue.
+        </para>
+        <para>The KeyValue format inside a byte array is:
+           <itemizedlist>
+             <listitem>keylength</listitem>
+             <listitem>valuelength</listitem>
+             <listitem>key</listitem>
+             <listitem>value</listitem>
+           </itemizedlist>
+        </para>
+        <para>The Key is further decomposed as:
+           <itemizedlist>
+             <listitem>rowlength</listitem>
+             <listitem>row (i.e., the rowkey)</listitem>
+             <listitem>columnfamilylength</listitem>
+             <listitem>columnfamily</listitem>
+             <listitem>columnqualifier</listitem>
+             <listitem>timestamp</listitem>
+             <listitem>keytype (e.g., Put, Delete)</listitem>
+           </itemizedlist>
+        </para>
+        <para>For more information, see the <link xlink:href="http://hbase.apache.org/xref/org/apache/hadoop/hbase/KeyValue.html">KeyValue
source code</link>.
+        </para>
+      </section>
       <section xml:id="compaction">
         <para>There are two types of compactions:  minor and major.  Minor compactions
will usually pick up a couple of the smaller adjacent
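
The KeyValue layout listed in the keyvalue section of the diff above can be sketched in plain Java. This is only an illustration, not the HBase KeyValue class: the class name KeyValueLayoutSketch, its method names, and the key type code used in main are hypothetical, and the field widths (4-byte key and value lengths, 2-byte row length, 1-byte column family length, 8-byte timestamp, 1-byte key type) are assumptions that the commit itself does not state.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; this is NOT org.apache.hadoop.hbase.KeyValue.
class KeyValueLayoutSketch {
    // Assumed field widths in bytes (not stated in the commit).
    static final int KEY_LENGTH_SIZE = 4;    // keylength
    static final int VALUE_LENGTH_SIZE = 4;  // valuelength
    static final int ROW_LENGTH_SIZE = 2;    // rowlength
    static final int FAMILY_LENGTH_SIZE = 1; // columnfamilylength
    static final int TIMESTAMP_SIZE = 8;     // timestamp
    static final int KEY_TYPE_SIZE = 1;      // keytype

    // Size of the key part: rowlength, row, columnfamilylength,
    // columnfamily, columnqualifier, timestamp, keytype.
    static int keyLength(int rowLen, int familyLen, int qualifierLen) {
        return ROW_LENGTH_SIZE + rowLen + FAMILY_LENGTH_SIZE + familyLen
                + qualifierLen + TIMESTAMP_SIZE + KEY_TYPE_SIZE;
    }

    // Size of the whole entry: keylength, valuelength, key, value.
    static int serializedSize(int rowLen, int familyLen, int qualifierLen, int valueLen) {
        return KEY_LENGTH_SIZE + VALUE_LENGTH_SIZE
                + keyLength(rowLen, familyLen, qualifierLen) + valueLen;
    }

    // Writes the fields in the order given by the two lists in the diff.
    // Lengths are cast without range checks, so inputs must fit the assumed widths.
    static byte[] encode(byte[] row, byte[] family, byte[] qualifier,
                         long timestamp, byte keyType, byte[] value) {
        ByteBuffer buf = ByteBuffer.allocate(
                serializedSize(row.length, family.length, qualifier.length, value.length));
        buf.putInt(keyLength(row.length, family.length, qualifier.length));
        buf.putInt(value.length);
        buf.putShort((short) row.length);
        buf.put(row);
        buf.put((byte) family.length);
        buf.put(family);
        buf.put(qualifier);
        buf.putLong(timestamp);
        buf.put(keyType);
        buf.put(value);
        return buf.array();
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte hypotheticalPutCode = 4; // assumed type code, for illustration only
        byte[] kv = encode(bytes("row1"), bytes("d"), bytes("attr"),
                1318268513000L, hypotheticalPutCode, bytes("hello"));
        System.out.println("serialized bytes: " + kv.length);
    }
}
```

Under these assumptions every stored attribute carries a fixed overhead on top of its rowkey, column family, qualifier, and value bytes, which is why the diff recommends short names.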

Modified: hbase/trunk/src/docbkx/ops_mgt.xml
URL: http://svn.apache.org/viewvc/hbase/trunk/src/docbkx/ops_mgt.xml?rev=1181091&r1=1181090&r2=1181091&view=diff
--- hbase/trunk/src/docbkx/ops_mgt.xml (original)
+++ hbase/trunk/src/docbkx/ops_mgt.xml Mon Oct 10 17:41:53 2011
@@ -301,6 +301,32 @@ false
       <para>Since the cluster is up, there is a risk that edits could be missed in
the export process.
+  </section>  <!--  backup -->
+  <section xml:id="ops.capacity"><title>Capacity Planning</title>
+    <section xml:id="ops.capacity.storage"><title>Storage</title>
+      <para>A common question for HBase administrators is how much storage
will be required for an HBase cluster.
+      There are several aspects to consider, the most important of which is what data will be loaded
into the cluster.  Start
+      with a solid understanding of how HBase handles data internally (KeyValue).
+      </para>
+      <section xml:id="ops.capacity.storage.kv"><title>KeyValue</title>
+        <para>HBase storage will be dominated by KeyValues.  See <xref linkend="keyvalue"
/> and <xref linkend="keysize" /> for 
+        how HBase stores data internally.  
+        </para>
+        <para>It is critical to understand that there is a KeyValue instance for every
attribute stored in a row, and the 
+        rowkey-length, ColumnFamily name-length and attribute lengths will drive the size
of the database more than any other
+        factor.
+        </para>
+      </section>
+      <section xml:id="ops.capacity.storage.sf"><title>StoreFiles and Blocks</title>
+        <para>KeyValue instances are aggregated into blocks, and the blocksize is configurable
on a per-ColumnFamily basis.
+        Blocks are aggregated into StoreFiles.  See <xref linkend="regions.arch" />.
+        </para>
+      </section>
+      <section xml:id="ops.capacity.storage.hdfs"><title>HDFS Block Replication</title>
+        <para>Because HBase runs on top of HDFS, factor HDFS block replication into
storage calculations.
+        </para>
+      </section>
+    </section>
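
The capacity-planning points added in this diff (one KeyValue per attribute, sizes driven by rowkey, ColumnFamily name, and attribute lengths, multiplied by HDFS block replication) can be turned into a rough back-of-the-envelope calculation. The sketch below is an assumption-laden illustration, not an HBase tool: the class name StorageEstimateSketch, its methods, and the 20-byte fixed overhead per KeyValue are hypothetical, and it ignores compression, block indexes, multiple versions, and write-ahead logs.

```java
// Rough storage estimate under stated assumptions; not part of HBase.
class StorageEstimateSketch {
    // Assumed fixed per-KeyValue overhead in bytes (length fields,
    // timestamp, key type); this figure is not stated in the commit.
    static final int ASSUMED_KV_FIXED_OVERHEAD = 20;

    // Bytes for one KeyValue: fixed overhead plus the variable-length parts.
    static long keyValueBytes(int rowKeyLen, int familyLen, int qualifierLen, int valueLen) {
        return (long) ASSUMED_KV_FIXED_OVERHEAD + rowKeyLen + familyLen + qualifierLen + valueLen;
    }

    // One KeyValue per attribute per row, multiplied by the HDFS replication factor.
    // multiplyExact throws ArithmeticException instead of silently overflowing.
    static long estimateRawBytes(long rows, int attributesPerRow,
                                 long bytesPerKeyValue, int hdfsReplication) {
        long keyValues = Math.multiplyExact(rows, (long) attributesPerRow);
        long unreplicated = Math.multiplyExact(keyValues, bytesPerKeyValue);
        return Math.multiplyExact(unreplicated, (long) hdfsReplication);
    }

    public static void main(String[] args) {
        // Hypothetical workload: 16-byte rowkeys, a 1-byte family name,
        // 8-byte qualifiers, 32-byte values, 10 attributes per row.
        long perKv = keyValueBytes(16, 1, 8, 32);
        long total = estimateRawBytes(1_000_000_000L, 10, perKv, 3);
        System.out.println("bytes per KeyValue: " + perKv);
        System.out.println("estimated raw bytes with replication: " + total);
    }
}
```

The replication factor is passed in rather than hard-coded because it is a cluster setting; 3 is used in main only as an example value.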
