hbase-commits mailing list archives

From st...@apache.org
Subject svn commit: r1204039 [2/3] - in /hbase/branches/0.92/src/docbkx: book.xml configuration.xml developer.xml ops_mgt.xml performance.xml troubleshooting.xml
Date Sat, 19 Nov 2011 18:43:38 GMT

Modified: hbase/branches/0.92/src/docbkx/book.xml
URL: http://svn.apache.org/viewvc/hbase/branches/0.92/src/docbkx/book.xml?rev=1204039&r1=1204038&r2=1204039&view=diff
==============================================================================
--- hbase/branches/0.92/src/docbkx/book.xml (original)
+++ hbase/branches/0.92/src/docbkx/book.xml Sat Nov 19 18:43:37 2011
@@ -49,21 +49,11 @@
     <revhistory>
       <revision>
         <date />
-        <revdescription>Adding first cuts at Configuration, Getting Started, Data Model</revdescription>
+        <revdescription>HBase version</revdescription>
         <revnumber>
           <?eval ${project.version}?>
         </revnumber>
       </revision>
-      <revision>
-        <date>
-        5 October 2010
-        </date>
-        <authorinitials>stack</authorinitials>
-        <revdescription>Initial layout</revdescription>
-        <revnumber>
-          0.89.20100924
-        </revnumber>
-      </revision>
     </revhistory>
   </info>
 
@@ -74,868 +64,278 @@
   <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="upgrading.xml" />
   <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="shell.xml" />
 
+  <chapter xml:id="datamodel">
+    <title>Data Model</title>
+    <para>In short, applications store data into an HBase table.
+        Tables are made of rows and columns.
+      All columns in HBase belong to a particular column family.
+      Table cells -- the intersection of row and column
+      coordinates -- are versioned.
+      A cell’s content is an uninterpreted array of bytes.
+  </para>
+      <para>Table row keys are also byte arrays, so almost anything can
+      serve as a row key, from strings to binary representations of longs or
+      even serialized data structures. Rows in HBase tables
+      are sorted by row key. The sort is byte-ordered. All table accesses are
+      via the table row key -- its primary key.
+</para>
 
+    <section xml:id="conceptual.view"><title>Conceptual View</title>
+	<para>
+        The following example is a slightly modified form of the one on page
+        2 of the <link xlink:href="http://labs.google.com/papers/bigtable.html">BigTable</link> paper.
+    There is a table called <varname>webtable</varname> that contains two column families named
+    <varname>contents</varname> and <varname>anchor</varname>.
+    In this example, <varname>anchor</varname> contains two
+    columns (<varname>anchor:cnnsi.com</varname>, <varname>anchor:my.look.ca</varname>)
+    and <varname>contents</varname> contains one column (<varname>contents:html</varname>).
+    <note>
+        <title>Column Names</title>
+      <para>
+      By convention, a column name is made of its column family prefix and a
+      <emphasis>qualifier</emphasis>. For example, the column
+      <emphasis>contents:html</emphasis> is of the column family <varname>contents</varname>.
+          The colon character (<literal
+          moreinfo="none">:</literal>) delimits the column family from the
+          column family <emphasis>qualifier</emphasis>.
+    </para>
+    </note>
+    <table frame='all'><title>Table <varname>webtable</varname></title>
+	<tgroup cols='4' align='left' colsep='1' rowsep='1'>
+	<colspec colname='c1'/>
+	<colspec colname='c2'/>
+	<colspec colname='c3'/>
+	<colspec colname='c4'/>
+	<thead>
+        <row><entry>Row Key</entry><entry>Time Stamp</entry><entry>ColumnFamily <varname>contents</varname></entry><entry>ColumnFamily <varname>anchor</varname></entry></row>
+	</thead>
+	<tbody>
+        <row><entry>"com.cnn.www"</entry><entry>t9</entry><entry></entry><entry><varname>anchor:cnnsi.com</varname> = "CNN"</entry></row>
+        <row><entry>"com.cnn.www"</entry><entry>t8</entry><entry></entry><entry><varname>anchor:my.look.ca</varname> = "CNN.com"</entry></row>
+        <row><entry>"com.cnn.www"</entry><entry>t6</entry><entry><varname>contents:html</varname> = "&lt;html&gt;..."</entry><entry></entry></row>
+        <row><entry>"com.cnn.www"</entry><entry>t5</entry><entry><varname>contents:html</varname> = "&lt;html&gt;..."</entry><entry></entry></row>
+        <row><entry>"com.cnn.www"</entry><entry>t3</entry><entry><varname>contents:html</varname> = "&lt;html&gt;..."</entry><entry></entry></row>
+	</tbody>
+	</tgroup>
+	</table>
+	</para>
+	</section>
+    <section xml:id="physical.view"><title>Physical View</title>
+	<para>
+        Although at a conceptual level tables may be viewed as a sparse set of rows,
+        physically they are stored on a per-column-family basis.  New columns
+        (i.e., <varname>columnfamily:column</varname>) can be added to any
+        column family without pre-announcing them.
+        <table frame='all'><title>ColumnFamily <varname>anchor</varname></title>
+	<tgroup cols='3' align='left' colsep='1' rowsep='1'>
+	<colspec colname='c1'/>
+	<colspec colname='c2'/>
+	<colspec colname='c3'/>
+	<thead>
+        <row><entry>Row Key</entry><entry>Time Stamp</entry><entry>ColumnFamily <varname>anchor</varname></entry></row>
+	</thead>
+	<tbody>
+        <row><entry>"com.cnn.www"</entry><entry>t9</entry><entry><varname>anchor:cnnsi.com</varname> = "CNN"</entry></row>
+        <row><entry>"com.cnn.www"</entry><entry>t8</entry><entry><varname>anchor:my.look.ca</varname> = "CNN.com"</entry></row>
+	</tbody>
+	</tgroup>
+	</table>
+    <table frame='all'><title>ColumnFamily <varname>contents</varname></title>
+	<tgroup cols='3' align='left' colsep='1' rowsep='1'>
+	<colspec colname='c1'/>
+	<colspec colname='c2'/>
+	<colspec colname='c3'/>
+	<thead>
+	<row><entry>Row Key</entry><entry>Time Stamp</entry><entry>ColumnFamily <varname>contents</varname></entry></row>
+	</thead>
+	<tbody>
+        <row><entry>"com.cnn.www"</entry><entry>t6</entry><entry><varname>contents:html</varname> = "&lt;html&gt;..."</entry></row>
+        <row><entry>"com.cnn.www"</entry><entry>t5</entry><entry><varname>contents:html</varname> = "&lt;html&gt;..."</entry></row>
+        <row><entry>"com.cnn.www"</entry><entry>t3</entry><entry><varname>contents:html</varname> = "&lt;html&gt;..."</entry></row>
+	</tbody>
+	</tgroup>
+	</table>
+    It is important to note in the tables above that the empty cells shown in the
+    conceptual view are not stored, since they need not be in a column-oriented
+    storage format. Thus a request for the value of the <varname>contents:html</varname>
+    column at time stamp <literal>t8</literal> would return no value. Similarly, a
+    request for an <varname>anchor:my.look.ca</varname> value at time stamp
+    <literal>t9</literal> would return no value.  However, if no timestamp is
+    supplied, the most recent value for a particular column is returned,
+    which is also the first one found since timestamps are stored in
+    descending order. Thus a request for the values of all columns in the row
+    <varname>com.cnn.www</varname>, if no timestamp is specified, would return:
+    the value of <varname>contents:html</varname> from time stamp
+    <literal>t6</literal>, the value of <varname>anchor:cnnsi.com</varname>
+    from time stamp <literal>t9</literal>, and the value of
+    <varname>anchor:my.look.ca</varname> from time stamp <literal>t8</literal>.
+	</para>
+	</section>
 
-  <chapter xml:id="mapreduce">
-  <title>HBase and MapReduce</title>
-  <para>See <link xlink:href="http://hbase.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#package_description">HBase and MapReduce</link> up in javadocs.
-  Start there.  Below is some additional help.</para>
-  <section xml:id="splitter">
-    <title>Map-Task Spitting</title>
-    <section xml:id="splitter.default">
-    <title>The Default HBase MapReduce Splitter</title>
-    <para>When <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableInputFormat.html">TableInputFormat</link>
-    is used to source an HBase table in a MapReduce job,
-    its splitter will make a map task for each region of the table.
-    Thus, if there are 100 regions in the table, there will be
-    100 map-tasks for the job - regardless of how many column families are selected in the Scan.</para>
+    <section xml:id="table">
+      <title>Table</title>
+      <para>
+      Tables are declared up front at schema definition time.
+      </para>
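+      <para>For example, a table with two column families might be created via
+      <classname>HBaseAdmin</classname> as in the following sketch (the table
+      and family names here are illustrative, not prescribed):
+      <programlisting>
+Configuration conf = HBaseConfiguration.create();
+HBaseAdmin admin = new HBaseAdmin(conf);
+
+HTableDescriptor desc = new HTableDescriptor("myTable");
+desc.addFamily(new HColumnDescriptor("cf1"));   // column families are fixed at schema definition time...
+desc.addFamily(new HColumnDescriptor("cf2"));   // ...though they can be altered later via HBaseAdmin
+admin.createTable(desc);
+      </programlisting>
+      </para>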
     </section>
-    <section xml:id="splitter.custom">
-    <title>Custom Splitters</title>
-    <para>For those interested in implementing custom splitters, see the method <code>getSplits</code> in 
-    <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.html">TableInputFormatBase</link>.
-    That is where the logic for map-task assignment resides.  
-    </para>
+
+    <section xml:id="row">
+      <title>Row</title>
+      <para>Row keys are uninterpreted bytes. Rows are
+      lexicographically sorted, with the lowest order appearing first
+      in a table.  The empty byte array is used to denote both the
+      start and end of a table's namespace. Because the sort is byte-ordered,
+      numeric keys sort lexicographically (e.g., <literal>1</literal>,
+      <literal>10</literal>, <literal>2</literal>); left-pad or binary-encode
+      numbers if numeric ordering is required.</para>
     </section>
-  </section>
-  <section xml:id="mapreduce.example">
-    <title>HBase MapReduce Examples</title>
-    <section xml:id="mapreduce.example.read">
-    <title>HBase MapReduce Read Example</title>
-    <para>The following is an example of using HBase as a MapReduce source in read-only manner.  Specifically,
-    there is a Mapper instance but no Reducer, and nothing is being emitted from the Mapper.  There job would be defined
-    as follows...
-	<programlisting>
-Configuration config = HBaseConfiguration.create();
-Job job = new Job(config, "ExampleRead");
-job.setJarByClass(MyReadJob.class);     // class that contains mapper
-	
-Scan scan = new Scan();
-scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
-scan.setCacheBlocks(false);  // don't set to true for MR jobs
-// set other scan attrs
-...
-  
-TableMapReduceUtil.initTableMapperJob(
-  tableName,        // input HBase table name
-  scan,             // Scan instance to control CF and attribute selection
-  MyMapper.class,   // mapper
-  null,             // mapper output key 
-  null,             // mapper output value
-  job);
-job.setOutputFormatClass(NullOutputFormat.class);   // because we aren't emitting anything from mapper
-	    
-boolean b = job.waitForCompletion(true);
-if (!b) {
-  throw new IOException("error with job!");
-}
-  </programlisting>
-  ...and the mapper instance would extend <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableMapper.html">TableMapper</link>...
-	<programlisting>
-public static class MyMapper extends TableMapper&lt;Text, Text&gt; {
 
-  public void map(ImmutableBytesWritable row, Result value, Context context) throws InterruptedException, IOException {
-    // process data for the row from the Result instance.
-   }
-}    
-    </programlisting>
-  	  </para>
-  	 </section>
-    <section xml:id="mapreduce.example.readwrite">
-    <title>HBase MapReduce Read/Write Example</title>
-    <para>The following is an example of using HBase both as a source and as a sink with MapReduce. 
-    This example will simply copy data from one table to another.
-    <programlisting>
-Configuration config = HBaseConfiguration.create();
-Job job = new Job(config,"ExampleReadWrite");
-job.setJarByClass(MyReadWriteJob.class);    // class that contains mapper
-	        	        
-Scan scan = new Scan();
-scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
-scan.setCacheBlocks(false);  // don't set to true for MR jobs
-// set other scan attrs
-	        
-TableMapReduceUtil.initTableMapperJob(
-	sourceTable,      // input table
-	scan,	          // Scan instance to control CF and attribute selection
-	MyMapper.class,   // mapper class
-	null,	          // mapper output key
-	null,	          // mapper output value
-	job);
-TableMapReduceUtil.initTableReducerJob(
-	targetTable,      // output table
-	null,             // reducer class
-	job);
-job.setNumReduceTasks(0);
-	        
-boolean b = job.waitForCompletion(true);
-if (!b) {
-    throw new IOException("error with job!");
-}
-    </programlisting>
-	An explanation is required of what <classname>TableMapReduceUtil</classname> is doing, especially with the reducer.  
-	<link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.html">TableOutputFormat</link> is being used
-	as the outputFormat class, and several parameters are being set on the config (e.g., TableOutputFormat.OUTPUT_TABLE), as
-	well as setting the reducer output key to <classname>ImmutableBytesWritable</classname> and reducer value to <classname>Writable</classname>.
-	These could be set by the programmer on the job and conf, but <classname>TableMapReduceUtil</classname> tries to make things easier.    
-	<para>The following is the example mapper, which will create a <classname>Put</classname> and matching the input <classname>Result</classname>
-	and emit it.  Note:  this is what the CopyTable utility does.
-	</para>
-    <programlisting>
-public static class MyMapper extends TableMapper&lt;ImmutableBytesWritable, Put&gt;  {
+    <section xml:id="columnfamily">
+      <title>Column Family<indexterm><primary>Column Family</primary></indexterm></title>
+        <para>
+      Columns in HBase are grouped into <emphasis>column families</emphasis>.
+      All column members of a column family have the same prefix.  For example, the
+      columns <emphasis>courses:history</emphasis> and
+      <emphasis>courses:math</emphasis> are both members of the
+      <emphasis>courses</emphasis> column family.
+          The colon character (<literal
+          moreinfo="none">:</literal>) delimits the column family from the
+      column family <emphasis>qualifier</emphasis><indexterm><primary>Column Family Qualifier</primary></indexterm>.
+        The column family prefix must be composed of
+      <emphasis>printable</emphasis> characters. The qualifying tail, the
+      column family <emphasis>qualifier</emphasis>, can be made of any
+      arbitrary bytes. Column families must be declared up front
+      at schema definition time whereas columns do not need to be
+      defined at schema time but can be conjured on the fly while
+      the table is up and running.</para>
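+      <para>To illustrate, writing to a previously unseen qualifier within an
+      existing column family requires no schema change at all. A minimal
+      sketch (the <literal>courses</literal> family and
+      <literal>geography</literal> qualifier are illustrative):
+      <programlisting>
+HTable htable = ...      // instantiate HTable for the target table
+
+Put put = new Put(Bytes.toBytes("row1"));
+// "geography" need not have been declared anywhere; only the
+// "courses" column family must already exist on the table
+put.add(Bytes.toBytes("courses"), Bytes.toBytes("geography"), Bytes.toBytes("A"));
+htable.put(put);
+      </programlisting>
+      </para>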
+      <para>Physically, all column family members are stored together on the
+      filesystem.  Because tunings and
+      storage specifications are done at the column family level, it is
+      advised that all column family members have the same general access
+      pattern and size characteristics.</para>
 
-	public void map(ImmutableBytesWritable row, Result value, Context context) throws IOException, InterruptedException {
-		// this example is just copying the data from the source table...
-   		context.write(row, resultToPut(row,value));
-   	}
-        
-  	private static Put resultToPut(ImmutableBytesWritable key, Result result) throws IOException {
-  		Put put = new Put(key.get());
- 		for (KeyValue kv : result.raw()) {
-			put.add(kv);
-		}
-		return put;
-   	}
-}
-    </programlisting>
-    <para>There isn't actually a reducer step, so <classname>TableOutputFormat</classname> takes care of sending the <classname>Put</classname>
-    to the target table. 
-    </para>
-    <para>This is just an example, developers could choose not to use <classname>TableOutputFormat</classname> and connect to the 
-    target table themselves.
-    </para>
-    </para>
     </section>
-    <section xml:id="mapreduce.example.summary">
-    <title>HBase MapReduce Summary Example</title>
-    <para>The following example uses HBase as a MapReduce source and sink with a summarization step.  This example will 
-    count the number of distinct instances of a value in a table and write those summarized counts in another table.
-    <programlisting>
-Configuration config = HBaseConfiguration.create();
-Job job = new Job(config,"ExampleSummary");
-job.setJarByClass(MySummaryJob.class);     // class that contains mapper and reducer
-	        
+    <section xml:id="cells">
+      <title>Cells<indexterm><primary>Cells</primary></indexterm></title>
+      <para>A <emphasis>{row, column, version} </emphasis>tuple exactly
+      specifies a <literal>cell</literal> in HBase. 
+      Cell content is uninterpreted bytes.</para>
+    </section>
+    <section xml:id="data_model_operations">
+       <title>Data Model Operations</title>
+       <para>The four primary data model operations are Get, Put, Scan, and Delete.  Operations are applied via 
+       <link xlink:href="http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/client/HTable.html">HTable</link> instances.
+       </para>
+      <section xml:id="get">
+        <title>Get</title>
+        <para><link xlink:href="http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/client/Get.html">Get</link> returns
+        attributes for a specified row.  Gets are executed via 
+        <link xlink:href="http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/client/HTable.html#get%28org.apache.hadoop.hbase.client.Get%29">
+        HTable.get</link>.
+        </para>
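+        <para>A minimal sketch (the row key, column family, and qualifier
+        shown are illustrative):
+        <programlisting>
+HTable htable = ...      // instantiate HTable
+
+Get get = new Get(Bytes.toBytes("row1"));
+get.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("attr"));  // optionally restrict to one column
+Result r = htable.get(get);
+byte[] b = r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("attr"));  // current version of the cell
+        </programlisting>
+        </para>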
+      </section>
+      <section xml:id="put">
+        <title>Put</title>
+        <para><link xlink:href="http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/client/Put.html">Put</link> either
+        adds new rows to a table (if the key is new) or updates existing rows (if the key already exists).  Puts are executed via
+        <link xlink:href="http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/client/HTable.html#put%28org.apache.hadoop.hbase.client.Put%29">
+        HTable.put</link> (writeBuffer) or <link xlink:href="http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/client/HTable.html#batch%28java.util.List%29">
+        HTable.batch</link> (non-writeBuffer).  
+        </para>
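+        <para>A minimal sketch (names are illustrative):
+        <programlisting>
+HTable htable = ...      // instantiate HTable
+
+Put put = new Put(Bytes.toBytes("row1"));
+put.add(Bytes.toBytes("cf"), Bytes.toBytes("attr"), Bytes.toBytes("some value"));
+htable.put(put);   // sent immediately, or buffered if auto-flush has been disabled
+        </programlisting>
+        </para>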
+      </section>
+      <section xml:id="scan">
+          <title>Scans</title>
+          <para><link xlink:href="http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/client/Scan.html">Scan</link> allows
+          iteration over multiple rows for specified attributes.
+          </para>
+          <para>The following is an example of a Scan
+           on an HTable instance.  Assume that a table is populated with rows with keys "row1", "row2", "row3",
+           and then another set of rows with the keys "abc1", "abc2", and "abc3".  The example shows how startRow and stopRow
+           can be applied to a Scan instance to return only the rows beginning with "row".
+<programlisting>
+HTable htable = ...      // instantiate HTable
+    
 Scan scan = new Scan();
-scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
-scan.setCacheBlocks(false);  // don't set to true for MR jobs
-// set other scan attrs
-	        
-TableMapReduceUtil.initTableMapperJob(
-	sourceTable,        // input table
-	scan,               // Scan instance to control CF and attribute selection
-	MyMapper.class,     // mapper class
-	Text.class,         // mapper output key
-	IntWritable.class,  // mapper output value
-	job);
-TableMapReduceUtil.initTableReducerJob(
-	targetTable,        // output table
-	MyReducer.class,    // reducer class
-	job);
-job.setNumReduceTasks(1);   // at least one, adjust as required
-	    
-boolean b = job.waitForCompletion(true);
-if (!b) {
-	throw new IOException("error with job!");
-}    
-    </programlisting>
-    In this example mapper a column with a String-value is chosen as the value to summarize upon.  
-    This value is used as the key to emit from the mapper, and an <classname>IntWritable</classname> represents an instance counter.
-    <programlisting>
-public static class MyMapper extends TableMapper&lt;Text, IntWritable&gt;  {
-
-	private final IntWritable ONE = new IntWritable(1);
-   	private Text text = new Text();
-    	
-   	public void map(ImmutableBytesWritable row, Result value, Context context) throws IOException, InterruptedException {
-        	String val = new String(value.getValue(Bytes.toBytes("cf"), Bytes.toBytes("attr1")));
-          	text.set(val);     // we can only emit Writables...
-
-        	context.write(text, ONE);
-   	}
-}
-    </programlisting>
-    In the reducer, the "ones" are counted (just like any other MR example that does this), and then emits a <classname>Put</classname>.
-    <programlisting>
-public static class MyReducer extends TableReducer&lt;Text, IntWritable, ImmutableBytesWritable&gt;  {
-        
- 	public void reduce(Text key, Iterable&lt;IntWritable&gt; values, Context context) throws IOException, InterruptedException {
-    		int i = 0;
-    		for (IntWritable val : values) {
-    			i += val.get();
-    		}
-    		Put put = new Put(Bytes.toBytes(key.toString()));
-    		put.add(Bytes.toBytes("cf"), Bytes.toBytes("count"), Bytes.toBytes(i));
-
-    		context.write(null, put);
-   	}
+scan.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("attr"));
+scan.setStartRow(Bytes.toBytes("row"));   // start key is inclusive
+scan.setStopRow(Bytes.toBytes("rox"));    // stop key is exclusive; "rox" is the first key past the "row" prefix
+ResultScanner rs = htable.getScanner(scan);
+try {
+  for (Result result : rs) {
+    // process each Result instance
+  }
+} finally {
+  rs.close();  // always close the ResultScanner to release server-side resources
 }
-    </programlisting>
-    
-    </para>
+</programlisting>
+         </para>        
+        </section>
+      <section xml:id="delete">
+        <title>Delete</title>
+        <para><link xlink:href="http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/client/Delete.html">Delete</link> removes
+        a row from a table.  Deletes are executed via 
+        <link xlink:href="http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/client/HTable.html#delete%28org.apache.hadoop.hbase.client.Delete%29">
+        HTable.delete</link>.
+        </para>
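+        <para>A minimal sketch (the row key is illustrative):
+        <programlisting>
+HTable htable = ...      // instantiate HTable
+
+Delete delete = new Delete(Bytes.toBytes("row1"));  // with no further qualification, deletes all cells in the row
+htable.delete(delete);
+        </programlisting>
+        </para>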
+      </section>
+            
     </section>
-   </section>
-   <section xml:id="mapreduce.htable.access">
-   <title>Accessing Other HBase Tables in a MapReduce Job</title>
-	<para>Although the framework currently allows one HBase table as input to a
-    MapReduce job, other HBase tables can 
-	be accessed as lookup tables, etc., in a
-    MapReduce job via creating an HTable instance in the setup method of the Mapper.
-	<programlisting>public class MyMapper extends TableMapper&lt;Text, LongWritable&gt; {
-  private HTable myOtherTable;
 
-  public void setup(Context context) {
-    myOtherTable = new HTable("myOtherTable");
-  }
-  
-  public void map(ImmutableBytesWritable row, Result value, Context context) throws IOException, InterruptedException {
-	// process Result...
-	// use 'myOtherTable' for lookups
-  }
-  
-  </programlisting>
-   </para>
-    </section>
-  <section xml:id="mapreduce.specex">
-  <title>Speculative Execution</title>
-  <para>It is generally advisable to turn off speculative execution for
-      MapReduce jobs that use HBase as a source.  This can either be done on a
-      per-Job basis through properties, on on the entire cluster.  Especially
-      for longer running jobs, speculative execution will create duplicate
-      map-tasks which will double-write your data to HBase; this is probably
-      not what you want.
-  </para>
-  </section>
-  </chapter>
 
-  <chapter xml:id="schema">
-  <title>HBase and Schema Design</title>
-      <para>A good general introduction on the strength and weaknesses modelling on
-          the various non-rdbms datastores is Ian Varleys' Master thesis,
-          <link xlink:href="http://ianvarley.com/UT/MR/Varley_MastersReport_Full_2009-08-07.pdf">No Relation: The Mixed Blessings of Non-Relational Databases</link>.
-          Recommended.
-      </para>
-  <section xml:id="schema.creation">
-  <title>
-      Schema Creation
-  </title>
-  <para>HBase schemas can be created or updated with <xref linkend="shell" />
-      or by using <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html">HBaseAdmin</link> in the Java API.
-      </para>
-      <para>Tables must be disabled when making ColumnFamily modifications, for example..
-      <programlisting>
-Configuration config = HBaseConfiguration.create();  
-HBaseAdmin admin = new HBaseAdmin(conf);    
-String table = "myTable";
+    <section xml:id="versions">
+      <title>Versions<indexterm><primary>Versions</primary></indexterm></title>
 
-admin.disableTable(table);           
+      <para>A <emphasis>{row, column, version} </emphasis>tuple exactly
+      specifies a <literal>cell</literal> in HBase. It's possible to have an
+      unbounded number of cells where the row and column are the same but the
+      cell address differs only in its version dimension.</para>
 
-HColumnDescriptor cf1 = ...;
-admin.addColumn(table, cf1  );      // adding new ColumnFamily
-HColumnDescriptor cf2 = ...;
-admin.modifyColumn(table, cf2 );    // modifying existing ColumnFamily
+      <para>While rows and column keys are expressed as bytes, the version is
+      specified using a long integer. Typically this long contains time
+      instances such as those returned by
+      <code>java.util.Date.getTime()</code> or
+      <code>System.currentTimeMillis()</code>, that is: <quote>the difference,
+      measured in milliseconds, between the current time and midnight, January
+      1, 1970 UTC</quote>.</para>
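+      <para>For example, a version can be set explicitly when writing a cell.
+      A minimal sketch (names are illustrative):
+      <programlisting>
+HTable htable = ...      // instantiate HTable
+
+long version = System.currentTimeMillis();   // or any application-chosen long
+Put put = new Put(Bytes.toBytes("row1"));
+put.add(Bytes.toBytes("cf"), Bytes.toBytes("attr"), version, Bytes.toBytes("value"));
+htable.put(put);
+      </programlisting>
+      If no version is supplied, the RegionServer assigns the current time
+      when the cell arrives.</para>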
 
-admin.enableTable(table);                
-      </programlisting>
-      </para>See <xref linkend="client_dependencies"/> for more information about configuring client connections.
-      <para>
-      </para>
-  </section>   
-  <section xml:id="number.of.cfs">
-  <title>
-      On the number of column families
-  </title>
-  <para>
-      HBase currently does not do well with anything above two or three column families so keep the number
-      of column families in your schema low.  Currently, flushing and compactions are done on a per Region basis so
-      if one column family is carrying the bulk of the data bringing on flushes, the adjacent families
-      will also be flushed though the amount of data they carry is small.  Compaction is currently triggered
-      by the total number of files under a column family.  Its not size based.  When many column families the
-      flushing and compaction interaction can make for a bunch of needless i/o loading (To be addressed by
-      changing flushing and compaction to work on a per column family basis).
-    </para>
-    <para>Try to make do with one column family if you can in your schemas.  Only introduce a
-        second and third column family in the case where data access is usually column scoped;
-        i.e. you query one column family or the other but usually not both at the one time.
-    </para>
-  </section>
-  <section xml:id="rowkey.design"><title>Rowkey Design</title>
-    <section xml:id="timeseries">
-    <title>
-    Monotonically Increasing Row Keys/Timeseries Data
-    </title>
-    <para>
-      In the HBase chapter of Tom White's book <link xlink:url="http://oreilly.com/catalog/9780596521981">Hadoop: The Definitive Guide</link> (O'Reilly) there is a an optimization note on watching out for a phenomenon where an import process walks in lock-step with all clients in concert pounding one of the table's regions (and thus, a single node), then moving onto the next region, etc.  With monotonically increasing row-keys (i.e., using a timestamp), this will happen.  See this comic by IKai Lan on why monotonically increasing row keys are problematic in BigTable-like datastores:
-      <link xlink:href="http://ikaisays.com/2011/01/25/app-engine-datastore-tip-monotonically-increasing-values-are-bad/">monotonically increasing values are bad</link>.  The pile-up on a single region brought on
-      by monotonically increasing keys can be mitigated by randomizing the input records to not be in sorted order, but in general its best to avoid using a timestamp or a sequence (e.g. 1, 2, 3) as the row-key. 
-    </para>
+      <para>The HBase version dimension is stored in decreasing order, so that
+      when reading from a store file, the most recent values are found
+      first.</para>
 
+      <para>There is a lot of confusion over the semantics of
+      <literal>cell</literal> versions in HBase. In particular, a couple of
+      questions that often come up are:<itemizedlist>
+          <listitem>
+            <para>If multiple writes to a cell have the same version, are all
+            versions maintained or just the last?<footnote>
+                <para>Currently, only the last written is fetchable.</para>
+              </footnote></para>
+          </listitem>
 
-    <para>If you do need to upload time series data into HBase, you should
-    study <link xlink:href="http://opentsdb.net/">OpenTSDB</link> as a
-    successful example.  It has a page describing the <link xlink:href=" http://opentsdb.net/schema.html">schema</link> it uses in
-    HBase.  The key format in OpenTSDB is effectively [metric_type][event_timestamp], which would appear at first glance to contradict the previous advice about not using a timestamp as the key.  However, the difference is that the timestamp is not in the <emphasis>lead</emphasis> position of the key, and the design assumption is that there are dozens or hundreds (or more) of different metric types.  Thus, even with a continual stream of input data with a mix of metric types, the Puts are distributed across various points of regions in the table.
-   </para>
-  </section>
-  <section xml:id="keysize">
-      <title>Try to minimize row and column sizes</title>
-      <subtitle>Or why are my StoreFile indices large?</subtitle>
-      <para>In HBase, values are always freighted with their coordinates; as a
-          cell value passes through the system, it'll be accompanied by its
-          row, column name, and timestamp - always.  If your rows and column names
-          are large, especially compared to the size of the cell value, then
-          you may run up against some interesting scenarios.  One such is
-          the case described by Marc Limotte at the tail of
-          <link xlink:url="https://issues.apache.org/jira/browse/HBASE-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&amp;focusedCommentId=13005272#comment-13005272">HBASE-3551</link>
-          (recommended!).
-          Therein, the indices that are kept on HBase storefiles (<xref linkend="hfile" />)
-                  to facilitate random access may end up occupyng large chunks of the HBase
-                  allotted RAM because the cell value coordinates are large.
-                  Mark in the above cited comment suggests upping the block size so
-                  entries in the store file index happen at a larger interval or
-                  modify the table schema so it makes for smaller rows and column
-                  names.
-                  Compression will also make for larger indices.  See
-                  the thread <link xlink:href="http://search-hadoop.com/m/hemBv1LiN4Q1/a+question+storefileIndexSize&amp;subj=a+question+storefileIndexSize">a question storefileIndexSize</link>
-                  up on the user mailing list.
-       </para>
-       <para>Most of the time small inefficiencies don't matter all that much.  Unfortunately,
-         this is a case where they do.  Whatever patterns are selected for ColumnFamilies, attributes, and rowkeys they could be repeated
-       several billion times in your data</para>
-       <section xml:id="keysize.cf"><title>Column Families</title>
-         <para>Try to keep the ColumnFamily names as small as possible, preferably one character (e.g. "d" for data/default).
-         </para> 
-       </section>
-       <section xml:id="keysize.atttributes"><title>Attributes</title>
-         <para>Although verbose attribute names (e.g., "myVeryImportantAttribute") are easier to read, prefer shorter attribute names (e.g., "via")
-         to store in HBase.
-         </para> 
-       </section>
-       <section xml:id="keysize.row"><title>Rowkey Length</title>
-         <para>Keep them as short as is reasonable such that they can still be useful for required data access (e.g., Get vs. Scan). 
-         A short key that is useless for data access is not better than a longer key with better get/scan properties.  Expect tradeoffs
-         when designing rowkeys.
-         </para> 
-       </section>
-    </section>
-    <section xml:id="reverse.timestamp"><title>Reverse Timestamps</title>
-    <para>A common problem in database processing is quickly finding the most recent version of a value.  A technique using reverse timestamps
-    as a part of the key can help greatly with a special case of this problem.  Also found in the HBase chapter of Tom White's book Hadoop:  The Definitive Guide (O'Reilly), 
-    the technique involves appending (<code>Long.MAX_VALUE - timestamp</code>) to the end of any key, e.g., [key][reverse_timestamp].
-    </para>
-    <para>The most recent value for [key] in a table can be found by performing a Scan for [key] and obtaining the first record.  Since HBase keys
-    are in sorted order, this key sorts before any older row-keys for [key] and thus is first.
-    </para>
-    <para>This technique would be used instead of using <xref linkend="schema.versions">HBase Versioning</xref> where the intent is to hold onto all versions
-    "forever" (or a very long time) and at the same time quickly obtain access to any other version by using the same Scan technique.
-    </para>
-    </section>
-    <section xml:id="rowkey.scope">
-    <title>Rowkeys and ColumnFamilies</title>
-    <para>Rowkeys are scoped to ColumnFamilies.  Thus, the same rowkey could exist in each ColumnFamily that exists in a table without collision.
-    </para>
-    </section>
-    <section xml:id="changing.rowkeys"><title>Immutability of Rowkeys</title>
-    <para>Rowkeys cannot be changed.  The only way they can be "changed" in a table is if the row is deleted and then re-inserted.
-    This is a fairly common question on the HBase dist-list so it pays to get the rowkeys right the first time (and/or before you've 
-    inserted a lot of data).
-    </para>
-    </section>
-    </section>  <!--  rowkey design -->  
-    <section xml:id="schema.versions">
-  <title>
-  Number of Versions
-  </title>
-  <para>The number of row versions to store is configured per column
-      family via <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</link>.
-      The default is 3.
-      This is an important parameter because as described in <xref linkend="datamodel" />
-      section HBase does <emphasis>not</emphasis> overwrite row values, but rather
-      stores different values per row by time (and qualifier).  Excess versions are removed during major
-      compactions.  The number of versions may need to be increased or decreased depending on application needs.
-  </para>
-  <para>It is not recommended setting the number of versions to an exceedingly high level (e.g., hundreds or more) unless those old values are 
-  very dear to you because this will greatly increase StoreFile size. 
-  </para>
-    <section xml:id="schema.minversions">
-    <title>
-    Minimum Number of Versions
-    </title>
-    <para>Like number of row versions, the minimum number of row versions to keep is configured per column
-      family via <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</link>.
-      The default is 0, which means the feature is disabled.
-      The minimum number of row versions parameter is used together with the time-to-live parameter and can be combined with the
-      number of row versions parameter to allow configurations such as
-      "keep the last T minutes worth of data, at most N versions, <emphasis>but keep at least M versions around</emphasis>"
-      (where M is the value for minimum number of row versions, M&lt;=N).
-      This parameter should only be set when time-to-live is enabled for a column family and must be less or equal to the 
-      number of row versions.
-    </para>
-    </section>
-  </section>
-  <section xml:id="supported.datatypes">
-  <title>
-  Supported Datatypes
-  </title>
-  <para>HBase supports a "bytes-in/bytes-out" interface via <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html">Put</link> and
-  <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Result.html">Result</link>, so anything that can be
-  converted to an array of bytes can be stored as a value.  Input could be strings, numbers, complex objects, or even images as long as they can rendered as bytes.  
-  </para>
-  <para>There are practical limits to the size of values (e.g., storing 10-50MB objects in HBase would probably be too much to ask);
-  search the mailling list for conversations on this topic. All rows in HBase conform to the <xref linkend="datamodel">datamodel</xref>, and 
-  that includes versioning.  Take that into consideration when making your design, as well as block size for the ColumnFamily.  
-  </para>
-    <section xml:id="counters">
-      <title>Counters</title>
-      <para>
-      One supported datatype that deserves special mention are "counters" (i.e., the ability to do atomic increments of numbers).  See 
-      <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#increment%28org.apache.hadoop.hbase.client.Increment%29">Increment</link> in HTable.
-      </para>
-      <para>Synchronization on counters are done on the RegionServer, not in the client.
-      </para>
-    </section> 
-  </section>
-  <section xml:id="cf.in.memory">
-  <title>
-  In-Memory ColumnFamilies
-  </title>
-  <para>ColumnFamilies can optionally be defined as in-memory.  Data is still persisted to disk, just like any other ColumnFamily.  
-  In-memory blocks have the highest priority in the <xref linkend="block.cache" />, but it is not a guarantee that the entire table
-  will be in memory.
-  </para>
-  <para>See <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</link> for more information.
-  </para>
-  </section>
-  <section xml:id="ttl">
-  <title>Time To Live (TTL)</title>
-  <para>ColumnFamilies can set a TTL length in seconds, and HBase will automatically delete rows once the expiration time is reached.
-  This applies to <emphasis>all</emphasis> versions of a row - even the current one.  The TTL time encoded in the HBase for the row is specified in UTC.
-  </para>
-  <para>See <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</link> for more information.
-  </para>
-  </section>
-  <section xml:id="schema.bloom">
-  <title>Bloom Filters</title>
-  <para>Bloom Filters can be enabled per-ColumnFamily.
-        Use <code>HColumnDescriptor.setBloomFilterType(NONE | ROW |
-        ROWCOL)</code> to enable blooms per Column Family. Default =
-        <varname>NONE</varname> for no bloom filters. If
-        <varname>ROW</varname>, the hash of the row will be added to the bloom
-        on each insert. If <varname>ROWCOL</varname>, the hash of the row +
-        column family + column family qualifier will be added to the bloom on
-        each key insert.</para>
-  <para>See <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</link> and 
-  <xref linkend="blooms"/> for more information.
-  </para>
-  </section>
-  <section xml:id="secondary.indexes">
-  <title>
-  Secondary Indexes and Alternate Query Paths
-  </title>
-  <para>This section could also be titled "what if my table rowkey looks like <emphasis>this</emphasis> but I also want to query my table like <emphasis>that</emphasis>."
-  A common example on the dist-list is where a row-key is of the format "user-timestamp" but there are are reporting requirements on activity across users for certain 
-  time ranges.  Thus, selecting by user is easy because it is in the lead position of the key, but time is not.
-  </para>
-  <para>There is no single answer on the best way to handle this because it depends on...
-   <itemizedlist>
-       <listitem>Number of users</listitem>  
-       <listitem>Data size and data arrival rate</listitem>
-       <listitem>Flexibility of reporting requirements (e.g., completely ad-hoc date selection vs. pre-configured ranges) </listitem>  
-       <listitem>Desired execution speed of query (e.g., 90 seconds may be reasonable to some for an ad-hoc report, whereas it may be too long for others) </listitem>  
-   </itemizedlist>
-   ... and solutions are also influenced by the size of the cluster and how much processing power you have to throw at the solution.  
-   Common techniques are in sub-sections below.  This is a comprehensive, but not exhaustive, list of approaches.   
-  </para>
-  <para>It should not be a surprise that secondary indexes require additional cluster space and processing.  
-  This is precisely what happens in an RDBMS because the act of creating an alternate index requires both space and processing cycles to update.  RBDMS products
-  are more advanced in this regard to handle alternative index management out of the box.  However, HBase scales better at larger data volumes, so this is a feature trade-off. 
-  </para>
-  <para>Pay attention to <xref linkend="performance"/> when implementing any of these approaches.</para>
-  <para>Additionally, see the David Butler response in this dist-list thread <link xlink:href="http://search-hadoop.com/m/nvbiBp2TDP/Stargate%252Bhbase&amp;subj=Stargate+hbase">HBase, mail # user - Stargate+hbase</link>
-   </para>
-    <section xml:id="secondary.indexes.filter">
-      <title>
-       Filter Query
-      </title>
-      <para>Depending on the case, it may be appropriate to use <xref linkend="client.filter"/>.  In this case, no secondary index is created.
-      However, don't try a full-scan on a large table like this from an application (i.e., single-threaded client).
-      </para>
-    </section>
-    <section xml:id="secondary.indexes.periodic">
-      <title>
-       Periodic-Update Secondary Index
-      </title>
-      <para>A secondary index could be created in an other table which is periodically updated via a MapReduce job.  The job could be executed intra-day, but depending on 
-      load-strategy it could still potentially be out of sync with the main data table.</para>
-      <para>See <xref linkend="mapreduce.example.readwrite"/> for more information.</para>
-    </section>
-    <section xml:id="secondary.indexes.dualwrite">
-      <title>
-       Dual-Write Secondary Index
-      </title>
-      <para>Another strategy is to build the secondary index while publishing data to the cluster (e.g., write to data table, write to index table). 
-      If this is approach is taken after a data table already exists, then bootstrapping will be needed for the secondary index with a MapReduce job (see <xref linkend="secondary.indexes.periodic"/>).</para>
-    </section>
-    <section xml:id="secondary.indexes.summary">
-      <title>
-       Summary Tables
-      </title>
-      <para>Where time-ranges are very wide (e.g., year-long report) and where the data is voluminous, summary tables are a common approach.
-      These would be generated with MapReduce jobs into another table.</para>
-      <para>See <xref linkend="mapreduce.example.summary"/> for more information.</para>
-    </section>
-    <section xml:id="secondary.indexes.coproc">
-      <title>
-       Coprocessor Secondary Index
-      </title>
-      <para>Coprocessors act like RDBMS triggers.  These are currently on TRUNK.
-      </para>
-    </section>
-  </section>
+          <listitem>
+            <para>Is it OK to write cells in a non-increasing version
+            order?<footnote>
+                <para>Yes</para>
+              </footnote></para>
+          </listitem>
+        </itemizedlist></para>
 
-  </chapter>
+      <para>Below we describe how the version dimension in HBase currently
+      works<footnote>
+          <para>See <link
+          xlink:href="https://issues.apache.org/jira/browse/HBASE-2406">HBASE-2406</link>
+          for discussion of HBase versions. <link
+          xlink:href="http://outerthought.org/blog/417-ot.html">Bending time
+          in HBase</link> makes for a good read on the version, or time,
+          dimension in HBase. It has more detail on versioning than is
+          provided here. As of this writing, the limitation
+          <emphasis>Overwriting values at existing timestamps</emphasis>
+          mentioned in the article no longer holds in HBase. This section is
+          basically a synopsis of this article by Bruno Dumon.</para>
+        </footnote>.</para>
 
-  <chapter xml:id="hbase_metrics">
-  <title>Metrics</title>
-  <section xml:id="metric_setup">
-  <title>Metric Setup</title>
-  <para>See <link xlink:href="http://hbase.apache.org/metrics.html">Metrics</link> for
-  an introduction and how to enable Metrics emission.
-  </para>
-  </section>
-   <section xml:id="rs_metrics">
-   <title>RegionServer Metrics</title>
-          <section xml:id="hbase.regionserver.blockCacheCount"><title><varname>hbase.regionserver.blockCacheCount</varname></title>
-          <para>Block cache item count in memory.  This is the number of blocks of storefiles (HFiles) in the cache.</para>
-		  </section>
-         <section xml:id="hbase.regionserver.blockCacheFree"><title><varname>hbase.regionserver.blockCacheFree</varname></title>
-          <para>Block cache memory available (bytes).</para>
-		  </section>
-         <section xml:id="hbase.regionserver.blockCacheHitRatio"><title><varname>hbase.regionserver.blockCacheHitRatio</varname></title>
-          <para>Block cache hit ratio (0 to 100).  TODO:  describe impact to ratio where read requests that have cacheBlocks=false</para>
-		  </section>
-          <section xml:id="hbase.regionserver.blockCacheSize"><title><varname>hbase.regionserver.blockCacheSize</varname></title>
-          <para>Block cache size in memory (bytes).  i.e., memory in use by the BlockCache</para>
-		  </section>
-          <section xml:id="hbase.regionserver.compactionQueueSize"><title><varname>hbase.regionserver.compactionQueueSize</varname></title>
-          <para>Size of the compaction queue.  This is the number of stores in the region that have been targeted for compaction.</para>
-		  </section>
-          <section xml:id="hbase.regionserver.fsReadLatency_avg_time"><title><varname>hbase.regionserver.fsReadLatency_avg_time</varname></title>
-          <para>Filesystem read latency (ms).  This is the average time to read from HDFS.</para>
-		  </section>
-          <section xml:id="hbase.regionserver.fsReadLatency_num_ops"><title><varname>hbase.regionserver.fsReadLatency_num_ops</varname></title>
-          <para>TODO</para>
-		  </section>
-          <section xml:id="hbase.regionserver.fsSyncLatency_avg_time"><title><varname>hbase.regionserver.fsSyncLatency_avg_time</varname></title>
-          <para>Filesystem sync latency (ms)</para>
-		  </section>
-          <section xml:id="hbase.regionserver.fsSyncLatency_num_ops"><title><varname>hbase.regionserver.fsSyncLatency_num_ops</varname></title>
-          <para>TODO</para>
-		  </section>
-          <section xml:id="hbase.regionserver.fsWriteLatency_avg_time"><title><varname>hbase.regionserver.fsWriteLatency_avg_time</varname></title>
-          <para>Filesystem write latency (ms)</para>
-		  </section>
-          <section xml:id="hbase.regionserver.fsWriteLatency_num_ops"><title><varname>hbase.regionserver.fsWriteLatency_num_ops</varname></title>
-          <para>TODO</para>
-		  </section>
-          <section xml:id="hbase.regionserver.memstoreSizeMB"><title><varname>hbase.regionserver.memstoreSizeMB</varname></title>
-          <para>Sum of all the memstore sizes in this RegionServer (MB)</para>
-		  </section>
-          <section xml:id="hbase.regionserver.regions"><title><varname>hbase.regionserver.regions</varname></title>
-          <para>Number of regions served by the RegionServer</para>
-		  </section>
-          <section xml:id="hbase.regionserver.requests"><title><varname>hbase.regionserver.requests</varname></title>
-          <para>Total number of read and write requests.  Requests correspond to RegionServer RPC calls, thus a single Get will result in 1 request, but a Scan with caching set to 1000 will result in 1 request for each 'next' call (i.e., not each row).  A bulk-load request will constitute 1 request per HFile.</para>
-		  </section>
-          <section xml:id="hbase.regionserver.storeFileIndexSizeMB"><title><varname>hbase.regionserver.storeFileIndexSizeMB</varname></title>
-          <para>Sum of all the storefile index sizes in this RegionServer (MB)</para>
-		  </section>
-          <section xml:id="hbase.regionserver.stores"><title><varname>hbase.regionserver.stores</varname></title>
-          <para>Number of stores open on the RegionServer.  A store corresponds to a column family.  For example, if a table (which contains the column family) has 3 regions on a RegionServer, there will be 3 stores open for that column family. </para>
-		  </section>
-          <section xml:id="hbase.regionserver.storeFiles"><title><varname>hbase.regionserver.storeFiles</varname></title>
-          <para>Number of store filles open on the RegionServer.  A store may have more than one storefile (HFile).</para>
-		  </section>
-   </section>
-  </chapter>
+      <section xml:id="versions.ops">
+        <title>Versions and HBase Operations</title>
 
-  <chapter xml:id="cluster_replication">
-  <title>Cluster Replication</title>
-  <para>See <link xlink:href="http://hbase.apache.org/replication.html">Cluster Replication</link>.
-  </para>
-  </chapter>
+        <para>In this section we look at the behavior of the version dimension
+        for each of the core HBase operations.</para>
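+        <para>For example, by default a Get returns only the most recent
+        version of each cell; older versions can be requested explicitly.
+        A minimal sketch (names are illustrative):
+        <programlisting>
+HTable htable = ...      // instantiate HTable
+
+Get get = new Get(Bytes.toBytes("row1"));
+get.setMaxVersions(3);          // return up to three versions of each column
+Result r = htable.get(get);
+List&lt;KeyValue&gt; kvs = r.getColumn(Bytes.toBytes("cf"), Bytes.toBytes("attr"));  // newest first
+        </programlisting>
+        </para>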
 
-  <chapter xml:id="datamodel">
-    <title>Data Model</title>
-    <para>In short, applications store data into an HBase table.
-        Tables are made of rows and columns.
-      All columns in HBase belong to a particular column family.
-      Table cells -- the intersection of row and column
-      coordinates -- are versioned.
-      A cell’s content is an uninterpreted array of bytes.
-  </para>
-      <para>Table row keys are also byte arrays so almost anything can
-      serve as a row key from strings to binary representations of longs or
-      even serialized data structures. Rows in HBase tables
-      are sorted by row key. The sort is byte-ordered. All table accesses are
-      via the table row key -- its primary key.
-</para>
-
-    <section xml:id="conceptual.view"><title>Conceptual View</title>
-	<para>
-        The following example is a slightly modified form of the one on page
-        2 of the <link xlink:href="http://labs.google.com/papers/bigtable.html">BigTable</link> paper.
-    There is a table called <varname>webtable</varname> that contains two column families named
-    <varname>contents</varname> and <varname>anchor</varname>.
-    In this example, <varname>anchor</varname> contains two
-    columns (<varname>anchor:cssnsi.com</varname>, <varname>anchor:my.look.ca</varname>)
-    and <varname>contents</varname> contains one column (<varname>contents:html</varname>).
-    <note>
-        <title>Column Names</title>
-      <para>
-      By convention, a column name is made of its column family prefix and a
-      <emphasis>qualifier</emphasis>. For example, the
-      column
-      <emphasis>contents:html</emphasis> is of the column family <varname>contents</varname>
-          The colon character (<literal
-          moreinfo="none">:</literal>) delimits the column family from the
-          column family <emphasis>qualifier</emphasis>.
-    </para>
-    </note>
-    <table frame='all'><title>Table <varname>webtable</varname></title>
-	<tgroup cols='4' align='left' colsep='1' rowsep='1'>
-	<colspec colname='c1'/>
-	<colspec colname='c2'/>
-	<colspec colname='c3'/>
-	<colspec colname='c4'/>
-	<thead>
-        <row><entry>Row Key</entry><entry>Time Stamp</entry><entry>ColumnFamily <varname>contents</varname></entry><entry>ColumnFamily <varname>anchor</varname></entry></row>
-	</thead>
-	<tbody>
-        <row><entry>"com.cnn.www"</entry><entry>t9</entry><entry></entry><entry><varname>anchor:cnnsi.com</varname> = "CNN"</entry></row>
-        <row><entry>"com.cnn.www"</entry><entry>t8</entry><entry></entry><entry><varname>anchor:my.look.ca</varname> = "CNN.com"</entry></row>
-        <row><entry>"com.cnn.www"</entry><entry>t6</entry><entry><varname>contents:html</varname> = "&lt;html&gt;..."</entry><entry></entry></row>
-        <row><entry>"com.cnn.www"</entry><entry>t5</entry><entry><varname>contents:html</varname> = "&lt;html&gt;..."</entry><entry></entry></row>
-        <row><entry>"com.cnn.www"</entry><entry>t3</entry><entry><varname>contents:html</varname> = "&lt;html&gt;..."</entry><entry></entry></row>
-	</tbody>
-	</tgroup>
-	</table>
-	</para>
-	</section>
-    <section xml:id="physical.view"><title>Physical View</title>
-	<para>
-        Although at a conceptual level tables may be viewed as a sparse set of rows.
-        Physically they are stored on a per-column family basis.  New columns
-        (i.e., <varname>columnfamily:column</varname>) can be added to any
-        column family without pre-announcing them. 
-        <table frame='all'><title>ColumnFamily <varname>anchor</varname></title>
-	<tgroup cols='3' align='left' colsep='1' rowsep='1'>
-	<colspec colname='c1'/>
-	<colspec colname='c2'/>
-	<colspec colname='c3'/>
-	<thead>
-        <row><entry>Row Key</entry><entry>Time Stamp</entry><entry>Column Family <varname>anchor</varname></entry></row>
-	</thead>
-	<tbody>
-        <row><entry>"com.cnn.www"</entry><entry>t9</entry><entry><varname>anchor:cnnsi.com</varname> = "CNN"</entry></row>
-        <row><entry>"com.cnn.www"</entry><entry>t8</entry><entry><varname>anchor:my.look.ca</varname> = "CNN.com"</entry></row>
-	</tbody>
-	</tgroup>
-	</table>
-    <table frame='all'><title>ColumnFamily <varname>contents</varname></title>
-	<tgroup cols='3' align='left' colsep='1' rowsep='1'>
-	<colspec colname='c1'/>
-	<colspec colname='c2'/>
-	<colspec colname='c3'/>
-	<thead>
-	<row><entry>Row Key</entry><entry>Time Stamp</entry><entry>ColumnFamily "contents:"</entry></row>
-	</thead>
-	<tbody>
-        <row><entry>"com.cnn.www"</entry><entry>t6</entry><entry><varname>contents:html</varname> = "&lt;html&gt;..."</entry></row>
-        <row><entry>"com.cnn.www"</entry><entry>t5</entry><entry><varname>contents:html</varname> = "&lt;html&gt;..."</entry></row>
-        <row><entry>"com.cnn.www"</entry><entry>t3</entry><entry><varname>contents:html</varname> = "&lt;html&gt;..."</entry></row>
-	</tbody>
-	</tgroup>
-	</table>
-    It is important to note in the diagram above that the empty cells shown in the
-    conceptual view are not stored since they need not be in a column-oriented
-    storage format. Thus a request for the value of the <varname>contents:html</varname>
-    column at time stamp <literal>t8</literal> would return no value. Similarly, a
-    request for an <varname>anchor:my.look.ca</varname> value at time stamp
-    <literal>t9</literal> would return no value.  However, if no timestamp is
-    supplied, the most recent value for a particular column would be returned
-    and would also be the first one found since timestamps are stored in
-    descending order. Thus a request for the values of all columns in the row
-    <varname>com.cnn.www</varname> if no timestamp is specified would be:
-    the value of <varname>contents:html</varname> from time stamp
-    <literal>t6</literal>, the value of <varname>anchor:cnnsi.com</varname>
-    from time stamp <literal>t9</literal>, the value of 
-    <varname>anchor:my.look.ca</varname> from time stamp <literal>t8</literal>.
-	</para>
-	</section>
-
-    <section xml:id="table">
-      <title>Table</title>
-      <para>
-      Tables are declared up front at schema definition time.
-      </para>
-    </section>
-
-    <section xml:id="row">
-      <title>Row</title>
-      <para>Row keys are uninterpreted bytes. Rows are
-      lexicographically sorted with the lowest order appearing first
-      in a table.  The empty byte array is used to denote both the
-      start and end of a table's namespace.</para>
-    </section>
-
-    <section xml:id="columnfamily">
-      <title>Column Family<indexterm><primary>Column Family</primary></indexterm></title>
-        <para>
-      Columns in HBase are grouped into <emphasis>column families</emphasis>.
-      All column members of a column family have the same prefix.  For example, the
-      columns <emphasis>courses:history</emphasis> and
-      <emphasis>courses:math</emphasis> are both members of the
-      <emphasis>courses</emphasis> column family.
-          The colon character (<literal
-          moreinfo="none">:</literal>) delimits the column family from the
-      column family <emphasis>qualifier</emphasis><indexterm><primary>Column Family Qualifier</primary></indexterm>.
-        The column family prefix must be composed of
-      <emphasis>printable</emphasis> characters. The qualifying tail, the
-      column family <emphasis>qualifier</emphasis>, can be made of any
-      arbitrary bytes. Column families must be declared up front
-      at schema definition time whereas columns do not need to be
-      defined at schema time but can be conjured on the fly while
-      the table is up and running.</para>
-      <para>Physically, all column family members are stored together on the
-      filesystem.  Because tunings and
-      storage specifications are done at the column family level, it is
-      advised that all column family members have the same general access
-      pattern and size characteristics.</para>
-
-    </section>
-    <section xml:id="cells">
-      <title>Cells<indexterm><primary>Cells</primary></indexterm></title>
-      <para>A <emphasis>{row, column, version}</emphasis> tuple exactly
-      specifies a <literal>cell</literal> in HBase.
-      Cell content is uninterpreted bytes.</para>
-    </section>
-    <section xml:id="data_model_operations">
-       <title>Data Model Operations</title>
-       <para>The four primary data model operations are Get, Put, Scan, and Delete.  Operations are applied via 
-       <link xlink:href="http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/client/HTable.html">HTable</link> instances.
-       </para>
-      <section xml:id="get">
-        <title>Get</title>
-        <para><link xlink:href="http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/client/Get.html">Get</link> returns
-        attributes for a specified row.  Gets are executed via 
-        <link xlink:href="http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/client/HTable.html#get%28org.apache.hadoop.hbase.client.Get%29">
-        HTable.get</link>.
-        </para>
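-        <para>The following is a minimal sketch; the column family "cf" and the
-        qualifier "attr" are hypothetical names used for illustration.
-<programlisting>
-HTable htable = ...      // instantiate HTable
-
-Get get = new Get(Bytes.toBytes("row1"));
-get.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("attr"));  // restrict the Get to one column
-Result r = htable.get(get);
-byte[] value = r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("attr"));  // null if the cell does not exist
-</programlisting>
-        </para>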
-      </section>
-      <section xml:id="put">
-        <title>Put</title>
-        <para><link xlink:href="http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/client/Put.html">Put</link> either 
-        adds new rows to a table (if the key is new) or can update existing rows (if the key already exists).  Puts are executed via
-        <link xlink:href="http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/client/HTable.html#put%28org.apache.hadoop.hbase.client.Put%29">
-        HTable.put</link> (writeBuffer) or <link xlink:href="http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/client/HTable.html#batch%28java.util.List%29">
-        HTable.batch</link> (non-writeBuffer).  
-        </para>
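-        <para>A minimal sketch, again with hypothetical "cf"/"attr" names:
-<programlisting>
-HTable htable = ...      // instantiate HTable
-
-Put put = new Put(Bytes.toBytes("row1"));
-put.add(Bytes.toBytes("cf"), Bytes.toBytes("attr"), Bytes.toBytes("some value"));
-htable.put(put);         // buffered if autoflush is off, sent immediately otherwise
-</programlisting>
-        </para>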
-      </section>
-      <section xml:id="scan">
-          <title>Scans</title>
-          <para><link xlink:href="http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/client/Scan.html">Scan</link> allow
-          iteration over multiple rows for specified attributes.
-          </para>
-          <para>The following is an example of a Scan on an HTable instance.
-           Assume that a table is populated with rows with keys "row1", "row2", and "row3",
-           and then another set of rows with the keys "abc1", "abc2", and "abc3".  The example shows how startRow and stopRow
-           can be applied to a Scan instance to return only the rows beginning with "row".
-<programlisting>
-HTable htable = ...      // instantiate HTable
-
-Scan scan = new Scan();
-scan.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("attr"));
-scan.setStartRow(Bytes.toBytes("row"));   // start row is inclusive
-scan.setStopRow(Bytes.toBytes("rox"));    // stop row is exclusive: "row" with the last byte incremented
-ResultScanner rs = htable.getScanner(scan);
-try {
-  for (Result result : rs) {
-    // process Result instance
-  }
-} finally {
-  rs.close();  // always close the ResultScanner
-}
-</programlisting>
-         </para>        
-        </section>
-      <section xml:id="delete">
-        <title>Delete</title>
-        <para><link xlink:href="http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/client/Delete.html">Delete</link> removes
-        a row from a table.  Deletes are executed via 
-        <link xlink:href="http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/client/HTable.html#delete%28org.apache.hadoop.hbase.client.Delete%29">
-        HTable.delete</link>.
-        </para>
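-        <para>A minimal sketch; a Delete can also be narrowed to specific
-        columns or versions instead of the whole row:
-<programlisting>
-HTable htable = ...      // instantiate HTable
-
-Delete delete = new Delete(Bytes.toBytes("row1"));
-htable.delete(delete);   // tombstones all cells in the row
-</programlisting>
-        </para>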
-      </section>
-            
-    </section>
-
-
-    <section xml:id="versions">
-      <title>Versions<indexterm><primary>Versions</primary></indexterm></title>
-
-      <para>A <emphasis>{row, column, version} </emphasis>tuple exactly
-      specifies a <literal>cell</literal> in HBase. It's possible to have an
-      unbounded number of cells where the row and column are the same but the
-      cell address differs only in its version dimension.</para>
-
-      <para>While rows and column keys are expressed as bytes, the version is
-      specified using a long integer. Typically this long contains time
-      instances such as those returned by
-      <code>java.util.Date.getTime()</code> or
-      <code>System.currentTimeMillis()</code>, that is: <quote>the difference,
-      measured in milliseconds, between the current time and midnight, January
-      1, 1970 UTC</quote>.</para>
-
-      <para>The HBase version dimension is stored in decreasing order, so that
-      when reading from a store file, the most recent values are found
-      first.</para>
-
-      <para>There is a lot of confusion over the semantics of
-      <literal>cell</literal> versions in HBase. In particular, a couple of
-      questions that often come up are:<itemizedlist>
-          <listitem>
-            <para>If multiple writes to a cell have the same version, are all
-            versions maintained or just the last?<footnote>
-                <para>Currently, only the last written is fetchable.</para>
-              </footnote></para>
-          </listitem>
-
-          <listitem>
-            <para>Is it OK to write cells in a non-increasing version
-            order?<footnote>
-                <para>Yes</para>
-              </footnote></para>
-          </listitem>
-        </itemizedlist></para>
-
-      <para>Below we describe how the version dimension in HBase currently
-      works<footnote>
-          <para>See <link
-          xlink:href="https://issues.apache.org/jira/browse/HBASE-2406">HBASE-2406</link>
-          for discussion of HBase versions. <link
-          xlink:href="http://outerthought.org/blog/417-ot.html">Bending time
-          in HBase</link> makes for a good read on the version, or time,
-          dimension in HBase. It has more detail on versioning than is
-          provided here. As of this writing, the limitation
-          <emphasis>Overwriting values at existing timestamps</emphasis>
-          mentioned in the article no longer holds in HBase. This section is
-          basically a synopsis of this article by Bruno Dumon.</para>
-        </footnote>.</para>
-
-      <section xml:id="versions.ops">
-        <title>Versions and HBase Operations</title>
-
-        <para>In this section we look at the behavior of the version dimension
-        for each of the core HBase operations.</para>
-
-        <section>
-          <title>Get/Scan</title>
+        <section>
+          <title>Get/Scan</title>
 
           <para>Gets are implemented on top of Scans. The below discussion of
             <link xlink:href="http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/client/Get.html">Get</link> applies equally to <link
@@ -1020,6 +420,9 @@ long explicitTimeInMs = 555;  // just an
 put.add(Bytes.toBytes("cf"), Bytes.toBytes("attr1"), explicitTimeInMs, Bytes.toBytes(data));
 htable.put(put);
 </programlisting>
+          Caution: the version timestamp is used internally by HBase for things like time-to-live calculations.
+          It's usually best to avoid setting this timestamp yourself.  Prefer using a separate
+          timestamp attribute of the row, or making the timestamp part of the rowkey, or both.
           </para>
           </section>
           
@@ -1066,60 +469,685 @@ htable.put(put);
         </section>
       </section>
 
-      <section>
-        <title>Current Limitations</title>
+      <section>
+        <title>Current Limitations</title>
+
+        <para>There are still some bugs (or at least 'undecided behavior')
+        with the version dimension that will be addressed by later HBase
+        releases.</para>
+
+        <section>
+          <title>Deletes mask Puts</title>
+
+          <para>Deletes mask puts, even puts that happened after the delete
+          was entered<footnote>
+              <para><link
+              xlink:href="https://issues.apache.org/jira/browse/HBASE-2256">HBASE-2256</link></para>
+            </footnote>. Remember that a delete writes a tombstone, which only
+          disappears after the next major compaction has run. Suppose you do
+          a delete of everything &lt;= T. After this you do a new put with a
+          timestamp &lt;= T. This put, even if it happened after the delete,
+          will be masked by the delete tombstone. Performing the put will not
+          fail, but when you do a get you will notice the put had no
+          effect. It will start working again after the major compaction has
+          run. These issues should not be a problem if you use
+          always-increasing versions for new puts to a row. But they can occur
+          even if you do not care about time: just do delete and put
+          immediately after each other, and there is some chance they happen
+          within the same millisecond.</para>
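+
+          <para>A short sketch of the scenario, with a hypothetical column
+          family "cf" and qualifier "attr":
+<programlisting>
+HTable htable = ...      // instantiate HTable
+
+byte[] row = Bytes.toBytes("row1");
+long t = System.currentTimeMillis();
+
+Delete delete = new Delete(row);
+delete.deleteColumns(Bytes.toBytes("cf"), Bytes.toBytes("attr"), t);  // tombstone everything &lt;= t
+htable.delete(delete);
+
+Put put = new Put(row);
+put.add(Bytes.toBytes("cf"), Bytes.toBytes("attr"), t, Bytes.toBytes("data"));  // timestamp &lt;= t
+htable.put(put);   // succeeds, but the new cell is masked by the tombstone
+
+Result result = htable.get(new Get(row));   // empty until a major compaction removes the tombstone
+</programlisting>
+          </para>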
+        </section>
+
+        <section>
+          <title>Major compactions change query results</title>
+
+          <para><quote>...create three cell versions at t1, t2 and t3, with a
+          maximum-versions setting of 2. So when getting all versions, only
+          the values at t2 and t3 will be returned. But if you delete the
+          version at t2 or t3, the one at t1 will appear again. Obviously,
+          once a major compaction has run, such behavior will not be the case
+          anymore...<footnote>
+              <para>See <emphasis>Garbage Collection</emphasis> in <link
+              xlink:href="http://outerthought.org/blog/417-ot.html">Bending
+              time in HBase</link> </para>
+            </footnote></quote></para>
+        </section>
+      </section>
+    </section>
+  </chapter>  <!-- data model -->
+
+ <chapter xml:id="schema">
+  <title>HBase and Schema Design</title>
+      <para>A good general introduction to the strengths and weaknesses of data modelling in
+          the various non-RDBMS datastores is Ian Varley's Master's thesis,
+          <link xlink:href="http://ianvarley.com/UT/MR/Varley_MastersReport_Full_2009-08-07.pdf">No Relation: The Mixed Blessings of Non-Relational Databases</link>.
+          Recommended.  Also, read <xref linkend="keyvalue"/> for how HBase stores data internally.
+      </para>
+  <section xml:id="schema.creation">
+  <title>
+      Schema Creation
+  </title>
+  <para>HBase schemas can be created or updated with <xref linkend="shell" />
+      or by using <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html">HBaseAdmin</link> in the Java API.
+      </para>
+      <para>Tables must be disabled when making ColumnFamily modifications, for example:
+      <programlisting>
+Configuration config = HBaseConfiguration.create();
+HBaseAdmin admin = new HBaseAdmin(config);
+String table = "myTable";
+
+admin.disableTable(table);
+
+HColumnDescriptor cf1 = ...;
+admin.addColumn(table, cf1);      // adding new ColumnFamily
+HColumnDescriptor cf2 = ...;
+admin.modifyColumn(table, cf2);   // modifying existing ColumnFamily
+
+admin.enableTable(table);
+      </programlisting>
+      </para>
+      <para>See <xref linkend="client_dependencies"/> for more information about configuring client connections.
+      </para>
+      <para>Note:  online schema changes are supported in the 0.92.x codebase, but the 0.90.x codebase requires the table
+      to be disabled.
+      </para>
+  </section>   
+  <section xml:id="number.of.cfs">
+  <title>
+      On the number of column families
+  </title>
+  <para>
+      HBase currently does not do well with anything above two or three column families, so keep the number
+      of column families in your schema low.  Currently, flushing and compaction are done on a per-Region basis, so
+      if one column family is carrying the bulk of the data and bringing on flushes, the adjacent families
+      will also be flushed even though the amount of data they carry is small.  Compaction is currently triggered
+      by the total number of files under a column family, not by their size.  When there are many column families, the
+      flushing and compaction interaction can make for a lot of needless I/O (to be addressed by
+      changing flushing and compaction to work on a per-column-family basis).
+    </para>
+    <para>Try to make do with one column family if you can in your schemas.  Only introduce a
+        second and third column family in the case where data access is usually column scoped;
+        i.e. you query one column family or the other, but usually not both at the same time.
+    </para>
+    <section xml:id="number.of.cfs.card"><title>Cardinality of ColumnFamilies</title>
+      <para>Where multiple ColumnFamilies exist in a single table, be aware of the cardinality (i.e., number of rows).  
+      If ColumnFamilyA has 1 million rows and ColumnFamilyB has 1 billion rows, ColumnFamilyA's data will likely be spread 
+      across many, many regions (and RegionServers).  This makes mass scans for ColumnFamilyA less efficient.  
+      </para>
+    </section>
+  </section>
+  <section xml:id="rowkey.design"><title>Rowkey Design</title>
+    <section xml:id="timeseries">
+    <title>
+    Monotonically Increasing Row Keys/Timeseries Data
+    </title>
+    <para>
+      In the HBase chapter of Tom White's book <link xlink:href="http://oreilly.com/catalog/9780596521981">Hadoop: The Definitive Guide</link> (O'Reilly) there is an optimization note on watching out for a phenomenon where an import process walks in lock-step with all clients in concert pounding one of the table's regions (and thus, a single node), then moving onto the next region, etc.  With monotonically increasing row-keys (i.e., using a timestamp), this will happen.  See this comic by IKai Lan on why monotonically increasing row keys are problematic in BigTable-like datastores:
+      <link xlink:href="http://ikaisays.com/2011/01/25/app-engine-datastore-tip-monotonically-increasing-values-are-bad/">monotonically increasing values are bad</link>.  The pile-up on a single region brought on
+      by monotonically increasing keys can be mitigated by randomizing the input records so that they are not in sorted order (see the salting sketch below), but in general it's best to avoid using a timestamp or a sequence (e.g. 1, 2, 3) as the row-key.
+    </para>
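+    <para>A sketch of one such randomizing technique: salt the lead position of the
+    key with a small bucket prefix.  The bucket count and key layout here are illustrative
+    assumptions, not a prescription; note that a time-range scan must then fan out across all buckets.
+<programlisting>
+int buckets = 16;                                       // hypothetical number of salt buckets
+long ts = System.currentTimeMillis();
+int hash = Long.valueOf(ts).hashCode();
+byte salt = (byte) ((hash &amp; 0x7fffffff) % buckets);      // non-negative bucket index
+byte[] rowkey = Bytes.add(new byte[] { salt }, Bytes.toBytes(ts));  // [salt][timestamp]
+</programlisting>
+    </para>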
+
+
+    <para>If you do need to upload time series data into HBase, you should
+    study <link xlink:href="http://opentsdb.net/">OpenTSDB</link> as a
+    successful example.  It has a page describing the <link xlink:href="http://opentsdb.net/schema.html">schema</link> it uses in
+    HBase.  The key format in OpenTSDB is effectively [metric_type][event_timestamp], which would appear at first glance to contradict the previous advice about not using a timestamp as the key.  However, the difference is that the timestamp is not in the <emphasis>lead</emphasis> position of the key, and the design assumption is that there are dozens or hundreds (or more) of different metric types.  Thus, even with a continual stream of input data with a mix of metric types, the Puts are distributed across various points of regions in the table.
+   </para>
+  </section>
+  <section xml:id="keysize">
+      <title>Try to minimize row and column sizes</title>
+      <subtitle>Or why are my StoreFile indices large?</subtitle>
+      <para>In HBase, values are always freighted with their coordinates; as a
+          cell value passes through the system, it'll be accompanied by its
+          row, column name, and timestamp - always.  If your rows and column names
+          are large, especially compared to the size of the cell value, then
+          you may run up against some interesting scenarios.  One such is
+          the case described by Marc Limotte at the tail of
+          <link xlink:href="https://issues.apache.org/jira/browse/HBASE-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&amp;focusedCommentId=13005272#comment-13005272">HBASE-3551</link>
+          (recommended!).
+          Therein, the indices that are kept on HBase storefiles (<xref linkend="hfile" />)
+                  to facilitate random access may end up occupying large chunks of the HBase
+                  allotted RAM because the cell value coordinates are large.
+                  Marc, in the above-cited comment, suggests upping the block size so that
+                  entries in the store file index happen at a larger interval, or
+                  modifying the table schema so it makes for smaller rows and column
+                  names.
+                  Compression will also make for larger indices.  See
+                  the thread <link xlink:href="http://search-hadoop.com/m/hemBv1LiN4Q1/a+question+storefileIndexSize&amp;subj=a+question+storefileIndexSize">a question storefileIndexSize</link>
+                  up on the user mailing list.
+       </para>
+       <para>Most of the time small inefficiencies don't matter all that much.  Unfortunately,
+         this is a case where they do.  Whatever patterns are selected for ColumnFamilies, attributes, and rowkeys, they could be repeated
+       several billion times in your data. </para>
+       <para>See <xref linkend="keyvalue"/> for more information on how HBase stores data internally.</para>
+       <section xml:id="keysize.cf"><title>Column Families</title>
+         <para>Try to keep the ColumnFamily names as small as possible, preferably one character (e.g. "d" for data/default).
+         </para> 
+       </section>
+       <section xml:id="keysize.atttributes"><title>Attributes</title>
+         <para>Although verbose attribute names (e.g., "myVeryImportantAttribute") are easier to read, prefer shorter attribute names (e.g., "via")
+         to store in HBase.
+         </para> 
+       </section>
+       <section xml:id="keysize.row"><title>Rowkey Length</title>
+         <para>Keep them as short as is reasonable such that they can still be useful for required data access (e.g., Get vs. Scan). 
+         A short key that is useless for data access is not better than a longer key with better get/scan properties.  Expect tradeoffs
+         when designing rowkeys.
+         </para> 
+       </section>
+       <section xml:id="keysize.patterns"><title>Byte Patterns</title>
+         <para>A long is 8 bytes.  You can store an unsigned number up to 18,446,744,073,709,551,615 in those eight bytes.
+            If you stored this number as a String -- presuming a byte per character -- you need nearly 3x the bytes.
+         </para>
+         <para>Not convinced?  Below is some sample code that you can run on your own.
+<programlisting>
+// long
+//
+long l = 1234567890L;
+byte[] lb = Bytes.toBytes(l);
+System.out.println("long bytes length: " + lb.length);   // returns 8
+
+String s = "" + l;
+byte[] sb = Bytes.toBytes(s);
+System.out.println("long as string length: " + sb.length);    // returns 10
+
+// hash
+//
+MessageDigest md = MessageDigest.getInstance("MD5");
+byte[] digest = md.digest(Bytes.toBytes(s));
+System.out.println("md5 digest bytes length: " + digest.length);    // returns 16
+
+String sDigest = new String(digest);
+byte[] sbDigest = Bytes.toBytes(sDigest);
+System.out.println("md5 digest as string length: " + sbDigest.length);    // returns 26
+</programlisting>
+         </para>
+       </section>
+    </section>
+    <section xml:id="reverse.timestamp"><title>Reverse Timestamps</title>
+    <para>A common problem in database processing is quickly finding the most recent version of a value.  A technique using reverse timestamps
+    as a part of the key can help greatly with a special case of this problem.  Also found in the HBase chapter of Tom White's book Hadoop:  The Definitive Guide (O'Reilly), 
+    the technique involves appending (<code>Long.MAX_VALUE - timestamp</code>) to the end of any key, e.g., [key][reverse_timestamp].
+    </para>
+    <para>The most recent value for [key] in a table can be found by performing a Scan for [key] and obtaining the first record.  Since HBase keys
+    are in sorted order, this key sorts before any older row-keys for [key] and thus is first.
+    </para>
+    <para>This technique would be used instead of <xref linkend="schema.versions"/> where the intent is to hold onto all versions
+    "forever" (or a very long time) and at the same time quickly obtain access to any other version by using the same Scan technique.
+    </para>
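+    <para>A sketch of the key construction, assuming a String key and the
+    <code>Bytes</code> utility class:
+<programlisting>
+String key = "somekey";  // hypothetical natural key
+long reverseTs = Long.MAX_VALUE - System.currentTimeMillis();
+byte[] rowkey = Bytes.add(Bytes.toBytes(key), Bytes.toBytes(reverseTs));  // [key][reverse_timestamp]
+</programlisting>
+    A Scan that starts at [key] will then return the newest version first.
+    </para>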
+    </section>
+    <section xml:id="rowkey.scope">
+    <title>Rowkeys and ColumnFamilies</title>
+    <para>Rowkeys are scoped to ColumnFamilies.  Thus, the same rowkey could exist in each ColumnFamily that exists in a table without collision.
+    </para>
+    </section>
+    <section xml:id="changing.rowkeys"><title>Immutability of Rowkeys</title>
+    <para>Rowkeys cannot be changed.  The only way they can be "changed" in a table is if the row is deleted and then re-inserted.
+    This is a fairly common question on the HBase dist-list so it pays to get the rowkeys right the first time (and/or before you've 
+    inserted a lot of data).
+    </para>
+    </section>
+    </section>  <!--  rowkey design -->  
+    <section xml:id="schema.versions">
+  <title>
+  Number of Versions
+  </title>
+     <section xml:id="schema.versions.max"><title>Maximum Number of Versions</title>
+      <para>The maximum number of row versions to store is configured per column
+      family via <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</link>.
+      The default for max versions is 3.
+      This is an important parameter because, as described in the <xref linkend="datamodel" />
+      chapter, HBase does <emphasis>not</emphasis> overwrite row values, but rather
+      stores different values per row by time (and qualifier).  Excess versions are removed during major
+      compactions.  The number of max versions may need to be increased or decreased depending on application needs.
+      </para>
+      <para>Setting the number of max versions to an exceedingly high level (e.g., hundreds or more) is not recommended unless those old values are
+      very dear to you, because doing so will greatly increase StoreFile size.
+      </para>
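+      <para>For example, a sketch of raising max versions to five on a hypothetical
+      column family "cf" at table-creation time:
+<programlisting>
+HTableDescriptor htd = new HTableDescriptor("myTable");
+HColumnDescriptor hcd = new HColumnDescriptor("cf");
+hcd.setMaxVersions(5);        // keep up to five versions instead of the default three
+htd.addFamily(hcd);
+admin.createTable(htd);       // admin is an HBaseAdmin instance
+</programlisting>
+      </para>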
+     </section>
+    <section xml:id="schema.minversions">
+    <title>
+    Minimum Number of Versions
+    </title>
+    <para>Like maximum number of row versions, the minimum number of row versions to keep is configured per column
+      family via <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</link>.
+      The default for min versions is 0, which means the feature is disabled.
+      The minimum number of row versions parameter is used together with the time-to-live parameter and can be combined with the
+      number of row versions parameter to allow configurations such as
+      "keep the last T minutes worth of data, at most N versions, <emphasis>but keep at least M versions around</emphasis>"
+      (where M is the value for minimum number of row versions, M&lt;N).
+      This parameter should only be set when time-to-live is enabled for a column family and must be less than the
+      number of row versions.
+    </para>
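+    <para>A sketch of the "last T minutes, at most N versions, but at least M versions"
+    configuration, assuming the HColumnDescriptor setters for time-to-live and min versions:
+<programlisting>
+HColumnDescriptor hcd = new HColumnDescriptor("cf");   // "cf" is a hypothetical family name
+hcd.setTimeToLive(60 * 60 * 24);   // T: expire cells after one day (value is in seconds)
+hcd.setMaxVersions(10);            // N: never keep more than ten versions
+hcd.setMinVersions(2);             // M: but always keep at least two, even past the TTL
+</programlisting>
+    </para>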
+    </section>
+  </section>
+  <section xml:id="supported.datatypes">
+  <title>
+  Supported Datatypes
+  </title>
+  <para>HBase supports a "bytes-in/bytes-out" interface via <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html">Put</link> and
+  <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Result.html">Result</link>, so anything that can be
+  converted to an array of bytes can be stored as a value.  Input could be strings, numbers, complex objects, or even images, as long as they can be rendered as bytes.
+  </para>
+  <para>There are practical limits to the size of values (e.g., storing 10-50MB objects in HBase would probably be too much to ask);
+  search the mailing list for conversations on this topic. All rows in HBase conform to the <xref linkend="datamodel"/>, and
+  that includes versioning.  Take that into consideration when making your design, as well as block size for the ColumnFamily.
+  </para>
+    <section xml:id="counters">
+      <title>Counters</title>
+      <para>
+      One supported datatype that deserves special mention is "counters" (i.e., the ability to do atomic increments of numbers).  See
+      <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#increment%28org.apache.hadoop.hbase.client.Increment%29">Increment</link> in HTable.
+      </para>
+      <para>Synchronization on counters is done on the RegionServer, not in the client.
+      </para>
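+      <para>For example, a sketch of an atomic increment on a hypothetical "clicks" qualifier:
+<programlisting>
+HTable htable = ...      // instantiate HTable
+
+long newCount = htable.incrementColumnValue(Bytes.toBytes("row1"),
+    Bytes.toBytes("cf"), Bytes.toBytes("clicks"), 1);   // atomically adds 1 and returns the new total
+</programlisting>
+      </para>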
+    </section> 
+  </section>
+  <section xml:id="ttl">
+  <title>Time To Live (TTL)</title>
+  <para>ColumnFamilies can set a TTL length in seconds, and HBase will automatically delete rows once the expiration time is reached.
+  This applies to <emphasis>all</emphasis> versions of a row - even the current one.  The TTL time encoded in HBase for the row is specified in UTC.
+  </para>
+  <para>See <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</link> for more information.
+  </para>
+  </section>
+  <section xml:id="cf.keep.deleted">
+  <title>
+  Keeping Deleted Cells
+  </title>
+  <para>ColumnFamilies can optionally keep deleted cells. That means deleted cells can still be retrieved with
+  <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html">Get</link> or
+  <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html">Scan</link> operations,
+  as long as these operations have a time range specified that ends before the timestamp of any delete that would affect the cells.
+  This allows for point-in-time queries even in the presence of deletes, as sketched below.
+  </para>
+  <para>
+  Deleted cells are still subject to TTL, and there will never be more than "maximum number of versions" deleted cells.
+  A new "raw" scan option returns all deleted cells and the delete markers.
+  </para>
+  <para>See <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</link> for more information.
+  </para>
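+  <para>For example, a sketch of a point-in-time Get whose time range ends before a known
+  delete timestamp; the variables are illustrative:
+<programlisting>
+HTable htable = ...      // instantiate HTable
+long deleteTs = ...;     // timestamp of the delete that would otherwise mask the cells
+
+Get get = new Get(Bytes.toBytes("row1"));
+get.setTimeRange(0, deleteTs);   // [minStamp, maxStamp): consider only cells older than the delete
+Result r = htable.get(get);
+</programlisting>
+  </para>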
+  </section>
+  <section xml:id="secondary.indexes">
+  <title>
+  Secondary Indexes and Alternate Query Paths
+  </title>
+  <para>This section could also be titled "what if my table rowkey looks like <emphasis>this</emphasis> but I also want to query my table like <emphasis>that</emphasis>."
+  A common example on the dist-list is where a row-key is of the format "user-timestamp" but there are reporting requirements on activity across users for certain
+  time ranges.  Thus, selecting by user is easy because it is in the lead position of the key, but time is not.
+  </para>
+  <para>There is no single answer on the best way to handle this because it depends on...
+   <itemizedlist>
+       <listitem>Number of users</listitem>  
+       <listitem>Data size and data arrival rate</listitem>
+       <listitem>Flexibility of reporting requirements (e.g., completely ad-hoc date selection vs. pre-configured ranges) </listitem>  
+       <listitem>Desired execution speed of query (e.g., 90 seconds may be reasonable to some for an ad-hoc report, whereas it may be too long for others) </listitem>  
+   </itemizedlist>
+   ... and solutions are also influenced by the size of the cluster and how much processing power you have to throw at the solution.
+   Common techniques are in the sub-sections below.  This is a representative, but not exhaustive, list of approaches.
+  </para>
+  <para>It should not be a surprise that secondary indexes require additional cluster space and processing.  
+  This is precisely what happens in an RDBMS, because the act of creating an alternate index requires both space and processing cycles to update.  RDBMS products
+  are more advanced in this regard and handle alternative index management out of the box.  However, HBase scales better at larger data volumes, so this is a feature trade-off.
+  </para>
+  <para>Pay attention to <xref linkend="performance"/> when implementing any of these approaches.</para>
+  <para>Additionally, see the David Butler response in this dist-list thread <link xlink:href="http://search-hadoop.com/m/nvbiBp2TDP/Stargate%252Bhbase&amp;subj=Stargate+hbase">HBase, mail # user - Stargate+hbase</link>
+   </para>
+    <section xml:id="secondary.indexes.filter">
+      <title>
+       Filter Query
+      </title>
+      <para>Depending on the case, it may be appropriate to use <xref linkend="client.filter"/>.  In this case, no secondary index is created.
+      However, don't try a full-scan on a large table like this from an application (i.e., single-threaded client).
+      </para>
+    </section>
+    <section xml:id="secondary.indexes.periodic">
+      <title>
+       Periodic-Update Secondary Index
+      </title>
+      <para>A secondary index could be created in another table which is periodically updated via a MapReduce job.  The job could be executed intra-day, but depending on
+      load-strategy it could still potentially be out of sync with the main data table.</para>
+      <para>See <xref linkend="mapreduce.example.readwrite"/> for more information.</para>
+    </section>
+    <section xml:id="secondary.indexes.dualwrite">
+      <title>
+       Dual-Write Secondary Index
+      </title>
+      <para>Another strategy is to build the secondary index while publishing data to the cluster (e.g., write to the data table, write to the index table), as in the sketch below.
+      If this approach is taken after a data table already exists, then bootstrapping will be needed for the secondary index with a MapReduce job (see <xref linkend="secondary.indexes.periodic"/>).</para>
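+      <para>A sketch of the dual write, with hypothetical <code>dataTable</code> and
+      <code>indexTable</code> HTable instances and an inverted key for the alternate access path:
+<programlisting>
+String user = "user1";                        // hypothetical values
+long ts = System.currentTimeMillis();
+byte[] value = Bytes.toBytes("data");
+
+Put dataPut = new Put(Bytes.toBytes(user + "-" + ts));    // data table key: user-timestamp
+dataPut.add(Bytes.toBytes("cf"), Bytes.toBytes("attr"), value);
+dataTable.put(dataPut);
+
+Put indexPut = new Put(Bytes.toBytes(ts + "-" + user));   // index table key: timestamp-user
+indexPut.add(Bytes.toBytes("cf"), Bytes.toBytes("attr"), value);
+indexTable.put(indexPut);
+</programlisting>
+      </para>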
+    </section>
+    <section xml:id="secondary.indexes.summary">
+      <title>
+       Summary Tables
+      </title>
+      <para>Where time-ranges are very wide (e.g., year-long report) and where the data is voluminous, summary tables are a common approach.
+      These would be generated with MapReduce jobs into another table.</para>
+      <para>See <xref linkend="mapreduce.example.summary"/> for more information.</para>
+    </section>
+    <section xml:id="secondary.indexes.coproc">
+      <title>
+       Coprocessor Secondary Index
+      </title>
+      <para>Coprocessors act like RDBMS triggers.  These are currently on TRUNK.
+      </para>
+    </section>
+  </section>
+  <section xml:id="schema.smackdown"><title>Schema Design Smackdown</title>
+    <para>This section will describe common schema design questions that appear on the dist-list.  These are 
+    general guidelines and not laws - each application must consider its own needs.
+    </para>
+    <section xml:id="schema.smackdown.rowsversions"><title>Rows vs. Versions</title>
+      <para>A common question is whether one should prefer rows or HBase's built-in versioning.  The context is typically where there are
+      "a lot" of versions of a row to be retained (e.g., where it is significantly above the HBase default of 3 max versions).  The
+      rows-approach would require storing a timestamp in some portion of the rowkey so that successive updates would not overwrite one another.
+      </para>
+      <para>Preference:  Rows (generally speaking).
+      </para>
+    </section>
+    <section xml:id="schema.smackdown.rowscols"><title>Rows vs. Columns</title>

[... 943 lines stripped ...]

