hbase-commits mailing list archives

From mi...@apache.org
Subject git commit: HBASE-11986 Addendum - Added hbase_mob.xml
Date Thu, 18 Sep 2014 20:58:55 GMT
Repository: hbase
Updated Branches:
  refs/heads/hbase-11339 7dbd828a9 -> ac65b1fb7


HBASE-11986 Addendum - Added hbase_mob.xml


Project: http://git-wip-us.apache.org/repos/asf/hbase/repo
Commit: http://git-wip-us.apache.org/repos/asf/hbase/commit/ac65b1fb
Tree: http://git-wip-us.apache.org/repos/asf/hbase/tree/ac65b1fb
Diff: http://git-wip-us.apache.org/repos/asf/hbase/diff/ac65b1fb

Branch: refs/heads/hbase-11339
Commit: ac65b1fb756cb3d8cb1df117ed4a7d056b122afe
Parents: 7dbd828
Author: Misty Stanley-Jones <mstanleyjones@cloudera.com>
Authored: Fri Sep 19 06:57:31 2014 +1000
Committer: Misty Stanley-Jones <mstanleyjones@cloudera.com>
Committed: Fri Sep 19 06:58:41 2014 +1000

----------------------------------------------------------------------
 src/main/docbkx/hbase_mob.xml | 227 +++++++++++++++++++++++++++++++++++++
 1 file changed, 227 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/hbase/blob/ac65b1fb/src/main/docbkx/hbase_mob.xml
----------------------------------------------------------------------
diff --git a/src/main/docbkx/hbase_mob.xml b/src/main/docbkx/hbase_mob.xml
new file mode 100644
index 0000000..54672ad
--- /dev/null
+++ b/src/main/docbkx/hbase_mob.xml
@@ -0,0 +1,227 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<chapter version="5.0" xml:id="mob" xmlns="http://docbook.org/ns/docbook"
+    xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xi="http://www.w3.org/2001/XInclude"
+    xmlns:svg="http://www.w3.org/2000/svg" xmlns:m="http://www.w3.org/1998/Math/MathML"
+    xmlns:html="http://www.w3.org/1999/xhtml" xmlns:db="http://docbook.org/ns/docbook">
+    <!--
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+-->
+
+    <title>HBase Medium Object (MOB) Storage</title>
+    <para>Data comes in many sizes, and saving all of your data in HBase, including binary data such
+        as images and documents, is ideal. HBase can technically handle binary objects with cells
+        that are up to 10 MB in size. However, HBase's normal read and write paths are optimized for
+        values smaller than 100 KB in size. When HBase deals with large numbers of values above that
+        size, up to 10 MB, referred to here as <firstterm>medium objects</firstterm>, or
+        <firstterm>MOBs</firstterm>, performance is degraded due to write amplification caused by
+        splits and compactions. HBase 2.0+ adds support for better managing large numbers of MOBs
+        while maintaining performance, consistency, and low operational overhead. MOB support is
+        provided by the work done in <link xlink:href="https://issues.apache.org/jira/browse/HBASE-11339"
+        >HBASE-11339</link>.</para>
+
+    <para>To take advantage of MOB, you need to use HFile version 3. Optionally, configure the MOB
+        file reader's cache settings for each RegionServer (see <xref linkend="mob.cache.configure"
+        />), then configure specific columns to hold MOB data. Currently, you also need to configure
+        a periodic re-optimization of MOB data layout, but this requirement is expected to be
+        removed at a later date.</para>
+    <para>Client code does not need to change to take advantage of HBase MOB support. The feature is
+        transparent to the client.</para>
+    
+    <section>
+        <title>Limitations of MOB Functionality</title>
+        <para>Work on HBase MOB is ongoing. Support is still needed for snapshots (<link
+                xlink:href="https://issues.apache.org/jira/browse/HBASE-11645">HBASE-11645</link>),
+            metrics (<link xlink:href="https://issues.apache.org/jira/browse/HBASE-11683"
+                >HBASE-11683</link>), and a native compaction mechanism (<link
+                xlink:href="https://issues.apache.org/jira/browse/HBASE-11861"
+            >HBASE-11861</link>).</para>
+    </section>
+    
+    <section>
+        <title>Configure Columns for MOB</title>
+        <para>You can configure columns to support MOB during table creation or alteration, either
+            in HBase Shell or via the Java API. The two relevant properties are the boolean
+                <code>IS_MOB</code> and the <code>MOB_THRESHOLD</code>, which is the number of bytes
+            at which an object is considered to be a MOB. Only <code>IS_MOB</code> is required. If
+            you do not specify the <code>MOB_THRESHOLD</code>, the default threshold value of 100 KB
+            is used.</para>
+        <example>
+            <title>Configure a Column for MOB Using HBase Shell</title>
+            <screen>
+hbase> create 't1', {NAME => 'f1', IS_MOB => true, MOB_THRESHOLD => 102400}
+hbase> alter 't1', {NAME => 'f1', IS_MOB => true, MOB_THRESHOLD => 102400}
+            </screen>
+        </example>
+        <example>
+            <title>Configure a Column for MOB Using the API</title>
+            <programlisting language="java">
+...
+HColumnDescriptor hcd = new HColumnDescriptor("f");
+hcd.setValue(MobConstants.IS_MOB, Bytes.toBytes(Boolean.TRUE));
+...
+HColumnDescriptor hcd;
+hcd.setValue(MobConstants.MOB_THRESHOLD, Bytes.toBytes(102400L));
+...
+            </programlisting>
+        </example>
+    </section>
+
+    <section>
+        <title>Testing MOB</title>
+        <para>The utility <command>org.apache.hadoop.hbase.IntegrationTestIngestMOB</command> is
+            provided to assist with testing the MOB feature. The utility is run as follows:</para>
+        <screen>$ <userinput>sudo -u hbase hbase org.apache.hadoop.hbase.IntegrationTestIngestMOB \
+            -threshold 100*1024 \
+            -minMobDataSize 100*1024*4/5 \
+            -maxMobDataSize 100*1024*50</userinput></screen>
+        <itemizedlist>
+            <listitem>
+                <para><literal>threshold</literal> is the threshold at which cells are considered to
+                    be MOBs. The default is 100 KB.</para>
+            </listitem>
+            <listitem>
+                <para><literal>minMobDataSize</literal> is the minimum value for the size of MOB
+                    data. The default is 80 KB.</para>
+            </listitem>
+            <listitem>
+                <para><literal>maxMobDataSize</literal> is the maximum value for the size of MOB
+                    data. The default is 5 MB.</para>
+            </listitem>
+        </itemizedlist>
+    </section>
+
+    <section>
+        <title>Set Up MOB Re-Optimization Tasks</title>
+        <para>The MOB feature introduces a new read and write path to HBase and currently requires
+            an external tool, the <command>sweeper</command> tool, for housekeeping and
+            optimization. The <command>sweeper</command> tool uses MapReduce to coalesce small MOB
+            files or MOB files with many deletions or updates.</para>
+
+        <procedure>
+            <title>Configure and Run the <command>sweeper</command> Tool</title>
+            <step>
+                <para>First, configure the <command>sweeper</command>'s properties in the
+                    RegionServer's <filename>hbase-site.xml</filename> file. Adjust these properties
+                    to suit your environment.</para>
+                <programlisting language="xml"><![CDATA[
+<property>
+    <name>hbase.mob.sweep.tool.compaction.ratio</name>
+    <value>0.5f</value>
+    <description>
+      If there're too many cells deleted in a mob file, it's regarded
+      as an invalid file and needs to be merged.
+      If existingCellsSize/mobFileSize is less than ratio, it's regarded
+      as an invalid file. The default value is 0.5f.
+    </description>
+</property>
+<property>
+    <name>hbase.mob.sweep.tool.compaction.mergeable.size</name>
+    <value>134217728</value>
+    <description>
+      If the size of a mob file is less than this value, it's regarded as a small
+      file and needs to be merged. The default value is 128MB.
+    </description>
+</property>
+<property>
+    <name>hbase.mob.sweep.tool.compaction.memstore.flush.size</name>
+    <value>134217728</value>
+    <description>
+      The flush size for the memstore used by sweep job. Each sweep reducer owns such a memstore.
+      The default value is 128MB.
+    </description>
+</property>
+<property>
+    <name>hbase.mob.cleaner.interval</name>
+    <value>86400000</value>
+    <description>
+      The period that ExpiredMobFileCleaner runs. The unit is millisecond.
+      The default value is one day.
+    </description>
+</property>]]>
+                </programlisting>
+            </step>
+            <step>
+                <para>Next, add the HBase install directory, <envar>$HBASE_HOME/*</envar>, and HBase
+                    library directory to <filename>yarn-site.xml</filename>. Adjust this example to
+                    suit your environment.</para>
+                <programlisting language="xml"><![CDATA[
+<property>
+    <description>Classpath for typical applications.</description>
+    <name>yarn.application.classpath</name>
+    <value>
+        $HADOOP_CONF_DIR,
+        $HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
+        $HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
+        $HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
+        $HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*,
+        $HBASE_HOME/*,$HBASE_HOME/lib/*
+    </value>
+</property>]]>
+                </programlisting>
+            </step>
+            <step>
+                <para>Finally, run the <command>sweeper</command> tool for each column which is
+                    configured for MOB.</para>
+                <screen>$ <userinput>hbase org.apache.hadoop.hbase.mob.compactions.Sweeper \
+                    <replaceable>tableName</replaceable> \
+                    <replaceable>familyName</replaceable></userinput></screen>
+            </step>
+        </procedure>
+    </section>
+    <section xml:id="mob.cache.configure">
+        <title>Configure the MOB Cache</title>
+        <para>Because there can be a large number of MOB files at any time, as compared to the
+            number of HFiles, MOB files are not always kept open. The MOB file reader cache is an LRU
+            cache which keeps the most recently used MOB files open. To configure the MOB file
+            reader's cache on each RegionServer, add the following properties to the RegionServer's
+                <filename>hbase-site.xml</filename>, customize the configuration to suit your
+            environment, and restart or rolling restart the RegionServer.</para>
+        <programlisting language="xml"><![CDATA[
+<property>
+    <name>hbase.mob.file.cache.size</name>
+    <value>1000</value>
+    <description>
+      Number of opened file handlers to cache.
+      A larger value will benefit reads by providing more file handlers per mob
+      file cache and would reduce frequent file opening and closing.
+      However, if this is set too high, this could lead to a "too many opened file handlers" error.
+      The default value is 1000.
+    </description>
+</property>
+<property>
+    <name>hbase.mob.cache.evict.period</name>
+    <value>3600</value>
+    <description>
+      The amount of time in seconds before the mob cache evicts cached mob files.
+      The default value is 3600 seconds.
+    </description>
+</property>
+<property>
+    <name>hbase.mob.cache.evict.remain.ratio</name>
+    <value>0.5f</value>
+    <description>
+      The ratio (between 0.0 and 1.0) of files that remains cached after an eviction
+      is triggered when the number of cached mob files exceeds the hbase.mob.file.cache.size.
+      The default value is 0.5f.
+    </description>
+</property>
+]]>
+        </programlisting>
+    </section>
+</chapter>
\ No newline at end of file
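
Note on the sweeper properties in the diff above: the descriptions of hbase.mob.sweep.tool.compaction.ratio and hbase.mob.sweep.tool.compaction.mergeable.size each state a condition under which a MOB file is merged. The following is a minimal sketch of those two conditions in plain Java, with no HBase dependency. The class and method names (MobSweepSketch, isInvalid, isSmall, needsMerge) are hypothetical and are not taken from the sweeper tool; the constants are the example values from the hbase-site.xml snippet, and combining the two conditions with a logical OR is an assumption about how they interact.

```java
// Illustrative only: restates the merge criteria from the property
// descriptions. None of these names exist in HBase.
public class MobSweepSketch {
    // Example value of hbase.mob.sweep.tool.compaction.ratio
    static final float COMPACTION_RATIO = 0.5f;
    // Example value of hbase.mob.sweep.tool.compaction.mergeable.size (128 MB)
    static final long MERGEABLE_SIZE = 134217728L;

    // "If existingCellsSize/mobFileSize is less than ratio, it's regarded
    // as an invalid file."
    static boolean isInvalid(long existingCellsSize, long mobFileSize) {
        return (float) existingCellsSize / mobFileSize < COMPACTION_RATIO;
    }

    // "If the size of a mob file is less than this value, it's regarded as
    // a small file and needs to be merged."
    static boolean isSmall(long mobFileSize) {
        return mobFileSize < MERGEABLE_SIZE;
    }

    // Assumption: a file is merged if either condition holds.
    static boolean needsMerge(long existingCellsSize, long mobFileSize) {
        return isSmall(mobFileSize) || isInvalid(existingCellsSize, mobFileSize);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // 256 MB file with 200 MB of live cells: large and mostly live.
        System.out.println(needsMerge(200 * mb, 256 * mb)); // prints false
        // 256 MB file with 100 MB of live cells: below the 0.5 ratio.
        System.out.println(needsMerge(100 * mb, 256 * mb)); // prints true
        // 64 MB file, fully live: smaller than the mergeable size.
        System.out.println(needsMerge(64 * mb, 64 * mb));   // prints true
    }
}
```

Raising the ratio or the mergeable size in this sketch makes more files qualify for merging, which is the trade-off to weigh when adjusting the two properties for a given environment.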

