hbase-commits mailing list archives

From mi...@apache.org
Subject hbase git commit: HBASE-11339 Converted hbase_mob.xml to Asciidoc and added it to the Asciidoc TOC
Date Fri, 27 Feb 2015 02:15:08 GMT
Repository: hbase
Updated Branches:
  refs/heads/HBASE-11339 [created] a1e9ce3d8


HBASE-11339 Converted hbase_mob.xml to Asciidoc and added it to the Asciidoc TOC


Project: http://git-wip-us.apache.org/repos/asf/hbase/repo
Commit: http://git-wip-us.apache.org/repos/asf/hbase/commit/a1e9ce3d
Tree: http://git-wip-us.apache.org/repos/asf/hbase/tree/a1e9ce3d
Diff: http://git-wip-us.apache.org/repos/asf/hbase/diff/a1e9ce3d

Branch: refs/heads/HBASE-11339
Commit: a1e9ce3d877035a6e21aab6df8eccd8e959e92dc
Parents: 85bcec3
Author: Misty Stanley-Jones <mstanleyjones@cloudera.com>
Authored: Fri Feb 27 12:14:50 2015 +1000
Committer: Misty Stanley-Jones <mstanleyjones@cloudera.com>
Committed: Fri Feb 27 12:14:50 2015 +1000

----------------------------------------------------------------------
 src/main/asciidoc/_chapters/hbase_mob.adoc | 120 +++++++++++++
 src/main/asciidoc/book.adoc                |   1 +
 src/main/docbkx/hbase_mob.xml              | 226 ------------------------
 3 files changed, 121 insertions(+), 226 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/hbase/blob/a1e9ce3d/src/main/asciidoc/_chapters/hbase_mob.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/hbase_mob.adoc b/src/main/asciidoc/_chapters/hbase_mob.adoc
new file mode 100644
index 0000000..8dea211
--- /dev/null
+++ b/src/main/asciidoc/_chapters/hbase_mob.adoc
@@ -0,0 +1,120 @@
+////
+/**
+ *
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+////
+
+[[hbase_mob]]
+== Storing Medium-sized Objects (MOB)
+:doctype: book
+:numbered:
+:toc: left
+:icons: font
+:experimental:
+:toc: left
+:source-language: java
+
+Data comes in many sizes, and saving all of your data in HBase, including binary data such as images and documents, is ideal. HBase can technically handle binary objects with cells that are up to 10MB in size. However, HBase's normal read and write paths are optimized for values smaller than 100KB in size. When HBase deals with large numbers of values up to 10MB, referred to here as medium objects, or MOBs, performance is degraded due to write amplification caused by splits and compactions. HBase ***FIX_VERSION_NUMBER*** adds support for better managing large numbers of MOBs while maintaining performance, consistency, and low operational overhead. MOB support is provided by the work done in link:https://issues.apache.org/jira/browse/HBASE-11339[HBASE-11339].
+
+To take advantage of MOB, you need to use <<hfilev3,HFile version 3>>. Optionally, configure the MOB file reader's cache settings for each RegionServer (see <<mob.cache.configure>>), then configure specific columns to hold MOB data.
+
+Client code does not need to change to take advantage of HBase MOB support. The feature is transparent to the client.
+
+=== Configuring Columns for MOB
+
+You can configure columns to support MOB during table creation or alteration, either in HBase Shell or via the Java API. The two relevant properties are the boolean `IS_MOB` and the `MOB_THRESHOLD`, which is the number of bytes at which an object is considered to be a MOB. Only `IS_MOB` is required. If you do not specify the `MOB_THRESHOLD`, the default threshold value of 100 KB is used.
+
+.Configure a Column for MOB Using HBase Shell
+====
+----
+hbase> create 't1', {NAME => 'f1', IS_MOB => true, MOB_THRESHOLD => 102400}
+hbase> alter 't1', {NAME => 'f1', IS_MOB => true, MOB_THRESHOLD => 102400}
+----
+====
+
+.Configure a Column for MOB Using the Java API
+====
+[source,java]
+----
+...
+HColumnDescriptor hcd = new HColumnDescriptor("f");
+hcd.setMobEnabled(true);
+...
+hcd.setMobThreshold(102400L);
+...        
+----
+====
+
+
+=== Testing MOB
+
+The utility `org.apache.hadoop.hbase.IntegrationTestIngestMOB` is provided to assist with testing the MOB feature. The utility is run as follows:
+[source,bash]
+----
+$ sudo -u hbase hbase org.apache.hadoop.hbase.IntegrationTestIngestMOB \
+            -threshold 102400 \
+            -minMobDataSize 512 \
+            -maxMobDataSize 5120
+----
+
+* `*threshold*` is the threshold at which cells are considered to be MOBs. The default is 1 kB, expressed in bytes.
+* `*minMobDataSize*` is the minimum value for the size of MOB data. The default is 512 B, expressed in bytes.
+* `*maxMobDataSize*` is the maximum value for the size of MOB data. The default is 5 kB, expressed in bytes.
+
+
+[[mob.cache.configure]]
+=== Configuring the MOB Cache
+
+
+Because there can be a large number of MOB files at any time, as compared to the number of HFiles, MOB files are not always kept open. The MOB file reader cache is an LRU cache which keeps the most recently used MOB files open. To configure the MOB file reader's cache on each RegionServer, add the following properties to the RegionServer's `hbase-site.xml`, customize the configuration to suit your environment, and restart or rolling restart the RegionServer.
+
+.Example MOB Cache Configuration
+====
+[source,xml]
+----
+<property>
+    <name>hbase.mob.file.cache.size</name>
+    <value>1000</value>
+    <description>
+      Number of opened file handlers to cache.
+      A larger value will benefit reads by providing more file handlers per mob
+      file cache and would reduce frequent file opening and closing.
+      However, if this is set too high, this could lead to a "too many opened file handlers" error.
+      The default value is 1000.
+    </description>
+</property>
+<property>
+    <name>hbase.mob.cache.evict.period</name>
+    <value>3600</value>
+    <description>
+      The amount of time in seconds before the mob cache evicts cached mob files.
+      The default value is 3600 seconds.
+    </description>
+</property>
+<property>
+    <name>hbase.mob.cache.evict.remain.ratio</name>
+    <value>0.5f</value>
+    <description>
+      The ratio (between 0.0 and 1.0) of files that remains cached after an eviction
+      is triggered when the number of cached mob files exceeds the hbase.mob.file.cache.size.
+      The default value is 0.5f.
+    </description>
+</property>
+----
+====
+
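
Note on the `IS_MOB` / `MOB_THRESHOLD` properties described in the chapter above: the rule they express can be sketched in plain Java. This is an illustration only and is not HBase code. The class `MobThresholdSketch` and the method `isMob` are hypothetical names, and treating a value of exactly the threshold size as a normal (non-MOB) cell is an assumption, since the chapter only says the threshold is "the number of bytes at which an object is considered to be a MOB".

```java
/** Hypothetical illustration of the MOB threshold rule; not part of HBase. */
public class MobThresholdSketch {
    /** The default MOB_THRESHOLD stated in the chapter: 100 KB, in bytes. */
    static final long DEFAULT_MOB_THRESHOLD = 102400L;

    /**
     * Decides whether a cell value would be handled as a MOB.
     * Assumption: values strictly larger than the threshold count as MOBs.
     */
    static boolean isMob(boolean mobEnabled, long threshold, int valueLength) {
        return mobEnabled && valueLength > threshold;
    }

    public static void main(String[] args) {
        int[] sizes = {512, 102400, 102401, 5 * 1024 * 1024};
        for (int size : sizes) {
            String path = isMob(true, DEFAULT_MOB_THRESHOLD, size) ? "MOB" : "normal";
            System.out.println(size + " bytes: " + path);
        }
    }
}
```

A column family without `IS_MOB => true` corresponds to `mobEnabled == false` here, in which case no value is treated as a MOB regardless of size.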

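Note on the MOB cache settings in the chapter above: the interaction between `hbase.mob.file.cache.size` and `hbase.mob.cache.evict.remain.ratio` can be sketched with a small LRU cache in plain Java. This is a minimal sketch under stated assumptions, not HBase's implementation. The class `MobCacheSketch` and its methods are hypothetical, file handles are stood in for by strings, and eviction is triggered synchronously on insert rather than on the periodic schedule controlled by `hbase.mob.cache.evict.period`.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of an LRU cache with a remain ratio; not HBase code. */
public class MobCacheSketch {
    private final int maxSize;        // stands in for hbase.mob.file.cache.size
    private final float remainRatio;  // stands in for hbase.mob.cache.evict.remain.ratio
    // accessOrder=true: iteration runs from least to most recently used.
    private final LinkedHashMap<String, String> openFiles =
        new LinkedHashMap<>(16, 0.75f, true);

    public MobCacheSketch(int maxSize, float remainRatio) {
        this.maxSize = maxSize;
        this.remainRatio = remainRatio;
    }

    /** Returns a cached handle, or "opens" the file and caches it. */
    public String open(String fileName) {
        String handle = openFiles.get(fileName); // a hit refreshes recency
        if (handle == null) {
            handle = "handle:" + fileName;       // placeholder for a real reader
            openFiles.put(fileName, handle);
            if (openFiles.size() > maxSize) {
                evict();
            }
        }
        return handle;
    }

    /** Drops least recently used entries until maxSize * remainRatio remain. */
    private void evict() {
        int target = (int) (maxSize * remainRatio);
        Iterator<Map.Entry<String, String>> it = openFiles.entrySet().iterator();
        while (openFiles.size() > target && it.hasNext()) {
            it.next();
            it.remove();
        }
    }

    public boolean isCached(String fileName) {
        return openFiles.containsKey(fileName);
    }

    public int size() {
        return openFiles.size();
    }

    public static void main(String[] args) {
        MobCacheSketch cache = new MobCacheSketch(4, 0.5f);
        for (String f : new String[] {"a", "b", "c", "d"}) {
            cache.open(f);
        }
        cache.open("a"); // "a" becomes the most recently used entry
        cache.open("e"); // fifth file exceeds the limit of 4 and triggers eviction
        System.out.println(cache.size() + " " + cache.isCached("a") + " " + cache.isCached("b"));
        // prints "2 true false"
    }
}
```

With a size of 4 and a ratio of 0.5, exceeding the limit trims the cache down to the 2 most recently used files, which is why a larger `hbase.mob.file.cache.size` keeps more files open at the cost of more open file handles.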
http://git-wip-us.apache.org/repos/asf/hbase/blob/a1e9ce3d/src/main/asciidoc/book.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/book.adoc b/src/main/asciidoc/book.adoc
index 790a23c..0ea9a6a 100644
--- a/src/main/asciidoc/book.adoc
+++ b/src/main/asciidoc/book.adoc
@@ -51,6 +51,7 @@ include::_chapters/schema_design.adoc[]
 include::_chapters/mapreduce.adoc[]
 include::_chapters/security.adoc[]
 include::_chapters/architecture.adoc[]
+include::_chapters/hbase_mob.adoc[]
 include::_chapters/hbase_apis.adoc[]
 include::_chapters/external_apis.adoc[]
 include::_chapters/thrift_filter_language.adoc[]

http://git-wip-us.apache.org/repos/asf/hbase/blob/a1e9ce3d/src/main/docbkx/hbase_mob.xml
----------------------------------------------------------------------
diff --git a/src/main/docbkx/hbase_mob.xml b/src/main/docbkx/hbase_mob.xml
deleted file mode 100644
index c3fb0eb..0000000
--- a/src/main/docbkx/hbase_mob.xml
+++ /dev/null
@@ -1,226 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<chapter version="5.0" xml:id="mob" xmlns="http://docbook.org/ns/docbook"
-    xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xi="http://www.w3.org/2001/XInclude"
-    xmlns:svg="http://www.w3.org/2000/svg" xmlns:m="http://www.w3.org/1998/Math/MathML"
-    xmlns:html="http://www.w3.org/1999/xhtml" xmlns:db="http://docbook.org/ns/docbook">
-    <!--
-/**
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
--->
-
-    <title>HBase Medium Object (MOB) Storage</title>
-    <para>Data comes in many sizes, and saving all of your data in HBase, including binary data such
-        as images and documents, is ideal. HBase can technically handle binary objects with cells
-        that are up to 10MB in size. However, HBase's normal read and write paths are optimized for
-        values smaller than 100KB in size. When HBase deals with large numbers of values up to 10MB,
-        referred to here as <firstterm>medium objects</firstterm>, or <firstterm>MOBs</firstterm>,
-        performance is degraded due to write amplification caused by splits and compactions. HBase
-        2.0+ adds support for better managing large numbers of MOBs while maintaining performance,
-        consistency, and low operational overhead. MOB support is provided by the work done in <link
-            xlink:href="https://issues.apache.org/jira/browse/HBASE-11339"
-        >HBASE-11339</link>.</para>
-
-    <para>To take advantage of MOB, you need to use HFile version 3. Optionally, configure the MOB
-        file reader's cache settings for each RegionServer (see <xref linkend="mob.cache.configure"
-        />), then configure specific columns to hold MOB data. Currently, you also need to configure
-        a periodic re-optimization of MOB data layout, but this requirement is expected to be
-        removed at a later date.</para>
-    <para>Client code does not need to change to take advantage of HBase MOB support. The feature is
-        transparent to the client.</para>
-    
-    <section>
-        <title>Limitations of MOB Functionality</title>
-        <para>Work on HBase MOB is ongoing. Work is needed for support for snapshots (<link
-                xlink:href="https://issues.apache.org/jira/browse/HBASE-11645">HBASE-11645</link>),
-            metrics (<link xlink:href="https://issues.apache.org/jira/browse/HBASE-11683"
-                >HBASE-11683</link>), and a native compaction mechanism (<link
-                xlink:href="https://issues.apache.org/jira/browse/HBASE-11861"
-            >HBASE-11861)</link>.</para>
-    </section>
-    
-    <section>
-        <title>Configure Columns for MOB</title>
-        <para>You can configure columns to support MOB during table creation or alteration, either
-            in HBase Shell or via the Java API. The two relevant properties are the boolean
-                <code>IS_MOB</code> and the <code>MOB_THRESHOLD</code>, which is the number of bytes
-            at which an object is considered to be a MOB. Only <code>IS_MOB</code> is required. If
-            you do not specify the <code>MOB_THRESHOLD</code>, the default threshold value of 100 kb
-            is used.</para>
-        <example>
-            <title>Configure a Column for MOB Using HBase Shell</title>
-            <screen>
-hbase> create 't1', {NAME => 'f1', IS_MOB => true, MOB_THRESHOLD => 102400}
-hbase> alter 't1', {NAME => 'f1', IS_MOB => true, MOB_THRESHOLD => 102400}
-            </screen>
-        </example>
-        <example>
-            <title>Configure a Column for MOB Using the API</title>
-            <programlisting language="java">
-...
-HColumnDescriptor hcd = new HColumnDescriptor(“f”);
-hcd.setMobEnabled(true);
-...
-hcd.setMobThreshold(102400L);
-...
-            </programlisting>
-        </example>
-    </section>
-    
-    <section>
-        <title>Testing MOB</title>
-        <para>The utility <command>org.apache.hadoop.hbase.IntegrationTestIngestMOB</command> is
-            provided to assist with testing the MOB feature. The utility is run as follows:</para>
-        <screen>$ <userinput>sudo -u hbase hbase org.apache.hadoop.hbase.IntegrationTestIngestMOB \
-            -threshold 100*1024 \
-            -minMobDataSize 100*1024*4/5 \
-            -maxMobDataSize 100*1024*50</userinput></screen>
-        <itemizedlist>
-            <listitem>
-                <para><literal>threshold</literal> is the threshold at which cells are considered to
-                    be MOBs. The default is 1 kB.</para>
-            </listitem>
-            <listitem>
-                <para><literal>minMobDataSize</literal> is the minimum value for the size of MOB
-                    data. The default is 512 B.</para>
-            </listitem>
-            <listitem>
-                <para><literal>maxMobDataSize</literal> is the maximum value for the size of MOB
-                    data. The default is 5 kB.</para>
-            </listitem>
-        </itemizedlist>
-    </section>
-
-    <section>
-        <title>Set Up MOB Re-Optimization Tasks</title>
-        <para>The MOB feature introduces a new read and write path to HBase and currently requires
-            an external tool, the <command>sweeper</command> tool, for housekeeping and
-            optimization. The <command>sweeper</command> tool uses MapReduce to coalesce small MOB
-            files or MOB files with many deletions or updates</para>
-
-        <procedure>
-            <title>Configure and Run the <command>sweeper</command> Tool</title>
-            <step>
-                <para>First, configure the <command>sweeper</command>'s properties in the
-                    RegionServer's <filename>hbase-site.xml</filename> file. Adjust these properties
-                    to suit your environment.</para>
-                <programlisting language="xml"><![CDATA[
-<property>
-    <name>hbase.mob.sweep.tool.compaction.ratio</name>
-    <value>0.5f</value>
-    <description>
-      If there are too many cells deleted in a mob file, it's regarded
-      as an invalid file and needs to be merged.
-      If existingCellsSize/mobFileSize is less than ratio, it's regarded
-      as an invalid file. The default value is 0.5f.
-    </description>
-</property>
-<property>
-    <name>hbase.mob.sweep.tool.compaction.mergeable.size</name>
-    <value>134217728</value>
-    <description>
-      If the size of a mob file is less than this value, it's regarded as a small
-      file and needs to be merged. The default value is 128MB.
-    </description>
-</property>
-<property>
-    <name>hbase.mob.sweep.tool.compaction.memstore.flush.size</name>
-    <value>134217728</value>
-    <description>
-      The flush size for the memstore used by sweep job. Each sweep reducer owns such a memstore.
-      The default value is 128MB.
-    </description>
-</property>
-<property>
-    <name>hbase.mob.cleaner.interval</name>
-    <value>86400000</value>
-    <description>
-      The period that ExpiredMobFileCleaner runs. The unit is millisecond.
-      The default value is one day.
-    </description>
-</property>]]>
-                </programlisting>
-            </step>
-            <step>
-                <para>Next, add the HBase install directory, <envar>$HBASE_HOME/*</envar>, and HBase
-                    library directory to <filename>yarn-site.xml</filename> Adjust this example to
-                    suit your environment.</para>
-                <programlisting language="xml"><![CDATA[
-<property>
-    <description>Classpath for typical applications.</description>
-    <name>yarn.application.classpath</name>
-    <value>
-        $HADOOP_CONF_DIR,
-        $HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
-        $HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
-        $HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
-        $HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*,
-        $HBASE_HOME/*, $HBASE_HOME/lib/*
-    </value>
-</property>]]>
-                </programlisting>
-            </step>
-            <step>
-                <para>Finally, run the <command>sweeper</command> tool for each column which is
-                    configured for MOB..</para>
-                <screen>$ <userinput>org.apache.hadoop.hbase.mob.compactions.Sweeper \
-                    <replaceable>tableName</replaceable> \
-                    <replaceable>familyName</replaceable></userinput></screen>
-            </step>
-        </procedure>
-    </section>
-    <section xml:id="mob.cache.configure">
-        <title>Configure the MOB Cache</title>
-        <para>Because there can be a large number of MOB files at any time, as compared to the
-            number of HFiles, MOB files are not always kept open. The MOB file reader cache is a LRU
-            cache which keeps the most recently used MOB files open. To configure the MOB file
-            reader's cache on each RegionServer, add the following properties to the RegionServer's
-                <filename>hbase-site.xm</filename>l, customize the configuration to suit your
-            environment, and restart or rolling restart the RegionServer.</para>
-        <programlisting language="xml"><![CDATA[
-<property>
-    <name>hbase.mob.file.cache.size</name>
-    <value>1000</value>
-    <description>
-      Number of opened file handlers to cache.
-      A larger value will benefit reads by provinding more file handlers per mob
-      file cache and would reduce frequent file opening and closing.
-      However, if this is set too high, this could lead to a "too many opened file handers"
-      The default value is 1000.
-    </description>
-</property>
-<property>
-    <name>hbase.mob.cache.evict.period</name>
-    <value>3600</value>
-    <description>
-      The amount of time in seconds before the mob cache evicts cached mob files.
-      The default value is 3600 seconds.
-    </description>
-</property>
-<property>
-    <name>hbase.mob.cache.evict.remain.ratio</name>
-    <value>0.5f</value>
-    <description>
-      The ratio (between 0.0 and 1.0) of files that remains cached after an eviction
-      is triggered when the number of cached mob files exceeds the hbase.mob.file.cache.size.
-      The default value is 0.5f.
-    </description>
-</property>
-]]>
-        </programlisting>
-    </section>
-</chapter>
\ No newline at end of file

