directory-commits mailing list archives

From conflue...@apache.org
Subject [CONF] Apache Directory SandBox > HBase Prototype
Date Wed, 20 Jan 2010 00:07:00 GMT
<html>
<head>
    <base href="http://cwiki.apache.org/confluence">
            <link rel="stylesheet" href="/confluence/s/1519/1/5/_/styles/combined.css?spaceKey=DIRxSBOX&amp;forWysiwyg=true"
type="text/css">
    </head>
<body style="background-color: white" bgcolor="white">
<div id="pageContent">
<div id="notificationFormat">
<div class="wiki-content">
<div class="email">
     <h2><a href="http://cwiki.apache.org/confluence/display/DIRxSBOX/HBase+Prototype">HBase
Prototype</a></h2>
     <h4>Page <b>edited</b> by             <a href="http://cwiki.apache.org/confluence/display/~seelmann">Stefan
Seelmann</a>
    </h4>
     
          <br/>
     <div class="notificationGreySide">
         <h1><a name="HBasePrototype-HBasePrototype"></a>HBase Prototype</h1>

<p>This page describes a partition which stores data in <a href="http://hadoop.apache.org/hbase/"
rel="nofollow">Hadoop's HBase</a> database.</p>

<h2><a name="HBasePrototype-Introduction"></a>Introduction</h2>

<p>The HBase partition is implemented as an <a href="/confluence/display/DIRxSRVx11/Xdbm+Partition+Design"
title="Xdbm Partition Design">XDBM Partition</a>. The main advantage over an implementation
of the simple Partition interface is that the powerful XDBM search engine can be used. However,
instead of storing entries by their DN, a hierarchical approach is used, inspired by <a
href="http://cwiki.apache.org/confluence/display/DIRxSRVx11/Xdbm+fast+modifyDn+proposal" rel="nofollow">http://cwiki.apache.org/confluence/display/DIRxSRVx11/Xdbm+fast+modifyDn+proposal</a>
and <a href="http://www.openldap.org/conf/odd-sfo-2003/howard-dev.pdf" rel="nofollow">http://www.openldap.org/conf/odd-sfo-2003/howard-dev.pdf</a>.</p>

<p>In contrast to the JDBM (or the LDIF) partition, the directory server (ApacheDS)
and the storage engine (HBase) don't run within the same JVM but are distributed over the
network and communicate via RPC. Hence it is important to reduce the communication between
ApacheDS and HBase to a minimum.</p>

<h2><a name="HBasePrototype-HBaseSchemaDesign"></a>HBase Schema Design</h2>

<p>The following format is used to illustrate the HBase table layout; timestamps are
omitted:</p>

<div class="code panel" style="border-width: 1px;"><div class="codeContent panelContent">
<pre class="code-java">
------------------------------------------------------------
| table name | family1              | family2              |
------------------------------------------------------------
| row1       | qualifier1 -&gt; value1 | qualifierA -&gt; valueA |
|            | qualifier2 -&gt; value2 | qualifierB -&gt; valueB |
------------------------------------------------------------
| row2       | qualifier1 -&gt; value1 | qualifierA -&gt; valueA |
|            | qualifier2 -&gt; value2 | qualifierB -&gt; valueB |
------------------------------------------------------------
</pre>
</div></div>

<h3><a name="HBasePrototype-MasterTable"></a>Master Table</h3>

<p>The master table stores the entries, one entry per row:</p>

<div class="code panel" style="border-width: 1px;"><div class="codeContent panelContent">
<pre class="code-java">
--------------------------------------------------------------------------------------------------
| master | treeInfo                              | upAttributes                                  |
--------------------------------------------------------------------------------------------------
|   0    | sequence -&gt; 8216                      |                                               |
--------------------------------------------------------------------------------------------------
|   1    | parentId -&gt; 0                         | objectClass0 -&gt; top                           |
|        | upRdn -&gt; o=sevenSeas                  | objectClass1 -&gt; organization                  |
|        | normRdn -&gt; 2.5.4.10=sevenseas         | o0 -&gt; sevenSeas                               |
--------------------------------------------------------------------------------------------------
|   2    | parentId -&gt; 1                         | objectClass0 -&gt; top                           |
|        | upRdn -&gt; ou=people                    | objectClass1 -&gt; organizationalUnit            |
|        | normRdn -&gt; 2.5.4.11=people            | ou0 -&gt; people                                 |
--------------------------------------------------------------------------------------------------
|   3    | parentId -&gt; 1                         | objectClass0 -&gt; top                           |
|        | upRdn -&gt; ou=groups                    | objectClass1 -&gt; organizationalUnit            |
|        | normRdn -&gt; 2.5.4.11=groups            | ou0 -&gt; groups                                 |
--------------------------------------------------------------------------------------------------
|   6    | parentId -&gt; 2                         | objectClass0 -&gt; top                           |
|        | upRdn -&gt; cn=Horatio Hornblower        | objectClass1 -&gt; person                        |
|        | normRdn -&gt; 2.5.4.3=horatio hornblower | objectClass2 -&gt; organizationalPerson          |
|        |                                       | objectClass3 -&gt; inetOrgPerson                 |
|        |                                       | cn0 -&gt; Horatio Hornblower                     |
|        |                                       | description0 -&gt; Capt. Horatio Hornblower, R.N |
|        |                                       | givenName0 -&gt; Horatio                         |
|        |                                       | sn0 -&gt; Hornblower                             |
|        |                                       | uid0 -&gt; hhornblo                              |
|        |                                       | mail0 -&gt; hhornblo@royalnavy.mod.uk            |
|        |                                       | userPassword0 -&gt; &lt;bytes&gt;                      |
--------------------------------------------------------------------------------------------------
| ...    |                                       |                                               |
--------------------------------------------------------------------------------------------------
</pre>
</div></div>

<p>The <b>row key</b> is a sequentially generated 8-byte long. An advantage
of using longs is compatibility with XDBM. A disadvantage is that sequential keys are not
optimal for distributing data and load balancing across data nodes.</p>

<p>The <b>treeInfo</b> column family stores hierarchical information:</p>
<ul>
	<li>Column <b>parentId</b> contains the row key of the parent entry.</li>
	<li>Column <b>upRdn</b> contains the user-provided local name, relative
to the parent (normally the RDN; the suffix DN for the context entry). It is used to reconstruct
the entry's DN.</li>
	<li>Column <b>normRdn</b> contains the normalized local name, relative
to the parent (normally the RDN; the suffix DN for the context entry). This is just an optimization:
it avoids RDN normalization when constructing the key of the tree table (see below).</li>
</ul>


<p>The <b>upAttributes</b> column family contains a map with all the attributes.
</p>
<ul>
	<li>The attribute description (type+options) is used as column qualifier.</li>
	<li>HBase stores one value per column. There are several workarounds for storing multiple
values. With serialization or a JSON format it would not be possible to access one value at
a time. Hence each value is stored in its own column and an additional index is appended to the
column qualifier. This way each value can be read and written separately.</li>
	<li>The additional index is a zero-based 4-byte signed integer. To reconstruct the
user-provided attribute description, the last 4 bytes need to be removed from the column qualifier.</li>
	<li>The values are stored as byte[].</li>
</ul>
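<p>The qualifier layout described above can be sketched as follows. This is only an illustration:
the class <code>QualifierCodec</code> and its methods are hypothetical names, and UTF-8 encoding
and big-endian byte order for the index are assumptions that the page does not state.</p>

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of ApacheDS or HBase.
final class QualifierCodec {
    private static final int INDEX_LENGTH = 4; // 4-byte signed integer suffix

    // Appends the zero-based value index to the attribute description.
    static byte[] encode(String attributeDescription, int valueIndex) {
        byte[] description = attributeDescription.getBytes(StandardCharsets.UTF_8); // assumed encoding
        return ByteBuffer.allocate(description.length + INDEX_LENGTH)
                .put(description)
                .putInt(valueIndex) // ByteBuffer writes big-endian by default (assumed order)
                .array();
    }

    // Strips the last 4 bytes to recover the user-provided attribute description.
    static String attributeDescription(byte[] qualifier) {
        return new String(qualifier, 0, qualifier.length - INDEX_LENGTH, StandardCharsets.UTF_8);
    }

    // Reads the value index from the last 4 bytes.
    static int valueIndex(byte[] qualifier) {
        return ByteBuffer.wrap(qualifier, qualifier.length - INDEX_LENGTH, INDEX_LENGTH).getInt();
    }

    public static void main(String[] args) {
        byte[] qualifier = encode("objectClass", 1);
        System.out.println(attributeDescription(qualifier) + " / " + valueIndex(qualifier));
    }
}
```

<p>Because the index has a fixed width, the description can be recovered without a separator
character, even if the description itself contains digits.</p>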


<p>The row with key '0' is a special row: it is the virtual root. Its column 'treeInfo:sequence'
contains the row key sequence number. HBase provides an atomic increment-and-get operation
to obtain the next key for a new entry.</p>

<p>To retrieve the DN of an entry by its ID, the entry's RDN and parent ID must be fetched.
As long as the parent ID is greater than 0, this step must be repeated for each parent. The
DN is the concatenation of all RDNs, starting with the entry's own RDN.</p>
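<p>A minimal sketch of this walk up the tree, using an in-memory map as a stand-in for the master
table. The names <code>DnReconstruction</code>, <code>TreeInfo</code> and <code>dnOf</code> are
hypothetical; in the real partition each map lookup would be a read of the treeInfo column family
in HBase.</p>

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; the map stands in for the master table.
final class DnReconstruction {
    // Stand-in for the treeInfo columns of one master table row.
    static final class TreeInfo {
        final long parentId;
        final String upRdn;

        TreeInfo(long parentId, String upRdn) {
            this.parentId = parentId;
            this.upRdn = upRdn;
        }
    }

    // Concatenates the user-provided RDNs from the entry up to the virtual root (ID 0).
    static String dnOf(long id, Map<Long, TreeInfo> master) {
        StringBuilder dn = new StringBuilder();
        long current = id;
        while (current > 0) {
            TreeInfo info = master.get(current);
            if (info == null) {
                throw new IllegalArgumentException("No entry with ID " + current);
            }
            if (dn.length() > 0) {
                dn.append(',');
            }
            dn.append(info.upRdn);
            current = info.parentId;
        }
        return dn.toString();
    }

    public static void main(String[] args) {
        Map<Long, TreeInfo> master = new HashMap<>();
        master.put(1L, new TreeInfo(0L, "o=sevenSeas"));
        master.put(2L, new TreeInfo(1L, "ou=people"));
        master.put(6L, new TreeInfo(2L, "cn=Horatio Hornblower"));
        System.out.println(dnOf(6L, master));
    }
}
```

<p>The number of lookups equals the depth of the entry, which is one reason to keep round trips
between ApacheDS and HBase in mind.</p>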

<p>It would also be possible to determine the ID for a DN from the master table alone; however, a full
table scan would be necessary. For this reason a second table, the tree table, is available.</p>

<p>The master table contains all information needed to restore the data: reference to
the parent, user provided RDN, and user provided attributes.</p>

<p>Alternatives and Improvements:</p>
<ul>
	<li>It would be possible to store the serialized form of the server entry instead of
the attributes.</li>
	<li>It would be possible to store the serialized form of the RDN to avoid parsing.</li>
	<li>Different kinds of attributes could be stored in separate column families (e.g.
binary attributes).</li>
	<li>Use the entry UUID as row key. This would help to distribute the entries across the cluster
and may avoid hot spots.</li>
	<li>Add oneLevelCount and subLevelCount (currently stored in tree table) for faster
lookup of counts.</li>
	<li>Add normAttributes (currently stored in tree table) for faster lookup of reverse
index and evaluator.</li>
</ul>


<h3><a name="HBasePrototype-TreeTable"></a>Tree Table</h3>

<p>The tree table stores parent to child relationships.</p>

<div class="code panel" style="border-width: 1px;"><div class="codeContent panelContent">
<pre class="code-java">
--------------------------------------------------------------------------------------------------------------------
| tree                         | treeInfo              | normAttributes                                            |
--------------------------------------------------------------------------------------------------------------------
| 0,2.5.4.10=sevenseas         | id -&gt; 1               | 2.5.4.0=organization -&gt; 1                                 |
|                              | oneLevelCount -&gt; 4    | 2.5.4.0=top -&gt; 0                                          |
|                              | subLevelCount -&gt; 1583 | 2.5.4.10=sevenseas -&gt; 0                                   |
--------------------------------------------------------------------------------------------------------------------
| 1,2.5.4.11=people            | id -&gt; 2               | 2.5.4.0=organizationalunit -&gt; 1                           |
|                              | oneLevelCount -&gt; 1254 | 2.5.4.0=top -&gt; 0                                          |
|                              | subLevelCount -&gt; 1254 | 2.5.4.11=people -&gt; 0                                      |
--------------------------------------------------------------------------------------------------------------------
| 1,2.5.4.11=groups            | id -&gt; 3               | 2.5.4.0=organizationalunit -&gt; 1                           |
|                              | oneLevelCount -&gt; 2    | 2.5.4.0=top -&gt; 0                                          |
|                              | subLevelCount -&gt; 56   | 2.5.4.11=groups -&gt; 0                                      |
--------------------------------------------------------------------------------------------------------------------
| 2,2.5.4.3=horatio hornblower | id -&gt; 6               | 0.9.2342.19200300.100.1.1=hhornblo -&gt; 0                   |
|                              | oneLevelCount -&gt; 0    | 0.9.2342.19200300.100.1.3=hhornblo@royalnavy.mod.uk -&gt; 0  |
|                              | subLevelCount -&gt; 0    | 2.5.4.0=inetorgperson -&gt; 3                                |
|                              |                       | 2.5.4.0=organizationalperson -&gt; 2                         |
|                              |                       | 2.5.4.0=person -&gt; 1                                       |
|                              |                       | 2.5.4.0=top -&gt; 0                                          |
|                              |                       | 2.5.4.13=capt. horatio hornblower, r.n -&gt; 0               |
|                              |                       | 2.5.4.3=horatio hornblower -&gt; 0                           |
|                              |                       | 2.5.4.35=&lt;bytes&gt; -&gt; 0                                     |
|                              |                       | 2.5.4.4=hornblower -&gt; 0                                   |
|                              |                       | 2.5.4.42=horatio -&gt; 0                                     |
--------------------------------------------------------------------------------------------------------------------
| ...                          |                       |                                                           |
--------------------------------------------------------------------------------------------------------------------
</pre>
</div></div>

<p>The <b>row key</b> is composed of the parent entry ID (8-byte long),
a comma, and the normalized RDN of an entry.</p>
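<p>As a sketch, such a key could be assembled like this. <code>TreeRowKey</code> is a hypothetical
helper; the big-endian encoding of the ID and the UTF-8 encoding of the RDN are assumptions.</p>

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of ApacheDS or HBase.
final class TreeRowKey {
    // Layout: 8-byte parent ID, one comma byte, normalized RDN bytes.
    static byte[] compose(long parentId, String normRdn) {
        byte[] rdn = normRdn.getBytes(StandardCharsets.UTF_8); // assumed encoding
        return ByteBuffer.allocate(Long.BYTES + 1 + rdn.length)
                .putLong(parentId) // big-endian, so keys of one parent sort together
                .put((byte) ',')
                .put(rdn)
                .array();
    }

    public static void main(String[] args) {
        byte[] key = compose(1L, "2.5.4.11=people");
        System.out.println("key length: " + key.length);
    }
}
```

<p>With a fixed-width ID prefix, all children of one parent form a contiguous key range, which
the scans described further down rely on.</p>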

<p>The <b>treeInfo</b> column family stores hierarchical information:</p>
<ul>
	<li>Column <b>id</b> contains the row key of the entry in the master table.</li>
	<li>Column <b>oneLevelCount</b> tracks the number of immediate children.
It is used by the one level index. When adding or deleting an entry the oneLevelCount of
the parent entry is incremented or decremented.</li>
	<li>Column <b>subLevelCount</b> tracks the number of all descendants. It
is used by the sub level index. When adding or deleting an entry the subLevelCount counters
of all parent entries are incremented or decremented.</li>
</ul>


<p>The <b>normAttributes</b> column family stores a map with all attributes
(indexed as well as unindexed) in normalized form. It is used for server-side filtering while
scanning. The qualifier is composed of the attribute OID, an equals character, and the attribute
value. The numeric values represent the 4-byte attribute index in the master table. </p>

<p>Using this table it is easy to calculate the ID for a DN. The start row key can always
be composed using the partition's suffix: 0,&lt;suffix&gt;. From that row the suffix
entry ID can be found in the treeInfo:id column. This ID and the next name component from
the DN are used to compose the next row key. This is repeated until all name components of the
DN are processed.</p>
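<p>The lookup can be sketched as a simple loop. For readability this illustration uses string keys
of the form <code>parentId,normRdn</code>, as shown in the table above, instead of the binary row
key, and a map instead of the tree table. <code>DnToId</code> and <code>idOf</code> are hypothetical
names, and the DN is assumed to be already split into normalized name components ordered from the
suffix down to the entry.</p>

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; the map stands in for the treeInfo:id column of the tree table.
final class DnToId {
    // Resolves one name component per step, starting below the virtual root (ID 0).
    static long idOf(List<String> normRdnsFromSuffix, Map<String, Long> treeIds) {
        long parentId = 0L;
        for (String normRdn : normRdnsFromSuffix) {
            String rowKey = parentId + "," + normRdn;
            Long id = treeIds.get(rowKey);
            if (id == null) {
                throw new IllegalArgumentException("No such entry: " + rowKey);
            }
            parentId = id;
        }
        return parentId;
    }

    public static void main(String[] args) {
        Map<String, Long> tree = new HashMap<>();
        tree.put("0,2.5.4.10=sevenseas", 1L);
        tree.put("1,2.5.4.11=people", 2L);
        tree.put("2,2.5.4.3=horatio hornblower", 6L);
        System.out.println(idOf(Arrays.asList(
                "2.5.4.10=sevenseas", "2.5.4.11=people", "2.5.4.3=horatio hornblower"), tree));
    }
}
```

<p>Each step is a single keyed read, so the cost grows with the depth of the DN rather than with
the size of the table.</p>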

<p>The table is also used for one-level and sub-level index cursors. To iterate over
all children of an entry 'X' an HBase scanner with start key 'X' and stop key 'X+1' is used.
For sub-level scans the column treeInfo:id can be used to set up the next scanner's start
and stop keys. While walking the sub-level index the column treeInfo:oneLevelCount can be used
to determine whether it is necessary to scan the next level.</p>
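<p>The start/stop key idea can be illustrated without an HBase cluster. The sketch below uses a
<code>TreeMap</code> ordered by unsigned byte comparison, which is how HBase orders row keys, as a
stand-in for the tree table. <code>OneLevelScan</code> and its methods are hypothetical, and the
key encoding (big-endian ID, UTF-8 RDN) is an assumption.</p>

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// Hypothetical illustration; the sorted map stands in for the tree table (row key -> treeInfo:id).
final class OneLevelScan {
    // Lexicographic order on unsigned bytes.
    static final Comparator<byte[]> UNSIGNED_BYTES = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    };

    static byte[] idPrefix(long id) {
        return ByteBuffer.allocate(Long.BYTES).putLong(id).array();
    }

    static byte[] rowKey(long parentId, String normRdn) {
        byte[] rdn = normRdn.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(Long.BYTES + 1 + rdn.length)
                .putLong(parentId).put((byte) ',').put(rdn).array();
    }

    // Scans [X, X+1): every key that starts with the 8-byte ID of the parent.
    static List<Long> childIds(TreeMap<byte[], Long> tree, long parentId) {
        return new ArrayList<>(tree.subMap(idPrefix(parentId), idPrefix(parentId + 1)).values());
    }

    public static void main(String[] args) {
        TreeMap<byte[], Long> tree = new TreeMap<>(UNSIGNED_BYTES);
        tree.put(rowKey(0L, "2.5.4.10=sevenseas"), 1L);
        tree.put(rowKey(1L, "2.5.4.11=people"), 2L);
        tree.put(rowKey(1L, "2.5.4.11=groups"), 3L);
        tree.put(rowKey(2L, "2.5.4.3=horatio hornblower"), 6L);
        System.out.println(childIds(tree, 1L));
    }
}
```

<p>A sub-level walk would repeat <code>childIds</code> for each returned ID, skipping entries
whose oneLevelCount is 0.</p>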

<p>While scanning it is also possible to use column family normAttributes for server-side
filtering. This is essential for unindexed searches as it is very expensive to load all entries
from the HBase cluster into ApacheDS and evaluate the filter there. Instead the LDAP filter
can be translated to an HBase filter and evaluated in the HBase cluster.</p>

<p>The table also serves as a reverse index, e.g. for evaluators.</p>

<p>Alternatives and Improvements:</p>
<ul>
	<li>Row keys may become long if custom attribute types or long values are used. Even a simple RDN
like '2.5.4.11=users' has 14 bytes. As the key is always calculated and never parsed back
it would be possible to shorten it. Possible strategies would be to use a hash (MD5, fixed 16
bytes) or to substitute the OID with the short name.</li>
	<li>It isn't necessary to store objectClass:top.</li>
</ul>
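<p>The hashing strategy mentioned in the list above might look like the following sketch.
<code>HashedTreeRowKey</code> is a hypothetical helper. Keeping the 8-byte parent ID and the comma
in front of the digest is an assumption made here so that all children of one parent still share a
key prefix; the page itself does not specify the layout.</p>

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper illustrating a fixed-length tree table row key.
final class HashedTreeRowKey {
    // Layout (assumed): 8-byte parent ID, one comma byte, 16-byte MD5 of the normalized RDN.
    static byte[] compose(long parentId, String normRdn) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(normRdn.getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.allocate(Long.BYTES + 1 + digest.length)
                    .putLong(parentId)
                    .put((byte) ',')
                    .put(digest)
                    .array();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a mandatory algorithm on the Java platform, so this is not expected.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(compose(1L, "2.5.4.11=users").length);
    }
}
```

<p>The key length no longer depends on the RDN, but a hash only pays off for RDNs longer than 16
bytes, and siblings are then ordered by digest rather than by name.</p>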


     </div>
</div>
</div>
</div>
</div>
</body>
</html>
