hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Update of "Hbase/HbaseArchitecture" by JimKellerman
Date Wed, 30 May 2007 16:19:07 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by JimKellerman:
http://wiki.apache.org/lucene-hadoop/Hbase/HbaseArchitecture

------------------------------------------------------------------------------
  them. Thanks!
  
  '''NEWS:'''
-  1. An update to the original HBase code has been committed to the Hadoop source tree, from
a patch attached to [http://issues.apache.org/jira/browse/HADOOP-1282 Hadoop Jira Issue 1282].
You can find the current HBase code in the [http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/contrib/hbase/ Hadoop SVN tree]
+  1. HBase is being updated frequently. The latest code can always be found in the [http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/contrib/hbase/ trunk of the Hadoop svn tree].
-  1. HBase now has its own component in the Hadoop Jira. Bug reports, contributions, etc.
should be tagged with the component '''contrib/hbase'''.
+  1. HBase now has its own component in the [https://issues.apache.org/jira/browse/HADOOP Hadoop Jira].
Bug reports, contributions, etc. should be tagged with the component '''contrib/hbase'''.
  
  = Table of Contents =
  
@@ -495, +495 @@

  
  by [wiki:udanax Udanax] [[MailTo(webmaster AT SPAMFREE udanax DOT org)]]
  
- I think Hbase should be compact (space-efficient), fast and should be able to manage high-demand
load. It should be able to handle sparse tables efficiently.
+ I think Hbase should be compact (space-efficient), fast and should be able to manage high-demand
load. It should be able to handle sparse tables efficiently. So, for wide and sparse data,
Hbase must store data by columns like C-Store does.
- So, for wide and sparse data, Hbase must store data by columns like C-Store does.
  
-  ''I agree. But let's not get ahead of ourselves here. I only posted the conceptual view
last night. There is no part of the document that discusses how the data is physically organized.
I was going to work on that today. Patience.'' -- JimKellerman
+  ''I agree. See the sections on the [#conceptual conceptual data model] and the [#physical physical data model]. -- JimKellerman 2007/05/30''
  
  A column-oriented system handles NULLs more easily with significantly smaller performance
overhead,
  and supports both Horizontal and Vertical Parallel Processing.
  
-  ''Bigtable (and Hbase) do not even have to store nulls. If there is no value for a particular
key, then an empty or null value will be returned'' -- JimKellerman
+  ''Bigtable (and Hbase) do not store nulls. If there is no value for a particular key, then
an empty or null value will be returned -- JimKellerman 2007/05/30''
  
  Let's consider the following case:
  You may be familiar with RDF (Resource Description Framework) storage from the W3C, which is
@@ -513, +512 @@

   * Columns are in the form of (family: optional qualifier). These are RDF Properties
   * Columns have type information  
  
-   ''In both Bigtable, and Hbase, there is no notion of type. Keys and values in Bigtable
are arbitrary strings. For Hbase, we are considering that values be an arbitrary byte array.''
+   ''In both Bigtable and Hbase, there is no notion of type. Keys and values in Bigtable
are arbitrary strings. In Hbase, values are arbitrary byte arrays. -- JimKellerman 2007/05/30''
- 
-   ''Why? Bigtable is written in C++ and std::string can contain an arbitrary byte sequence.
Hbase will be written in Java and in Java Strings have an encoding associated with them. Unless
you store the original encoding of a value, you have no way to decode it back into the same
encoding.'' -- JimKellerman
  
   * Because of the design of the system, columns are easy to create (and are created implicitly)

  
-   ''In Bigtable, columns are easy to create but they require administration privileges (Access
Control Lists control who can manipulate the schema). Hbase will follow this metaphor.'' --
JimKellerman
+   ''In Bigtable, column families are easy to create but they require administration privileges
(Access Control Lists control who can manipulate the schema). New column family members can
be created at any time. Hbase follows this metaphor. -- JimKellerman 2007/05/30''
  
   * Column families can be split into locality groups (Ontologies!) 
  
