hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Trivial Update of "Hbase/HbaseArchitecture" by JimKellerman
Date Tue, 06 Feb 2007 16:25:13 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by JimKellerman:
http://wiki.apache.org/lucene-hadoop/Hbase/HbaseArchitecture

The comment on the change is:
Comment on comments

------------------------------------------------------------------------------
  
  I think Hbase should be compact (space-efficient), fast, and able to handle high-demand loads. It should handle sparse tables efficiently.
  So, for wide and sparse data, Hbase must store data by columns like C-Store does.
+ 
+   ''I agree. But let's not get ahead of ourselves here. I only posted the conceptual view
last night. There is no part of the document that discusses how the data is physically organized.
I was going to work on that today. Patience.'' -- JimKellerman
+ 
  A column-oriented system handles NULLs more easily, with significantly smaller performance overhead, and supports both horizontal and vertical parallel processing.
+ 
+  ''Bigtable (and Hbase) do not even have to store nulls. If there is no value for a particular key, an empty or null value is simply returned.'' -- JimKellerman
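 A minimal Java sketch (not actual Hbase code, just an illustration under the assumptions in the comment above) of why a sparse table need not store nulls: cells live in a map keyed by row and column, an absent cell simply has no entry, and a lookup on a missing cell returns null without any NULL marker being stored.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: a sparse table as nested maps (row -> column -> value).
// Absent cells occupy no space at all; reading one just yields null.
public class SparseTableSketch {
    private final Map<String, Map<String, String>> rows = new HashMap<>();

    public void put(String row, String column, String value) {
        rows.computeIfAbsent(row, r -> new HashMap<>()).put(column, value);
    }

    public String get(String row, String column) {
        Map<String, String> cols = rows.get(row);
        // No stored NULL marker: a missing row or column is simply not present.
        return cols == null ? null : cols.get(column);
    }

    public static void main(String[] args) {
        SparseTableSketch t = new SparseTableSketch();
        t.put("row1", "anchor:cnnsi.com", "CNN");
        System.out.println(t.get("row1", "anchor:cnnsi.com")); // CNN
        System.out.println(t.get("row1", "anchor:other.com")); // null
    }
}
```

 Only populated cells consume storage; every other (row, column) combination in the sparse space costs nothing.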
  
  Let's consider the following case:
  You may be familiar with RDF (Resource Description Framework) storage from the W3C, which is
@@ -283, +288 @@

   * Row/column space can be sparse
   * Columns are in the form of (family: optional qualifier). These correspond to RDF properties
   * Columns have type information  
+ 
+  ''In both Bigtable and Hbase, there is no notion of type. Keys and values in Bigtable are arbitrary strings. For Hbase, we are considering making values arbitrary byte arrays. Why? Because a String has an encoding associated with it: unless you store the original encoding of a value, you have no way to decode it back into the same bytes.'' -- JimKellerman
+ 
   * Because of the design of the system, columns are easy to create (and are created implicitly)

+ 
+  ''In Bigtable, columns are easy to create but they require administration privileges (Access Control Lists control who can manipulate the schema). Hbase will follow this metaphor.'' -- JimKellerman
+ 
   * Column families can be split into locality groups (Ontologies!) 
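 JimKellerman's note above about values being arbitrary byte arrays rather than Strings can be illustrated with a small Java sketch (not Hbase code; the value "café" is just a stand-in): the same logical value yields different bytes under different encodings, and decoding with the wrong charset corrupts it, whereas an opaque byte array round-trips exactly.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustration only: why storing values as Strings requires also storing
// their encoding, while raw byte arrays sidestep the problem entirely.
public class EncodingSketch {
    public static void main(String[] args) {
        String value = "café";
        byte[] utf8   = value.getBytes(StandardCharsets.UTF_8);      // 5 bytes
        byte[] latin1 = value.getBytes(StandardCharsets.ISO_8859_1); // 4 bytes

        // Different encodings of the same value differ at the byte level.
        System.out.println(Arrays.equals(utf8, latin1)); // false

        // Decoding UTF-8 bytes with the wrong charset corrupts the value.
        String wrong = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(value.equals(wrong)); // false

        // Storing the value as an opaque byte[] round-trips exactly.
        byte[] stored = utf8.clone();
        System.out.println(Arrays.equals(utf8, stored)); // true
    }
}
```

 A store that treats values as uninterpreted bytes never has to know or record how they were encoded.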
  
  Let's assume a large amount of RDF documents are stored in the system.
  Then a vertical (column) data set, selected by one RDF property, can be read quickly from the table, because the data is column-stored.
  Please let me know if you don't agree with me.
  
-   ''Absolutely. But let's not get ahead of ourselves here. I only posted the conceptual
view last night. There is no part of the document that discusses how the data is physically
organized. I was going to work on that today. Patience.'' -- JimKellerman
  
  ----
  
