hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Update of "Hbase" by JimKellerman
Date Mon, 05 Feb 2007 20:00:00 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by JimKellerman:
http://wiki.apache.org/lucene-hadoop/Hbase

------------------------------------------------------------------------------
  Bigtable (and Hbase) provide a means for organizing and efficiently
  accessing these large data sets.
  
+ == Goals ==
+ 
+ Design (and subsequently implement) a structured storage system as
+ similar to Google's Bigtable as possible for the Hadoop environment.
+ 
+ === Non-Goals ===
+ 
+  * Gratuitous changes that are essentially "re-inventing the wheel" or are the result of "not invented here".
+  * Until the first working version is completed, requests for additional features should be posted at [wiki:Hbase/HbaseFeatureRequests Hbase Feature Requests] to prevent "feature creep" or "one plus" requests that are not necessary in an initial release.
+  * Premature optimization. Once there is a working version, the system will be profiled for hot spots.
+ 
  == Project Links ==
  
  [wiki:Hbase/HbaseArchitecture  Hbase Architecture - a work in progress]
@@ -48, +59 @@

  
  == Comments ==
  
+ Please add comments related to the project goals and process below.
+ Architectural comments should be posted on the same page as the portion of
+ the architecture to which the comment is directed. Thank you.
- Please add comments below.
- 
- === It is not Row-Oriented. ===
- 
- by [wiki:udanax Udanax] [[MailTo(webmaster AT SPAMFREE udanax DOT org)]]
- 
- It's need to be much smaller, much faster, managed for high-demand analytics and can be sparse.
- So, BigTable(Hbase) must Column storing like C-Store for wide and sparse data.
- In a column oriented, NULLs are much easier to handle, and impose a significantly smaller performance overhead.
- And supports both Horizontal/Vertical Parallel Processing.
- 
- Do you know RDF(Resource Description Framework) Storage?
- We Can put it.
- 
-  * Storing and managing very large amounts of structured data
-  * Row/column space can be sparse
-  * Columns are in the form of (family: optional qualifier). This is a RDF Properties 
-  * Columns have type information  
-  * Because of the design of the system, columns are easy to create (and are created implicitly)
-  * Column families can be split into locality groups (Ontologies!) 
- 
- And then, assume some job.
- I wanna get clustered document set by one of RDF Properties.
- It can be Readed only vertical(Column) Data from Table, because Column-stored.
- if you are not in agreement on this point, let me show your ideas via attach me through MSN Messenger(webmaster@udanax.org)
  
  ----
- CategoryTemplate CategoryTemplate CategoryTemplate
  
+ ''insert comments here''
+ 
+ ----
+ 
