hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Trivial Update of "Hbase/HbaseArchitecture" by udanax
Date Tue, 06 Feb 2007 09:21:16 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by udanax:
http://wiki.apache.org/lucene-hadoop/Hbase/HbaseArchitecture

------------------------------------------------------------------------------
  
  by [wiki:udanax Udanax] [[MailTo(webmaster AT SPAMFREE udanax DOT org)]]
  
- It's need to be much smaller, much faster, managed for high-demand analytics and can be
sparse.
- So, BigTable(Hbase) must Column storing like C-Store for wide and sparse data.
- In a column oriented, NULLs are much easier to handle, and impose a significantly smaller
performance overhead.
+ I think Hbase should be compact (space-efficient), fast and should be able to manage high-demand
load. It should be able to handle sparse tables efficiently.
+ So, for wide and sparse data, Hbase must store data by columns like C-Store does.
+ A column-oriented system handles NULLs more easily with significantly smaller performance
overhead,
- And supports both Horizontal/Vertical Parallel Processing.
+ and supports both Horizontal and Vertical Parallel Processing.
  
- Do you know RDF(Resource Description Framework) Storage?
- We Can put it.
+ Let's consider the following case:
+ You may be familiar with RDF (Resource Description Framework) storage from the W3C, which is
  
   * Storing and managing very large amounts of structured data
   * Row/column space can be sparse
@@ -286, +286 @@

   * Because of the design of the system, columns are easy to create (and are created implicitly)

   * Column families can be split into locality groups (Ontologies!) 
  
- And then, assume some job.
- I wanna get clustered document set by one of RDF Properties.
- It can be Readed only vertical(Column) Data from Table, because Column-stored.
- if you are not in agreement on this point, let me show your ideas via attach me through
MSN Messenger(webmaster@udanax.org)
+ Let's assume a large number of RDF documents are stored in the system.
+ Then the vertical (column) data for any one RDF property can be read quickly from the table,
because it is column-stored.
+ Please let me know if you don't agree with me.
+ 
  
  ----
  
