hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Trivial Update of "Hbase/HbaseArchitecture" by udanax
Date Mon, 26 Mar 2007 07:06:05 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by udanax:
http://wiki.apache.org/lucene-hadoop/Hbase/HbaseArchitecture

------------------------------------------------------------------------------
  Then the vertical (column) data set selected by one of the RDF properties can be read quickly from the table, because it is column-stored.
  Please let me know if you don't agree with me.
  
- ----
- === My thoughts ===
- 
- by [wiki:udanax Udanax] [[MailTo(webmaster AT SPAMFREE udanax DOT org)]]
- 
- First, I would like to pay my respects to your commitment to the Hbase project. What follows is my opinion.
- [[BR]]Based on the paper, the picture below expresses the concept of BigTable, where 'T' is the table and column family 'A' and its attribute values are as follows.
- 
- [http://mirror.udanax.org/~udanax/rsync1/blog_udanax_org/udanax/280/o_full.jpg]
- 
- BigTable is the storage layer for sparse matrix data.
- [[BR]]Its goal is not data selection, even though that is a very useful feature, but matrix computation and aggregation.
- 
- Referring to the example code in Google's paper, it would look like this:
- {{{
- Scanner scanner(T);
- ScanStream *stream;
- stream = scanner.FetchColumnFamily("A");
- stream->SetReturnVersions("t2");
- scanner.Lookup("2");
-  
- for (; !stream->Done(); stream->Next()) {
-         printf("%s %s %lld %s\n",
-                 scanner.RowName(),
-                 stream->ColumnName(),
-                 stream->MicroTimestamp(),
-                 stream->Value());
- }
- }}}
- 
- This example code prints the first and second row vectors of the 4*4 sparse matrix.
- [[BR]]It processes vector calculations in parallel using row-wise partitioning.
- [[BR]]Therefore, to do distributed computing effectively, the data structure needs to be defined so that it fully supports the preprocessing required to obtain abstract matrix information.
- 
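The row-wise partitioning idea above can be illustrated with a small C++ sketch. It is not Hbase or BigTable code: `SparseTable`, `SumRows`, and `PartitionedRowSums` are hypothetical names, the table is a plain in-memory map, and the partitions are processed in an ordinary loop. The point is only that each row partition can be aggregated independently, so in a distributed setting each one could be handed to a separate worker and the partial results merged afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for a table: row key -> (column name -> value).
// Only non-empty cells are stored, which is what makes the matrix sparse.
using SparseRow = std::map<std::string, double>;
using SparseTable = std::map<std::string, SparseRow>;
using TableEntry = SparseTable::value_type;

// Aggregate one partition: sum the stored cells of each row it contains.
std::map<std::string, double> SumRows(const std::vector<const TableEntry*>& partition) {
  std::map<std::string, double> sums;
  for (const TableEntry* entry : partition) {
    double total = 0.0;
    for (const auto& cell : entry->second) {
      total += cell.second;
    }
    sums[entry->first] = total;
  }
  return sums;
}

// Split the table row-wise into num_partitions groups (round-robin),
// aggregate each group on its own, then merge the partial results.
// The groups share no rows, so each SumRows call is independent of the others.
std::map<std::string, double> PartitionedRowSums(const SparseTable& table,
                                                 std::size_t num_partitions) {
  assert(num_partitions > 0);
  std::vector<std::vector<const TableEntry*>> partitions(num_partitions);
  std::size_t index = 0;
  for (const auto& entry : table) {
    partitions[index % num_partitions].push_back(&entry);
    ++index;
  }
  std::map<std::string, double> merged;
  for (const auto& partition : partitions) {
    const std::map<std::string, double> partial = SumRows(partition);
    merged.insert(partial.begin(), partial.end());
  }
  return merged;
}
```

Calling `PartitionedRowSums(table, 2)` on a table with rows "1", "2", and "4" returns one sum per stored row. A row with no cells (such as "3" here) simply does not appear, which mirrors how a sparse table stores nothing for empty rows.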
- Then, I think the architecture needs to be laid out like this:
- 
-  * Data Storage Conceptual 
-  * Data Distribution 
-  * Segment Format 
-  * Data Management Tools 
-  * Parallel Matrix Computation, Parallel Aggregation Engine 
-  * Parallel Analysis Interface 
-  * Example 
-  * Benefits, Benchmark Report / Discussion 
- 
- These are the major components I think the architecture needs to have.
- [[BR]]So, I would like to discuss the architecture of Hbase with you in detail.
- 
