hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Trivial Update of "Hbase/ShellPlans" by udanax
Date Tue, 31 Jul 2007 03:47:15 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by udanax:
http://wiki.apache.org/lucene-hadoop/Hbase/ShellPlans

------------------------------------------------------------------------------
  
  == HBase Shell Background ==
  
- I expect Hadoop + Hbase to handle sparsity and data explosion very well in near future.
Moreover, i believe the design of the multi-dimensional map structure and the 3d space model
of the data are optimized for rapid ad-hoc information retrieval in any orientation, as well
as for fast, flexible calculation and transformation of raw data based on formulaic relationships.
+ I expect Hadoop + Hbase to handle sparsity and data explosion very well in the near future.
Moreover, I believe the design of the multi-dimensional map structure and the 3D space model
of the data are optimized for rapid ad-hoc information retrieval in any orientation, as well
as for fast, flexible calculation and transformation of raw data based on formulaic relationships.
This is advantageous for analytical processing 
+ as it allows users to easily formulate complex queries, and filter or slice data into meaningful
subsets, among other things.
  
  Then, I thought it would require a more user-friendly interface for querying the data
interactively.
  
  === Rationale ===
  
- It will probably take a while for Hadoop + HBase to provide reliable real-time service like
other DBMS. Thus, I decided to develop a shell to process linear algebraic computing and large
scale data using Hadoop's parallel processing and HBase storage.
+ It will probably take a while for Hadoop + HBase to provide reliable real-time service like
other DBMSs. 
+ [[BR]]Generally, the multi-dimensional model is commonly accepted for OLAP.
+ 
+ ||<bgcolor="#ececec">'''System Characteristic''' ||<bgcolor="#ececec">'''RDBMS''' ||<bgcolor="#ececec">'''Multi-Dimensional Model Hbase''' ||
+ ||Data Retrieval Performance ||Slow ||Fast ||
+ ||Calculation Functionality ||Limited, in all but one dimension ||Can be very high, all dimensions ||
+ ||Openness to live data access by other applications ||Excellent ||Limited ||
+ ||Priorities ||High performance, High availability ||High flexibility, High user autonomy ||
+ 
+ Thus, I decided to develop a shell for linear algebraic computation and large-scale
data processing using Hadoop's parallel processing and HBase storage.
  
  ''Then you may ask "What is the difference from MapReduce using MapFiles?"''
  
@@ -41, +51 @@

  Locality Group(Columnfamilies) is a relationship that can occur between multiple references
  whenever one reference brings in much of the data used by the other references.
  
-   ''-- I hope physical files on networks are grouped together with locality grouping.''
- 
- 
- writing...
- 
- ||<bgcolor="#ececec">'''System Characteristic''' ||<bgcolor="#ececec">'''RDBMS'''
||<bgcolor="#ececec">'''Multi-Dimensional Model Hbase''' ||
- ||Data Retrieval Perfomance ||Slow ||Fast ||
- ||Calculation Functionality || Limited, in all but one dimension ||Can be very high, all
dimensions ||
- ||Openness to live data access by other applications ||Excellent ||Limited ||
- ||Priorities ||High perfomance, High availability ||High flexibility, High user autonomy
||
- 
- 
- 
  ----
  = Suggested Future Hbase Shell Operators =
  '''Note''' that data should be located by row, column, and timestamp.
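
The note above can be illustrated with a small sketch. This is only a toy in-memory model of cells addressed by row, column, and timestamp; it is not HBase code, and the class name `SparseTable` and its methods are hypothetical names chosen for illustration.

```python
from collections import defaultdict


class SparseTable:
    """Toy model (hypothetical): each cell is addressed by (row, column, timestamp)."""

    def __init__(self):
        # (row, column) -> {timestamp: value}; absent cells take no space (sparsity).
        self._cells = defaultdict(dict)

    def put(self, row, column, timestamp, value):
        """Store one version of a cell."""
        self._cells[(row, column)][timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the newest version, or the newest one at or before `timestamp`."""
        versions = self._cells.get((row, column))
        if not versions:
            return None
        if timestamp is None:
            return versions[max(versions)]
        eligible = [t for t in versions if t <= timestamp]
        return versions[max(eligible)] if eligible else None


table = SparseTable()
table.put("movie1", "length", 1, 95)
table.put("movie1", "length", 5, 101)
latest = table.get("movie1", "length")        # newest version
as_of_3 = table.get("movie1", "length", 3)    # version visible at timestamp 3
```

Keeping the timestamp as part of the address is what lets a reader ask for the value "as of" a given time without overwriting older versions.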
@@ -61, +58 @@

  == Commands ==
  ||<bgcolor="#ececec">'''Command''' ||<bgcolor="#ececec">'''Explanation''' ||
  ||Substitute || '''Substitute''' expression to [A~Z][[BR]][[BR]]~-''X = Matrix(table_name,
columnfamily_name);''-~||
- ||Store ||'''STORE''' command will store results to specified table. [[BR]][[BR]]~-''A =
Table('movieLog_table'); [[BR]]B = A.Selection('length' > 100); [[BR]]STORE B TO X run_style;''-~
||
+ ||Store ||The '''STORE''' command stores results to the specified table. [[BR]][[BR]]~-''A =
Table('movieLog_table'); [[BR]]B = A.Selection(length > 100); [[BR]]STORE B TO X run_style;''-~
||
  
  == Relational Operators ==
  
  ||<bgcolor="#ececec">'''Operator''' ||<bgcolor="#ececec">'''Explanation''' ||
  ||Projection ||<99%>'''Projection''' of a relation ~+R+~ makes a new relation: the set
obtained when all tuples (rows) in ~+R+~ are restricted to the set {columnfamily,,1,,,...,columnfamily,,n,,}.[[BR]][[BR]]~-''A
= Table('movieLog_table');[[BR]]B = A.Projection('year','length');''-~||
- ||Selection ||<99%>'''Selection''' of a relation ~+R+~, It makes a new relation as
the set of specified tuples(rows) of the relation ~+R+~[[BR]]'''Set Operations''' : ~-''OR,
AND, NOT''-~[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = A.Selection('length' >
100);[[BR]]C = A.Selection('length' > 100 AND 'year' > 1979);''-~||
+ ||Selection ||<99%>'''Selection''' of a relation ~+R+~ makes a new relation: the
set of specified tuples (rows) of the relation ~+R+~.[[BR]]'''Set Operations''' : ~-''OR,
AND, NOT''-~[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = A.Selection(length >
100);[[BR]]C = A.Selection(length > 100 AND studioName = 'Fox');''-~||
- ||Product ||<99%>'''Product''' of relations R and S, It makes a new relation as the
set of all possible combinations of tuples of the two operation relations.[[BR]]'''NOTE'''
that this is the most computationally expensive operator in the relational algebra.||
- ||Rename ||<99%>'''Rename''' r to x, The columnfamily names in the columnfamily-list
replace the columnfamily names of the relation.[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B
= A.Rename('length' = 'movieLength');''-~||
  ||Group ||<99%>'''Group''' tuples by the value of an attribute and apply an aggregate function
independently to each group of tuples.[[BR]]'''Aggregate Functions''' : ~-''AVG( attribute
), SUM( attribute ), COUNT( attribute ), MIN( attribute ), MAX( attribute )''-~[[BR]][[BR]]~-''A
= Table('movieLog_table');[[BR]]B = A.Group('studioName', MIN('year'));''-~||
- ||Sort ||<99%>'''Sort''' of tuples(rows) of R, ordered according to columnfamilies
on columnfamily-list[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = A.Sort('length',
'vote');''-~||
+ ||Sort ||<99%>'''Sort''' orders the tuples (rows) of R according to the columnfamilies
in the columnfamily-list.[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = Sort by ('length');''-~||
  
+ === Examples ===
  
  '''~+^π^+~'''~-title-~,~-year-~,~-length-~'''~+^(movieLog_table)^+~'''
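
To make the semantics of the operators listed above concrete, here is a minimal Python sketch over an in-memory list of rows. It is not the proposed shell and not an HBase API. The function names, the `movie_log` variable, and the sample rows are all made up for illustration; only the column names (title, year, length, studioName) come from the page.

```python
from collections import defaultdict


def projection(rows, *columns):
    """Restrict every row to the named columns (columns missing from a sparse row are skipped)."""
    return [{c: r[c] for c in columns if c in r} for r in rows]


def selection(rows, predicate):
    """Keep only the rows for which the predicate holds."""
    return [r for r in rows if predicate(r)]


def group(rows, key, aggregate, attribute):
    """Group rows by `key` and apply `aggregate` to each group's `attribute` values."""
    buckets = defaultdict(list)
    for r in rows:
        buckets[r[key]].append(r[attribute])
    return {k: aggregate(values) for k, values in buckets.items()}


def sort(rows, *columns):
    """Order rows by the listed columns, left to right."""
    return sorted(rows, key=lambda r: tuple(r[c] for c in columns))


# Invented sample rows standing in for 'movieLog_table'.
movie_log = [
    {"title": "A", "year": 1977, "length": 121, "studioName": "Fox"},
    {"title": "B", "year": 1980, "length": 95, "studioName": "Fox"},
    {"title": "C", "year": 1992, "length": 104, "studioName": "Disney"},
]

projected = projection(movie_log, "title", "year", "length")
long_fox = selection(movie_log, lambda r: r["length"] > 100 and r["studioName"] == "Fox")
first_year = group(movie_log, "studioName", min, "year")
by_length = sort(movie_log, "length")
```

Here `projected` corresponds to the π expression above, `long_fox` to the Selection example with `length > 100 AND studioName = 'Fox'`, and `first_year` to `A.Group('studioName', MIN('year'))`. A real implementation would presumably run these as distributed jobs rather than in-memory loops.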
  
