hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Trivial Update of "Hbase/ShellPlans" by udanax
Date Mon, 20 Aug 2007 01:40:27 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by udanax:
http://wiki.apache.org/lucene-hadoop/Hbase/ShellPlans

------------------------------------------------------------------------------
  [[TableOfContents(5)]]
  ----
  
+ = Hbase Shell Altools Plan =
- = Hbase Shell Plan Draft =
- Plan is to significantly expand the set of shell operators.  Basic data manipulation and
data definition operators will be extended and evolved to be more SQL-like ([wiki:Hbase/HbaseShell/HQL
HQL]).  More sophisticated manipulations to do relational and linear algebra, matrix additions,
multiplications, etc., will be added to a HBase subshell to keep the two operator types --
SQL-like vs. non-SQL -- distinct.
  
-  ''-- After POC(proof of concept) review, many things can change.[[BR]]-- If you have constructive
ideas, Please advise me. [[MailTo(webmaster AT SPAMFREE udanax DOT org)]]''
+  ''-- After POC (proof of concept) review, many things can change.[[BR]]-- If you have constructive
ideas, please advise me. [[MailTo(webmaster AT SPAMFREE udanax DOT org)]][[BR]]-- This project
is currently in the planning stage.  [https://issues.apache.org/jira/browse/HADOOP-1608 HADOOP-1608]
to add "Relational Algebra Operators" is currently in progress.''
  
- This project is currently in the planning stage.  [https://issues.apache.org/jira/browse/HADOOP-1608
HADOOP-1608] to add "Relational Algrebra Operators" is currently in process.
+ Hbase altools is an Hbase Shell sub 'interpreter' (or 'shell') program that provides scalable
data processing capabilities such as aggregation and algebraic calculation (groups and sets, commutative
rings, algebraic geometry, and linear algebra) on Hadoop + Hbase based parallel machines.
In particular, it will focus on storing and manipulating numeric, sparse matrices on Hbase.
  
- ----
+  ''-- Altools Matrix operations will show how Google search's LSI, Google Earth's algebraic
topology, Google News' recommendation system are related to Bigtable.''
  
- == Suggested Hbase Shell altools plans ==
  I suggest developing the HBase Shell in an SQL style, and developing '''al'''gebraic '''tools''' as
a subshell in a more intuitive style, as described below. 
  
  {{{
@@ -29, +27 @@

  Hbase > exit;
  }}}
  
- Hbase altools is an Hbase Shell sub 'interpreter' (or 'shell)' program to provide scalable
data processing capabilities like  aggregation, algebraic calculation(groups and sets, commutative
rings, algebraic geometry, and linear algebra) on Hadoop + Hbase based parallel machines.
especially, it will focus on storing and manipulating very large sparse matrices on Hbase.
- 
-  ''-- Altools Matrix operations will show how Google search's LSI, Google Earth's algebraic
topology, Google News' recommendation system are related to Bigtable.''
- 
- === Background ===
+ == Background ==
  I expect Hadoop + Hbase to handle sparsity and data explosion very well in the near future.
Moreover, I believe the design of the multi-dimensional map structure and the 3D space model
of the data are optimized for rapid ad-hoc information retrieval in any orientation, as well
as for fast, flexible calculation and transformation of raw data based on formulaic relationships.
This is advantageous for analytical processing, as it allows users to easily formulate
complex queries and to filter or slice data into meaningful subsets, among other things.
  
  ----
  
- == Suggested Hbase altools Syntax ==
+ = Suggested Hbase altools Syntax =
  '''Note''' that data should be located by its row, column, and timestamp.
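
As a rough illustration of the note above, the following Python sketch models a table as a nested map from row to column to timestamped versions. The `table` contents and the `get_cell` helper are hypothetical and are not part of Hbase or altools; they only show what addressing a cell by row, column, and timestamp might look like.

```python
# Hypothetical in-memory stand-in for a table: row -> column -> {timestamp: value}.
# The keys and values below are made-up sample data.
table = {
    "row1": {"cf_1:a": {1: 0.5, 2: 0.7}},
    "row2": {"cf_1:a": {1: 1.5}},
}


def get_cell(table, row, column, timestamp=None):
    """Return the value stored at (row, column, timestamp).

    When no timestamp is given, the newest version is returned.
    Returns None if the cell or the requested version does not exist.
    """
    versions = table.get(row, {}).get(column, {})
    if not versions:
        return None
    if timestamp is None:
        timestamp = max(versions)  # newest version wins
    return versions.get(timestamp)
```

Calling `get_cell(table, "row1", "cf_1:a")` picks the newest version of that cell, while passing an explicit timestamp selects an older one.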
  
- === Commands ===
+ == Commands ==
  ||<bgcolor="#E5E5E5">'''Command''' ||<bgcolor="#E5E5E5">'''Explanation''' ||
  ||Table ||<99%>'''Table''' command loads specified table. [[BR]][[BR]]~-''Table('movieLog_table');''-~
||
  ||Matrix ||<99%>'''Matrix''' command constructs the configuration of the logic matrix.[[BR]]'''Options'''
: not yet defined. [[BR]][[BR]]~-''Matrix(table_name, columnfamily_name[, option]);''-~ ||
@@ -49, +43 @@

  ||IF...ELSE ||<99%>'''IF...ELSE''' imposes conditions on the execution. [[BR]][[BR]]~-''IF
( boolean_expression )[[BR]]B = command_statements;[[BR]]ELSE[[BR]]B = command_statements;''-~||
  ||Store ||<99%>'''Store''' command will store results to specified table. [[BR]][[BR]]~-''A
= Table('movieLog_table'); [[BR]]B = A.Selection(length > 100); [[BR]]Store B TO table('tmp_table')[or
file('backup.dat')];''-~ ||
  
- === Relational Operators ===
+ == Relational Operators ==
  ||<bgcolor="#E5E5E5">'''Operator''' ||<bgcolor="#E5E5E5">'''Explanation''' ||
  ||Projection ||<99%>'''Projection''' of a relation ~+R+~ makes a new relation
as the set obtained when all tuples (rows) in ~+R+~ are restricted to the set {columnfamily,,1,,,...,columnfamily,,n,,}.[[BR]][[BR]]~-''A
= Table('movieLog_table');[[BR]]B = A.Projection('year','length'); '''//π,,year.length,,(A)'''
''-~ ||
  ||Selection ||<99%>'''Selection''' of a relation ~+R+~ makes a new relation as
the set of specified tuples (rows) of the relation ~+R+~.[[BR]]'''Set Operations''' : ~-''OR,
AND, NOT''-~[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = A.Selection(length >
100 AND studioName = 'Fox'); '''//σ,,length > 100.studioName='Fox',,(A)''' ''-~ ||
@@ -79, +73 @@

  Hbase.altools > store C to table('result_table'); 
  }}}
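
The Projection and Selection operators described in the table can be sketched in plain Python over a list of row dictionaries. This is only an illustration of the semantics, not the altools implementation: the `projection` and `selection` helpers and the sample rows are assumptions, and unlike a strict relational projection this sketch does not remove duplicate tuples.

```python
def projection(rows, *columns):
    """Restrict every row to the named columns (no duplicate elimination)."""
    return [{c: row[c] for c in columns if c in row} for row in rows]


def selection(rows, predicate):
    """Keep only the rows for which the predicate holds."""
    return [row for row in rows if predicate(row)]


# Made-up sample rows standing in for 'movieLog_table'.
movie_log = [
    {"title": "A", "year": 1990, "length": 120, "studioName": "Fox"},
    {"title": "B", "year": 1995, "length": 90, "studioName": "Fox"},
    {"title": "C", "year": 2001, "length": 130, "studioName": "Disney"},
]

# Roughly B = A.Projection('year','length');
b = projection(movie_log, "year", "length")

# Roughly B = A.Selection(length > 100 AND studioName = 'Fox');
c = selection(movie_log, lambda r: r["length"] > 100 and r["studioName"] == "Fox")
```

Passing the condition as a function keeps the sketch short; a real shell would have to parse the boolean expression itself.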
  
- === Matrix Arithmetic Operators ===
+ == Matrix Arithmetic Operators ==
  ||<bgcolor="#E5E5E5">'''Operator''' ||<bgcolor="#E5E5E5">'''Explanation''' ||
  ||Addition ||<99%>'''Adding''' entries with the same indices. [[BR]][[BR]]~-''A =
Matrix('m_table','cf_1');[[BR]]B = Matrix('m_table','cf_2');[[BR]]C = A + B; '''// c,,ij,,
= a,,ij,, + b,,ij,, (i : row key, j : column key)''' ''-~ ||
  ||Subtraction ||<99%>'''Subtracting''' entries with the same indices.[[BR]][[BR]]~-''A
= Matrix('m_table','cf_1');[[BR]]B = Matrix('m_table','cf_2');[[BR]]C = A - B; '''// c,,ij,,
= a,,ij,, - b,,ij,, (i : row key, j : column key)''' ''-~ ||
@@ -96, +90 @@

  Hbase.altools > C = A * B;  
  }}}
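
To make the entry-wise definitions above concrete, here is a minimal Python sketch of sparse matrix addition and multiplication. It assumes a matrix is held as a dictionary keyed by (row key, column key) with absent entries treated as zero; that representation and the `add` and `multiply` names are illustrative assumptions, not how altools stores or processes matrices on Hbase.

```python
from collections import defaultdict


def add(a, b):
    """c_ij = a_ij + b_ij, where missing entries count as zero."""
    c = dict(a)
    for key, value in b.items():
        c[key] = c.get(key, 0) + value
    return c


def multiply(a, b):
    """c_ij = sum over k of a_ik * b_kj, visiting only stored entries."""
    # Index B by row key so that matching a_ik with b_kj is a direct lookup.
    b_rows = defaultdict(dict)
    for (k, j), value in b.items():
        b_rows[k][j] = value
    c = defaultdict(float)
    for (i, k), a_ik in a.items():
        for j, b_kj in b_rows.get(k, {}).items():
            c[(i, j)] += a_ik * b_kj
    return dict(c)


# A = [[1, 2], [0, 3]] and B = [[4, 0], [5, 6]], storing only non-zero entries.
A = {(0, 0): 1.0, (0, 1): 2.0, (1, 1): 3.0}
B = {(0, 0): 4.0, (1, 0): 5.0, (1, 1): 6.0}
```

Because only stored entries are visited, the work grows with the number of non-zero cells rather than with the full matrix dimensions, which is the property that matters for very sparse data.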
  
- === Factorizations and Decompositions ===
+ == Factorizations and Decompositions ==
  
  ||<bgcolor="#E5E5E5">'''Function''' ||<bgcolor="#E5E5E5">'''Explanation''' ||
  ||LU ||<99%>'''LU Decomposition'''[[BR]]A procedure for decomposing an N by N matrix
A into a product of a lower triangular matrix L and an upper triangular matrix U, LU = A.[[BR]]'''Functions'''
: ~-''getL(), getU(), isSingular(), getPivot()''-~ [[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B
= LUDecomposition(A);[[BR]]C = getU(B);[[BR]]D = getL(B);''-~||
  ||QR ||<99%>'''QR Decomposition'''[[BR]]For an m-by-n matrix A with m >= n, the
QR decomposition is an m-by-n orthogonal matrix Q and an n-by-n upper triangular matrix R
so that A = Q*R.[[BR]]'''Functions''' : ~-''getH(), getQ(), getR()''-~[[BR]][[BR]]~-''A =
Matrix('m_table','cf_1');[[BR]]B = QRDecomposition(A);[[BR]]C = getH(B);''-~||
- ||Cholesky ||<99%>'''Cholesky Decomposition'''[[BR]]It is a special case of LU decomposition
applicable only if matrix to be decomposed is symmetric positive definite.[[BR]]'''Functions'''
: ~-''getL(), isSPD()''-~ [[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B = CholeskyDecomposition(A);[[BR]]C
= getL(A);''-~||
+ ||Cholesky ||<99%>'''Cholesky Decomposition'''[[BR]]It is a special case of LU decomposition
applicable only if the matrix to be decomposed is symmetric positive definite.[[BR]]'''Functions'''
: ~-''getL(), getU(), isSPD()''-~ [[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B = CholeskyDecomposition(A);[[BR]]C
= getL(B);''-~||
  ||SVD ||<99%>'''SV(Singular Value) Decomposition'''[[BR]]For an m-by-n matrix A with
m >= n, the singular value decomposition is an m-by-n orthogonal matrix U, an n-by-n diagonal
matrix S, and an n-by-n orthogonal matrix V so that A = U*S*V'.[[BR]]'''Functions''' : ~-''getS(),
getU(), getV(), getSingularValues()''-~ [[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B
= SVDecomposition(A);[[BR]]C = getU(B);''-~||
  
  '''(ex. 1)''' To find the Singular Value decomposition in Altools, do the following:
@@ -113, +107 @@

  Hbase.altools > V = M.getV();
  }}}
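
For readers unfamiliar with the LU row in the table above, the following Python sketch shows one way to compute L and U for a small dense matrix using the Doolittle scheme. It is an assumption-laden illustration: the `lu_decompose` name is hypothetical, the matrix is a plain list of lists rather than an Hbase table, and no row pivoting is done, so it fails on a zero pivot where a production routine would reorder rows.

```python
def lu_decompose(a):
    """Return (L, U) with L unit lower triangular, U upper triangular, L*U = a.

    Doolittle scheme without pivoting, for a square list-of-lists matrix.
    """
    n = len(a)
    lower = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    upper = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Row i of U.
        for j in range(i, n):
            upper[i][j] = a[i][j] - sum(lower[i][k] * upper[k][j] for k in range(i))
        if upper[i][i] == 0.0:
            raise ZeroDivisionError("zero pivot; this sketch does no row pivoting")
        # Column i of L, below the diagonal.
        for j in range(i + 1, n):
            partial = sum(lower[j][k] * upper[k][i] for k in range(i))
            lower[j][i] = (a[j][i] - partial) / upper[i][i]
    return lower, upper


def matmul(x, y):
    """Dense matrix product, used here only to check that L*U reproduces A."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


L, U = lu_decompose([[4.0, 3.0], [6.0, 3.0]])
```

Multiplying the two factors back together with `matmul(L, U)` is a quick sanity check that the decomposition reproduces the input matrix.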
  
+ = Papers =
+  * ''Bigtable: A Distributed Storage System for Structured Data''
+  * ''Interpreting the Data: Parallel Analysis with Sawzall''
+  * ''PyTables - Hierarchical Datasets in Python''
+  * ''Numpy - Scientific Tools for Python''
+  * ''C-Store: A Column Oriented DBMS''
+  * ''Matlab, Octave Papers''
+ 
