hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Trivial Update of "Hbase/ShellPlans" by udanax
Date Mon, 06 Aug 2007 06:39:19 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by udanax:
http://wiki.apache.org/lucene-hadoop/Hbase/ShellPlans

------------------------------------------------------------------------------
- [[TableOfContents(4)]]
+ [[TableOfContents(5)]]
  
  ----
  
- = Hbase Shell Plans =
+ = Hbase Shell Plan Draft =
- 
  == People Involved ==
- 
   * '''Syntax definition.'''
-   * [wiki:udanax Edward Yoon], Master.[[BR]]Open Collaboration, NHN corp.
+   * [:udanax:Edward Yoon], Master.[[BR]]Open Collaboration, NHN corp.
    * Inchul Song, Ph.D. Candidate[[BR]]Database Lab[[BR]]Division of Computer Science, KAIST
- 
   * '''Code Implementation.'''
-   * [wiki:udanax Edward Yoon], Master.[[BR]]Open Collaboration, NHN corp.
+   * [:udanax:Edward Yoon], Master.[[BR]]Open Collaboration, NHN corp.
    * Minsu Kim, System Engineer at Daum corp.
    * Sewon Kim, System Engineer at Empas corp.
  
-  * '''Jira Issues.'''
-   * https://issues.apache.org/jira/browse/HADOOP-1608
-    * https://issues.apache.org/jira/browse/HADOOP-1658
-   * https://issues.apache.org/jira/browse/HADOOP-1655
- 
  If you have constructive ideas, please advise me. webmaster@udanax.org
  
- == Suggested Hbase Shell Syntax ==
+ == Suggested Hbase Shell plans ==
  
-   -- Inchul, Feel free to add your opinion.
+  ''--Inchul, Feel free to add your opinion.[[BR]]udanax''
  
+  * [:HbaseShell/HQL] - I've made some changes to your initial HQL to make it look more like
SQL. I borrowed the syntax definition style from MySQL.
- HBase Query Language (HQL) discussions and syntax draft page.
- 
-  * http://www.hadoop.co.kr/wiki/moin.cgi/HBaseShell/HQL
  
  ----
  
- = Hbase Shell altools plans =
+ == Suggested Hbase Shell altools plans ==
- 
  I suggest developing the HBase Shell in SQL style, and developing '''al'''gebraic '''tools''' (altools) as a sub-shell, as described below.
  
  {{{
@@ -51, +40 @@

  Hbase.altools > exit;
  Hbase > exit;
  }}}
+ Hbase altools is an Hbase Shell sub 'interpreter' (or 'shell') program that provides scalable data processing capabilities such as aggregation and algebraic calculation (groups and sets, commutative rings, algebraic geometry, and linear algebra) on Hadoop + Hbase based parallel machines.

  
- Hbase altools is an Hbase Shell sub 'interpreter' (or 'shell)' program to provide scalable
data processing capabilities like aggregation, algebraic calculation(groups and sets, commutative
rings, algebraic geometry, and linear algebra) on Hadoop + Hbase based parallel machines.
+  ''--Altools Matrix operations will show how Google search's LSI, Google Earth's algebraic topology, and Google News' recommendation system are related to Bigtable. See the HBase Shell Usage Page: [:HBaseShell/Examples]''
  
- ''Altools Matrix operations will show how Google search's LSI, Google Earth's algebraic
topology, Google News' recommendation system are related to Bigtable.''
- 
- = Hbase altools Goals =
+ === Hbase altools Goals ===
   * Simplified import/export/migrate functionality between different data sources (Hadoop, HBase)
   * Simplified processing of a logical data model
   * Simplified algebraic operations
   * Simplified parallel numerical analysis by abstracting/numericalizing point, line, or plane data across multiple maps in HBase.
- 
- == HBase altools Background ==
+ === HBase altools Background ===
- 
- I expect Hadoop + Hbase to handle sparsity and data explosion very well in near future.
Moreover, i believe the design of the multi-dimensional map structure and the 3d space model
of the data are optimized for rapid ad-hoc information retrieval in any orientation, as well
as for fast, flexible calculation and transformation of raw data based on formulaic relationships.
It is advantageous with respect to '''Analysis Processing''' 
+ I expect Hadoop + Hbase to handle sparsity and data explosion very well in the near future. Moreover, I believe the design of the multi-dimensional map structure and the 3D space model of the data are optimized for rapid ad-hoc information retrieval in any orientation, as well as for fast, flexible calculation and transformation of raw data based on formulaic relationships. It is advantageous with respect to '''Analysis Processing''' as it allows users to easily formulate complex queries, and filter or slice data into meaningful subsets, among other things.
- as it allows users to easily formulate complex queries, and filter or slice data into meaningful
subsets, among other things.
  
  === Rationale ===
- 
- It will probably take a while for Hadoop + HBase to provide reliable real-time service like
other DBMS. 
+ It will probably take a while for Hadoop + HBase to provide a reliable real-time service like other DBMSs. [[BR]]Also, the multi-dimensional model is commonly accepted for OLAP.
- [[BR]]Also, Multi Dimensional Model is commonly accepted for OLAP.
- 
- ||<bgcolor="#ececec">'''System Characteristic''' ||<bgcolor="#ececec">'''RDBMS'''
||<bgcolor="#ececec">'''Multi-Dimensional Model Hbase''' ||
+ ||<bgcolor="#E5E5E5">'''System Characteristic''' ||<bgcolor="#E5E5E5">'''RDBMS'''
||<bgcolor="#E5E5E5">'''Multi-Dimensional Model Hbase''' ||
  ||Data Retrieval Performance ||Slow ||Fast ||
  ||Calculation Functionality || Limited, in all but one dimension ||Can be very high, all dimensions ||
  ||Openness to live data access by other applications ||Excellent ||Limited ||
  ||Priorities ||High performance, High availability ||High flexibility, High user autonomy ||
  
+ 
  Thus, I decided to develop a shell for linear-algebraic computation and large-scale data processing using Hadoop's parallel processing and HBase storage.
  
  ''Then you may ask "What is the difference from MapReduce using MapFiles?"''
  
+ I don't expect it to give us high performance just yet, but it will surely make data management and development much easier. First, let's take a look at HBase's data model.
- I don't expect it to give us a high-performance just yet,
- but it will sure make data management and development much easier.
- First, let's take a look at HBase's data model.
  
+ HBase provides a unified data model that represents data in three dimensions: Row, Column, and Timestamp. Also, Row and Column may be extended indefinitely.
- HBase provides a unified data model and it represents a data in 3-dimensional
- - Row, Column, and TImestamp. Also, Row and Column may be extended infinitely.
  
+ If we slice the data model at a given time version, we may view the resulting data as a 2D table. If the index is a string, we may view it as a huge map. If the index is an integer, then it is one huge 2D array.
- If we decide to cut the data model in time version, then we may view the new data as a 2D
table.
- If index is in string, we may view it as a huge map. If index is in integer, then it is
one huge 2D array.
  
+ So each table may have such data storage in 3D (column families). A Locality Group (column family) is a relationship that can occur between multiple references whenever one reference brings in much of the data used by the other references.
- So each table may have such data storages in 3D (Columnfamilies)
- Locality Group(Columnfamilies) is a relationship that can occur between multiple references
- whenever one reference brings in much of the data used by the other references.
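The Row/Column/Timestamp model described above can be sketched in a few lines of Python. This is only an illustration using nested dictionaries, not HBase code; the `slice_at` helper, the timestamps, and the older `length:` value of 120 are assumptions made up for the example.

```python
from typing import Dict

# Hypothetical in-memory stand-in for a table: row -> column -> timestamp -> value.
Versions = Dict[int, str]
Table3D = Dict[str, Dict[str, Versions]]


def slice_at(table: Table3D, ts: int) -> Dict[str, Dict[str, str]]:
    """Cut the 3D model at time `ts`: keep, per cell, the newest value with timestamp <= ts."""
    view: Dict[str, Dict[str, str]] = {}
    for row, columns in table.items():
        for column, versions in columns.items():
            visible = [t for t in versions if t <= ts]
            if visible:
                view.setdefault(row, {})[column] = versions[max(visible)]
    return view


# Illustrative data only: timestamps 1 and 2 are arbitrary version numbers.
movie_log: Table3D = {
    "Star Wars": {"year:": {1: "1977"}, "length:": {1: "120", 2: "124"}},
    "Mighty Ducks": {"year:": {2: "1991"}},
}

if __name__ == "__main__":
    print(slice_at(movie_log, 1))
    print(slice_at(movie_log, 2))
```

With string row keys the slice behaves like a large map; with integer row and column indices the same slice could be read as a 2D array.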
  
  ----
  
- = Suggested Hbase altools Operators =
+ === Suggested Hbase altools Operators ===
  '''Note''' that data should be located by its row, column, and timestamp.
  
- == Commands ==
+ ==== Commands ====
- ||<bgcolor="#ececec">'''Command''' ||<bgcolor="#ececec">'''Explanation''' ||
+ ||<bgcolor="#E5E5E5">'''Command''' ||<bgcolor="#E5E5E5">'''Explanation''' ||
  ||Table ||The '''Table''' command loads data from the specified table. [[BR]][[BR]]~-''A = Table('movieLog_table');''-~ ||
  ||Matrix ||The '''Matrix''' command controls the configuration of the logic matrix. [[BR]][[BR]]~-''M = Matrix(table_name, columnfamily_name[, scalar S]);''-~ ||
  ||Substitute || '''Substitute''' assigns an expression to a variable [A~Z].[[BR]][[BR]]~-''A = Table('movieLog_table');''-~ ||
  ||Store ||The '''Store''' command stores results to the specified table. [[BR]][[BR]]~-''A = Table('movieLog_table'); [[BR]]B = A.Selection(length > 100); [[BR]]Store B TO table('tmp_table')[or file('backup.dat')];''-~ ||
- == Relational Operators ==
+ ==== Relational Operators ====
- ||<bgcolor="#ececec">'''Operator''' ||<bgcolor="#ececec">'''Explanation''' ||
+ ||<bgcolor="#E5E5E5">'''Operator''' ||<bgcolor="#E5E5E5">'''Explanation''' ||
  ||Projection ||<99%>'''Projection''' of a relation ~+R+~ makes a new relation as the set obtained when all tuples (rows) in ~+R+~ are restricted to the set {columnfamily,,1,,,...,columnfamily,,n,,}.[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = A.Projection('year','length');''-~ ||
  ||Selection ||<99%>'''Selection''' of a relation ~+R+~ makes a new relation as the set of specified tuples (rows) of the relation ~+R+~.[[BR]]'''Set Operations''' : ~-''OR, AND, NOT''-~[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = A.Selection(length > 100 AND studioName = 'Fox');''-~ ||
  ||Group ||<99%>'''Group''' tuples by the value of an attribute and apply an aggregate function independently to each group of tuples.[[BR]]'''Aggregate Functions''' : ~-''AVG( attribute ), SUM( attribute ), COUNT( attribute ), MIN( attribute ), MAX( attribute )''-~[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = A.Group('studioName', MIN('year'));''-~ ||
  ||Sort ||<99%>'''Sort''' the tuples (rows) of ~+R+~, ordered according to the columnfamilies in columnfamily-list.[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = Sort by ('length');''-~ ||
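To make the intended semantics of Projection, Selection, and Group concrete, here is a minimal Python sketch over plain lists of dictionaries. It is not the altools implementation; the function names `projection`, `selection`, and `group` are hypothetical stand-ins, and the rows reuse the movieLog_table values from the examples page.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]

# Sample rows mirroring the movieLog_table example data.
rows: List[Row] = [
    {"title": "Star Wars", "year": 1977, "length": 124, "studioName": "Fox"},
    {"title": "Mighty Ducks", "year": 1991, "length": 104, "studioName": "Disney"},
    {"title": "Wayne's World", "year": 1992, "length": 95, "studioName": "Paramount"},
]


def projection(relation: List[Row], *columns: str) -> List[Row]:
    """Restrict every row to the named columns."""
    return [{c: row[c] for c in columns} for row in relation]


def selection(relation: List[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    """Keep only the rows for which the predicate holds."""
    return [row for row in relation if predicate(row)]


def group(relation: List[Row], key: str,
          aggregate: Callable[[Iterable[Any]], Any], column: str) -> Dict[Any, Any]:
    """Group rows by `key` and apply `aggregate` to `column` within each group."""
    buckets: Dict[Any, List[Any]] = defaultdict(list)
    for row in relation:
        buckets[row[key]].append(row[column])
    return {k: aggregate(values) for k, values in buckets.items()}


if __name__ == "__main__":
    print(projection(rows, "year", "length"))
    print(selection(rows, lambda r: r["length"] > 100 and r["studioName"] == "Fox"))
    print(group(rows, "studioName", min, "year"))
```

In a real shell these operators would presumably be compiled into parallel jobs rather than evaluated in memory; the sketch only pins down what each operator returns.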
  
+ ==== Matrix Arithmetic Operators ====
+ ||<bgcolor="#E5E5E5">'''Operator''' ||<bgcolor="#E5E5E5">'''Explanation''' ||
+ ||Addition ||<99%>'''Adding''' entries with the same indices [[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B
= Matrix('m_table','cf_2');[[BR]]C = A + B;''-~ ||
+ ||Subtraction ||<99%>'''Subtracting''' entries with the same indices [[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B = Matrix('m_table','cf_2');[[BR]]C = A - B;''-~ ||
+ ||Multiplication ||<99%>'''Multiplication''' of two matrices, Product C of two matrices
A and B [[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B = Matrix('m_table','cf_2');[[BR]]C
= A * B;''-~ ||
+ ||Division ||<99%>'''Division''' is solving the matrix equation AX = B for X [[BR]][[BR]]~-''A
= Matrix('m_table','cf_1');[[BR]]B = Matrix('m_table','cf_2');[[BR]]C = A /[or \] B;''-~||
+ ||Transpose ||<99%>'''Transpose''' of a Matrix, A matrix which is formed by turning
all the rows of a given matrix into columns and vice-versa.[[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B
= Transpose(A);''-~||
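As a rough illustration of the arithmetic operators above, the following Python sketch stores a sparse matrix as a dictionary keyed by (row, column), which loosely resembles cells addressed by row and column. The function names are hypothetical and the code runs on a single machine; it is only meant to show the entry-wise definitions.

```python
from typing import Dict, Tuple

Matrix = Dict[Tuple[int, int], float]


def add(a: Matrix, b: Matrix) -> Matrix:
    """Add entries with the same indices; missing entries count as zero."""
    out = dict(a)
    for index, value in b.items():
        out[index] = out.get(index, 0) + value
    return out


def subtract(a: Matrix, b: Matrix) -> Matrix:
    """Subtract entries with the same indices; missing entries count as zero."""
    out = dict(a)
    for index, value in b.items():
        out[index] = out.get(index, 0) - value
    return out


def multiply(a: Matrix, b: Matrix) -> Matrix:
    """Matrix product: C[i, j] is the sum over k of A[i, k] * B[k, j]."""
    out: Matrix = {}
    for (i, k), x in a.items():
        for (k2, j), y in b.items():
            if k == k2:
                out[(i, j)] = out.get((i, j), 0) + x * y
    return out


def transpose(a: Matrix) -> Matrix:
    """Turn rows into columns and vice versa."""
    return {(j, i): value for (i, j), value in a.items()}


if __name__ == "__main__":
    a = {(0, 0): 1, (0, 1): 2, (1, 1): 3}
    b = {(0, 0): 4, (1, 0): 5, (1, 1): 6}
    print(add(a, b))
    print(multiply(a, b))
    print(transpose(a))
```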
+ 
+ ==== Factorizations and Decompositions ====
+ ||<bgcolor="#E5E5E5">'''Function''' ||<bgcolor="#E5E5E5">'''Explanation''' ||
+ ||LU ||<99%>'''LU Decomposition'''[[BR]]A procedure for decomposing an N by N matrix A into a product of a lower triangular matrix L and an upper triangular matrix U, LU = A[[BR]]'''Functions''' : ~-''getL(), getU(), isSingular(), getPivot()''-~ [[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B = LUDecomposition(A);[[BR]]C = getU(B);[[BR]]D = getL(B);''-~||
+ ||QR ||<99%>'''QR Decomposition'''[[BR]]For an m-by-n matrix A with m >= n, the
QR decomposition is an m-by-n orthogonal matrix Q and an n-by-n upper triangular matrix R
so that A = Q*R.[[BR]]'''Functions''' : ~-''getH(), getQ(), getR()''-~[[BR]][[BR]]~-''A =
Matrix('m_table','cf_1');[[BR]]B = QRDecomposition(A);[[BR]]C = getH(B);''-~||
+ ||Cholesky ||<99%>'''Cholesky Decomposition'''[[BR]]It is a special case of LU decomposition applicable only if the matrix to be decomposed is symmetric positive definite.[[BR]]'''Functions''' : ~-''getL(), isSPD()''-~ [[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B = CholeskyDecomposition(A);[[BR]]C = getL(B);''-~||
+ ||SVD ||<99%>'''SV(Singular Value) Decomposition'''[[BR]]For an m-by-n matrix A with
m >= n, the singular value decomposition is an m-by-n orthogonal matrix U, an n-by-n diagonal
matrix S, and an n-by-n orthogonal matrix V so that A = U*S*V'.[[BR]]'''Functions''' : ~-''getS(),
getU(), getV(), getSingularValues()''-~ [[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B
= SVDecomposition(A);[[BR]]C = getU(B);''-~||
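For readers unfamiliar with LU decomposition, here is a small single-machine Python sketch of the idea LU = A. It uses the Doolittle scheme without pivoting, which is a simplification: the table above lists getPivot(), so the planned operator presumably pivots. The names `lu_decompose` and `matmul` are hypothetical.

```python
from typing import List, Tuple

Dense = List[List[float]]


def lu_decompose(a: Dense) -> Tuple[Dense, Dense]:
    """Doolittle LU without pivoting: returns (L, U) with unit-diagonal L and L*U == A."""
    n = len(a)
    lower = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    upper = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Row i of U.
        for j in range(i, n):
            upper[i][j] = a[i][j] - sum(lower[i][k] * upper[k][j] for k in range(i))
        if upper[i][i] == 0:
            raise ValueError("zero pivot; this sketch does not pivot")
        # Column i of L, below the diagonal.
        for j in range(i + 1, n):
            lower[j][i] = (a[j][i] - sum(lower[j][k] * upper[k][i] for k in range(i))) / upper[i][i]
    return lower, upper


def matmul(x: Dense, y: Dense) -> Dense:
    """Plain dense matrix product, used here only to check the factorization."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


if __name__ == "__main__":
    a = [[4.0, 3.0], [6.0, 3.0]]
    lower, upper = lu_decompose(a)
    print(lower)
    print(upper)
    print(matmul(lower, upper))
```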
  ----
+ = Implementation =
- == Matrix Operators ==
- ||<bgcolor="#ececec">'''Operator''' ||<bgcolor="#ececec">'''Explanation''' ||
- ||Addition ||<99%>Adding entries with the same indices [[BR]][[BR]]~-''C = A + B;''-~
||
- ||subtraction ||<99%>Subtracting entries with the same indices [[BR]][[BR]]~-''C =
A + B;''-~ ||
- ||multiplication ||<99%>Product C of two matrices A and B [[BR]][[BR]]~-''C = A *
B;''-~ ||
- ||division ||<99%>... ||
- ||transpose ||<99%>... ||
- ||permutation ||<99%>... ||
- ||norms ||<99%>... ||
- === Factorizations and decompositions ===
- ||<bgcolor="#ececec">'''Function''' ||<bgcolor="#ececec">'''Explanation''' ||
- ||LU ||<99%>... ||
- ||QR ||<99%>... ||
- ||Cholesky ||<99%>... ||
- ||SVD ||<99%>... ||
- ||Inverse ||<99%>... ||
- ||Pseudoinverse ||<99%>... ||
- ||Condition ||<99%>... ||
- ||Determinant ||<99%>... ||
- ||Rank ||<99%>... ||
- === Column-Wise Data Analysis ===
- ||<bgcolor="#ececec">'''Function''' ||<bgcolor="#ececec">'''Explanation''' ||
- ||Frequencies ||<99%>... ||
- ||Sorting ||<99%>... ||
- ||Covariance ||<99%>... ||
  
+ '''Note'''
+ {{{
+ Run the following: % ant clean jar compile-contrib test javadoc 
  
- = Examples =
+ This will run all tests and will show you javadoc warnings, if any (javadoc warnings will cause Hudson to fail).
+ If you want to run only the hbase tests because the full suite takes too long, do the following:

  
+ % cd src/contrib/hbase
+ % ant jar test 
+ OR 
+ % ant clean jar test 
- == Relational Operations Examples ==
- ||Row Key ||||||||||||||||||||||||Column Families ||
- ||<rowbgcolor="#ececec">title |||| year ||||length ||||inColor |||| studioName ||||
vote |||| producer ||
- ||Star Wars ||year: || 1977 ||length: || 124 ||inColor: || true ||studioName: || Fox ||
vote:''user_1'' || 5 || producer: || George Lucas ||
- || || || || || || || || || || vote:''user_2'' || 2 || || ||
- ||Mighty Ducks ||year: || 1991 ||length: || 104 ||inColor: || true ||studioName: || Disney
|| vote:''user_1'' || 2 || producer: || Blair Peters ||
- || || || || || || || || || || vote:''user_3'' || 4 || || ||
- ||Wayne's World ||year: || 1992 ||length: || 95 ||inColor: || true ||studioName: || Paramount
|| vote:''user_2'' || 3 || producer: || Penelope Spheeris ||
- || || || || || || || || || || vote:''user_3'' || 4 || || ||
- '''~+^π^+~'''~-title-~,~-year-~,~-length-~'''~+^(movieLog_table)^+~'''
  
+ St.Ack
+ }}}
- A = table('movieLog_table'); [[BR]]B = A.projection('year','length');
- ||<rowbgcolor="#ececec">title ||year ||length ||
- ||Star Wars ||1977 ||124 ||
- ||Mighty Ducks ||1991 ||104 ||
- ||Wayne's World ||1992 ||95 ||
  
+ ----
+ = Example Of Hbase Shell Use =
  
- '''~+^σ^+~'''~-length>100-~'''~+^(movieLog_table)^+~'''
+ See [:HbaseShell/Examples]
  
- A = Table('movieLog_table'); [[BR]]B = A.Selection(length > 100);
- ||<rowbgcolor="#ececec">title ||year ||length ||inColor ||studioName ||producer ||
- ||Star Wars ||1977 ||124 ||true ||Fox ||12345 ||
- ||Mighty Ducks ||1991 ||104 ||true ||Disney ||67890 ||
- 
- 
- '''~+^π^+~'''~-title-~,~-year-~'''~+^(σ^+~'''~-length>100-~'''~+^(movieLog_table)∩σ^+~'''~-studioName='Fox'-~'''~+^(movieLog_table))^+~'''
- 
- A = Table('movieLog_table'); [[BR]]B = A.Projection('year'); [[BR]]C = B.Selection(length
> 100 AND studioName = 'Fox');
- ||<rowbgcolor="#ececec">title ||year ||
- ||Star Wars ||1977 ||
- 
