From hadoopcommitsreturn2093apmaillucenehadoopcommitsarchive=lucene.apache.org@lucene.apache.org Mon Aug 06 06:39:43 2007
From: Apache Wiki
To: hadoop-commits@lucene.apache.org
Date: Mon, 06 Aug 2007 06:39:19 -0000
Message-ID: <20070806063919.7070.9982@eos.apache.org>
Subject: [Lucene-hadoop Wiki] Trivial Update of "Hbase/ShellPlans" by udanax
Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.
The following page has been changed by udanax:
http://wiki.apache.org/lucene-hadoop/Hbase/ShellPlans

 [[TableOfContents(4)]]
+ [[TableOfContents(5)]]

 = Hbase Shell Plans =
+ = Hbase Shell Plan Draft =

== People Involved ==

* '''Syntax definition.'''
 * [wiki:udanax Edward Yoon], Master.[[BR]]Open Collaboration, NHN corp.
+ * [:udanax:Edward Yoon], Master.[[BR]]Open Collaboration, NHN corp.
* Inchul Song, Ph.D. Candidate[[BR]]Database Lab[[BR]]Division of Computer Science, KAIST

* '''Code Implementation.'''
 * [wiki:udanax Edward Yoon], Master.[[BR]]Open Collaboration, NHN corp.
+ * [:udanax:Edward Yoon], Master.[[BR]]Open Collaboration, NHN corp.
* Minsu Kim, System Engineer at Daum corp.
* Sewon Kim, System Engineer at Empas corp.
 * '''Jira Issues.'''
 * https://issues.apache.org/jira/browse/HADOOP-1608
 * https://issues.apache.org/jira/browse/HADOOP-1658
 * https://issues.apache.org/jira/browse/HADOOP-1655

If you have constructive ideas, please let me know: webmaster@udanax.org
 == Suggested Hbase Shell Syntax ==
+ == Suggested Hbase Shell plans ==
  Inchul, feel free to add your opinion.
+ ''Inchul, feel free to add your opinion.[[BR]]udanax''
+ * [:HbaseShell/HQL]  I've made some changes to your initial HQL to make it look more like SQL. I borrowed the syntax definition style from MySQL.
 HBase Query Language (HQL) discussions and syntax draft page.

 * http://www.hadoop.co.kr/wiki/moin.cgi/HBaseShell/HQL

 = Hbase Shell altools plans =
+ == Suggested Hbase Shell altools plans ==

I suggest developing the HBase Shell in SQL style, and developing '''al'''gebraic '''tools''' (altools) as a sub-shell, as described below.
{{{
@@ 51, +40 @@
Hbase.altools > exit;
Hbase > exit;
}}}
+ Hbase altools is an Hbase Shell sub-interpreter (or sub-shell) program that provides scalable data processing capabilities such as aggregation and algebraic calculation (groups and sets, commutative rings, algebraic geometry, and linear algebra) on Hadoop + Hbase based parallel machines.
 Hbase altools is an Hbase Shell sub-interpreter (or sub-shell) program that provides scalable data processing capabilities such as aggregation and algebraic calculation (groups and sets, commutative rings, algebraic geometry, and linear algebra) on Hadoop + Hbase based parallel machines.
+ ''Altools Matrix operations will show how Google search's LSI, Google Earth's algebraic topology, and Google News' recommendation system are related to Bigtable. See the HBase Shell usage page: [:HBaseShell/Examples]''
 ''Altools Matrix operations will show how Google search's LSI, Google Earth's algebraic topology, Google News' recommendation system are related to Bigtable.''

 = Hbase altools Goals =
+ === Hbase altools Goals ===
* Simplified import/export/migrate functionality between different data sources (Hadoop, HBase)
* Simplified processing of a logical data model
* Simplified algebraic operations
* Simplified parallel numerical analysis by abstracting/numericalizing point, line, or plane data across multiple maps in HBase.

 == HBase altools Background ==
+ === HBase altools Background ===

 I expect Hadoop + Hbase to handle sparsity and data explosion very well in the near future. Moreover, I believe the design of the multidimensional map structure and the 3D space model of the data are optimized for rapid ad hoc information retrieval in any orientation, as well as for fast, flexible calculation and transformation of raw data based on formulaic relationships. It is advantageous with respect to '''Analysis Processing'''
+ I expect Hadoop + Hbase to handle sparsity and data explosion very well in the near future. Moreover, I believe the design of the multidimensional map structure and the 3D space model of the data are optimized for rapid ad hoc information retrieval in any orientation, as well as for fast, flexible calculation and transformation of raw data based on formulaic relationships. It is advantageous with respect to '''Analysis Processing''' as it allows users to easily formulate complex queries, and filter or slice data into meaningful subsets, among other things.
 as it allows users to easily formulate complex queries, and filter or slice data into meaningful subsets, among other things.
=== Rationale ===

 It will probably take a while for Hadoop + HBase to provide reliable real-time service like other DBMSs.
+ It will probably take a while for Hadoop + HBase to provide reliable real-time service like other DBMSs. [[BR]]Also, the multidimensional model is commonly accepted for OLAP.
 [[BR]]Also, the multidimensional model is commonly accepted for OLAP.

 ||'''System Characteristic'''||'''RDBMS'''||'''MultiDimensional Model Hbase'''||
+ ||'''System Characteristic'''||'''RDBMS'''||'''MultiDimensional Model Hbase'''||
||Data Retrieval Performance||Slow||Fast||
||Calculation Functionality||Limited, in all but one dimension||Can be very high, all dimensions||
||Openness to live data access by other applications||Excellent||Limited||
||Priorities||High performance, High availability||High flexibility, High user autonomy||
+
Thus, I decided to develop a shell for linear algebra computation and large-scale data processing using Hadoop's parallel processing and HBase storage.
''Then you may ask, "How is this different from MapReduce using MapFiles?"''
+ I don't expect it to give us high performance just yet, but it will surely make data management and development much easier. First, let's take a look at HBase's data model.
 I don't expect it to give us high performance just yet,
 but it will surely make data management and development much easier.
 First, let's take a look at HBase's data model.
+ HBase provides a unified data model that represents data in three dimensions: Row, Column, and Timestamp. Also, Row and Column may be extended infinitely.
 HBase provides a unified data model that represents data in three dimensions:
  Row, Column, and Timestamp. Also, Row and Column may be extended infinitely.
+ If we cut the data model at a single time version, we may view the result as a 2D table. If the index is a string, we may view it as a huge map. If the index is an integer, it is one huge 2D array.
 If we cut the data model at a single time version, we may view the result as a 2D table.
 If the index is a string, we may view it as a huge map. If the index is an integer, it is one huge 2D array.
+ So each table may hold such data stores in 3D (Columnfamilies). A Locality Group (Columnfamilies) is a relationship that can occur between multiple references whenever one reference brings in much of the data used by the other references.
 So each table may hold such data stores in 3D (Columnfamilies).
 A Locality Group (Columnfamilies) is a relationship that can occur between multiple references
 whenever one reference brings in much of the data used by the other references.
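
The data model described above (Row, Column, Timestamp, with a cut at one time version giving a 2D table) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `make_table`, `put`, and `slice_at` are hypothetical and are not part of any HBase API, and the cell value `120` at timestamp 1 is invented for the example.

```python
# Sketch of the 3D data model: row -> column -> timestamp -> value.
# All names here are illustrative, not part of any HBase API.
from collections import defaultdict


def make_table():
    """Create an empty table: row key -> column name -> {timestamp: value}."""
    return defaultdict(lambda: defaultdict(dict))


def put(table, row, column, timestamp, value):
    """Store one cell version at (row, column, timestamp)."""
    table[row][column][timestamp] = value


def slice_at(table, timestamp):
    """Cut the 3D model at a timestamp: keep, per cell, the newest version
    that is not newer than `timestamp`, giving a 2D row x column view."""
    view = {}
    for row, columns in table.items():
        for column, versions in columns.items():
            visible = [ts for ts in versions if ts <= timestamp]
            if visible:
                view.setdefault(row, {})[column] = versions[max(visible)]
    return view


table = make_table()
put(table, "Star Wars", "year:", 1, 1977)
put(table, "Star Wars", "length:", 1, 120)  # hypothetical older value
put(table, "Star Wars", "length:", 2, 124)  # a newer version of the same cell

old_view = slice_at(table, 1)  # 2D view as of timestamp 1
new_view = slice_at(table, 2)  # 2D view as of timestamp 2
```

With string row keys the 2D view behaves like a large map, which matches the description above; integer keys would make it look like a 2D array.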

 = Suggested Hbase altools Operators =
+ === Suggested Hbase altools Operators ===
'''Note''' that data is located by its row, column, and timestamp.
 == Commands ==
+ ==== Commands ====
 ||'''Command'''||'''Explanation'''||
+ ||'''Command'''||'''Explanation'''||
||Table||The '''Table''' command loads data from the specified table. [[BR]][[BR]]~-''A = Table('movieLog_table');''-~||
||Matrix||The '''Matrix''' command controls the configuration of the logical matrix. [[BR]][[BR]]~-''M = Matrix(table_name, columnfamily_name[, scalar S]);''-~||
||Substitute||'''Substitute''' assigns an expression to a variable named [A~Z].[[BR]][[BR]]~-''A = Table('movieLog_table');''-~||
||Store||The '''Store''' command stores results to the specified table. [[BR]][[BR]]~-''A = Table('movieLog_table'); [[BR]]B = A.Selection(length > 100); [[BR]]Store B TO table('tmp_table')[or file('backup.dat')];''-~||
 == Relational Operators ==
+ ==== Relational Operators ====
 ||'''Operator'''||'''Explanation'''||
+ ||'''Operator'''||'''Explanation'''||
||Projection||<99%>'''Projection''' of a relation ~+R+~ makes a new relation: the set obtained when all tuples (rows) in ~+R+~ are restricted to the set {columnfamily,,1,,,...,columnfamily,,n,,}.[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = A.Projection('year','length');''-~||
||Selection||<99%>'''Selection''' of a relation ~+R+~ makes a new relation: the set of specified tuples (rows) of the relation ~+R+~.[[BR]]'''Set Operations''' : ~-''OR, AND, NOT''-~[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = A.Selection(length > 100 AND studioName = 'Fox');''-~||
||Group||<99%>'''Group''' tuples by the value of an attribute and apply an aggregate function independently to each group of tuples.[[BR]]'''Aggregate Functions''' : ~-''AVG( attribute ), SUM( attribute ), COUNT( attribute ), MIN( attribute ), MAX( attribute )''-~[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = A.Group('studioName', MIN('year'));''-~||
||Sort||<99%>'''Sort''' orders the tuples (rows) of ~+R+~ according to the columnfamilies in the columnfamily list.[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = Sort by ('length');''-~||
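
As a rough illustration of what the four relational operators above compute, here is a minimal in-memory Python sketch. It is not the altools implementation: the function names and signatures are assumptions made for this example, and the three rows are the sample movieLog_table values that appear in this page's relational examples.

```python
# Sample rows keyed by title (the row key); values as in the page's example table.
movie_log = {
    "Star Wars": {"year": 1977, "length": 124, "studioName": "Fox"},
    "Mighty Ducks": {"year": 1991, "length": 104, "studioName": "Disney"},
    "Wayne's World": {"year": 1992, "length": 95, "studioName": "Paramount"},
}


def projection(table, *columns):
    """Restrict every row to the named columns (the row key is always kept)."""
    return {key: {c: row[c] for c in columns if c in row}
            for key, row in table.items()}


def selection(table, predicate):
    """Keep only the rows for which predicate(row) is true."""
    return {key: row for key, row in table.items() if predicate(row)}


def group(table, attribute, aggregate, target):
    """Group rows by `attribute`, then apply `aggregate` to each group's `target` values."""
    groups = {}
    for row in table.values():
        groups.setdefault(row[attribute], []).append(row[target])
    return {value: aggregate(items) for value, items in groups.items()}


def sort(table, column):
    """Return (row key, row) pairs ordered by the given column, ascending."""
    return sorted(table.items(), key=lambda item: item[1][column])


year_length = projection(movie_log, "year", "length")
long_movies = selection(movie_log, lambda row: row["length"] > 100)
first_year = group(movie_log, "studioName", min, "year")
by_length = [title for title, _ in sort(movie_log, "length")]
```

In a Hadoop setting each of these would presumably run as a parallel job over the table's rows; the sketch only pins down the intended results.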
+ ==== Matrix Arithmetic Operators ====
+ ||'''Operator'''||'''Explanation'''||
+ ||Addition||<99%>'''Adding''' entries with the same indices [[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B = Matrix('m_table','cf_2');[[BR]]C = A + B;''-~||
+ ||Subtraction||<99%>'''Subtracting''' entries with the same indices [[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B = Matrix('m_table','cf_2');[[BR]]C = A - B;''-~||
+ ||Multiplication||<99%>'''Multiplication''' of two matrices: the product C of two matrices A and B [[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B = Matrix('m_table','cf_2');[[BR]]C = A * B;''-~||
+ ||Division||<99%>'''Division''' is solving the matrix equation AX = B for X [[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B = Matrix('m_table','cf_2');[[BR]]C = A /[or \] B;''-~||
+ ||Transpose||<99%>'''Transpose''' of a matrix: the matrix formed by turning all the rows of a given matrix into columns and vice versa.[[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B = Transpose(A);''-~||
+
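
The arithmetic operators above can be pinned down with a small pure-Python sketch on lists of rows. The function names are assumptions for this illustration only; `solve` is one possible reading of "Division" as solving AX = B, done here by Gauss-Jordan elimination in exact rational arithmetic, and it says nothing about how altools would distribute the work.

```python
from fractions import Fraction


def add(a, b):
    """Add entries with the same indices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def subtract(a, b):
    """Subtract entries with the same indices."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def transpose(a):
    """Turn rows into columns and vice versa."""
    return [list(col) for col in zip(*a)]


def multiply(a, b):
    """Matrix product C = A * B (row of A times column of B)."""
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def solve(a, b):
    """Division: return X such that A X = B, for square non-singular A."""
    n = len(a)
    # Augmented matrix [A | B] in exact rational arithmetic.
    aug = [[Fraction(x) for x in ra] + [Fraction(x) for x in rb]
           for ra, rb in zip(a, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [x / pivot_value for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
X = solve(A, B)  # X satisfies multiply(A, X) == B
```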
+ ==== Factorizations and Decompositions ====
+ ||'''Function'''||'''Explanation'''||
+ ||LU||<99%>'''LU Decomposition'''[[BR]]A procedure for decomposing an N-by-N matrix A into a product of a lower triangular matrix L and an upper triangular matrix U, LU = A[[BR]]'''Functions''' : ~-''getL(), getU(), isSingular(), getPivot()''-~ [[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B = LUDecomposition(A);[[BR]]C = getU(B);[[BR]]D = getL(B);''-~||
+ ||QR||<99%>'''QR Decomposition'''[[BR]]For an m-by-n matrix A with m >= n, the QR decomposition is an m-by-n orthogonal matrix Q and an n-by-n upper triangular matrix R so that A = Q*R.[[BR]]'''Functions''' : ~-''getH(), getQ(), getR()''-~[[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B = QRDecomposition(A);[[BR]]C = getH(B);''-~||
+ ||Cholesky||<99%>'''Cholesky Decomposition'''[[BR]]It is a special case of LU decomposition, applicable only if the matrix to be decomposed is symmetric positive definite.[[BR]]'''Functions''' : ~-''getL(), isSPD()''-~ [[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B = CholeskyDecomposition(A);[[BR]]C = getL(B);''-~||
+ ||SVD||<99%>'''SV (Singular Value) Decomposition'''[[BR]]For an m-by-n matrix A with m >= n, the singular value decomposition is an m-by-n orthogonal matrix U, an n-by-n diagonal matrix S, and an n-by-n orthogonal matrix V so that A = U*S*V'.[[BR]]'''Functions''' : ~-''getS(), getU(), getV(), getSingularValues()''-~ [[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B = SVDecomposition(A);[[BR]]C = getU(B);''-~||
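
To make the LU entry above concrete, here is a minimal Python sketch of the Doolittle scheme, which produces a unit lower triangular L and an upper triangular U with LU = A. It is a simplification under stated assumptions: it performs no row exchanges, so it does not cover what `getPivot()` or `isSingular()` would report, and the names `lu_decompose` and `matmul` are hypothetical.

```python
from fractions import Fraction


def lu_decompose(a):
    """Return (L, U) with L unit lower triangular, U upper triangular, L U = A.

    Assumes a square matrix whose pivots are all non-zero (no pivoting)."""
    n = len(a)
    lower = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    upper = [[Fraction(x) for x in row] for row in a]
    for k in range(n):
        if upper[k][k] == 0:
            raise ValueError("zero pivot; this sketch does no row exchanges")
        for i in range(k + 1, n):
            # Multiplier that eliminates entry (i, k); it becomes L[i][k].
            factor = upper[i][k] / upper[k][k]
            lower[i][k] = factor
            upper[i] = [x - factor * y for x, y in zip(upper[i], upper[k])]
    return lower, upper


def matmul(a, b):
    """Plain matrix product, used here only to check that L U equals A."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


A = [[4, 3], [6, 3]]
L, U = lu_decompose(A)
product = matmul(L, U)  # reproduces A
```

Exact fractions are used so that the check L U = A holds without rounding; a real implementation on large matrices would use floating point with pivoting.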

+ = Implementation =
 == Matrix Operators ==
 '''Operator''' '''Explanation''' 
 Addition <99%>Adding entries with the same indices [[BR]][[BR]]~''C = A + B;''~ 
 subtraction <99%>Subtracting entries with the same indices [[BR]][[BR]]~''C = A - B;''~ 
 multiplication <99%>Product C of two matrices A and B [[BR]][[BR]]~''C = A * B;''~ 
 division <99%>... 
 transpose <99%>... 
 permutation <99%>... 
 norms <99%>... 
 === Factorizations and decompositions ===
 '''Function''' '''Explanation''' 
 LU <99%>... 
 QR <99%>... 
 Cholesky <99%>... 
 SVD <99%>... 
 Inverse <99%>... 
 Pseudoinverse <99%>... 
 Condition <99%>... 
 Determinant <99%>... 
 Rank <99%>... 
 === ColumnWise Data Analysis ===
 '''Function''' '''Explanation''' 
 Frequencies <99%>... 
 Sorting <99%>... 
 Covariance <99%>... 
+ '''Note'''
+ {{{
+ Run the following: % ant clean jar compilecontrib test javadoc
 = Examples =
+ This will run all tests and will show you javadoc warnings, if any (javadoc warnings will cause Hudson to fail).
+ If you want to run only the hbase tests because the full suite takes too long, do the following:
+ % cd src/contrib/hbase
+ % ant jar test
+ OR
+ % ant clean jar test
 == Relational Operations Examples ==
 Row Key Column Families 
 title  year length inColor  studioName  vote  producer 
 Star Wars year:  1977 length:  124 inColor:  true studioName:  Fox  vote:''user_1''  5  producer:  George Lucas 
           vote:''user_2''  2   
 Mighty Ducks year:  1991 length:  104 inColor:  true studioName:  Disney  vote:''user_1''  2  producer:  Blair Peters 
           vote:''user_3''  4   
 Wayne's World year:  1992 length:  95 inColor:  true studioName:  Paramount  vote:''user_2''  3  producer:  Penelope Spheeris 
           vote:''user_3''  4   
 '''~+^π^+~'''~title~,~year~,~length~'''~+^(movieLog_table)^+~'''
+ St.Ack
+ }}}
 A = Table('movieLog_table'); [[BR]]B = A.Projection('year','length');
 title year length 
 Star Wars 1977 124 
 Mighty Ducks 1991 104 
 Wayne's World 1992 95 
+ 
+ = Example Of Hbase Shell Use =
 '''~+^σ^+~'''~length>100~'''~+^(movieLog_table)^+~'''
+ See [:HbaseShell/Examples]
 A = Table('movieLog_table'); [[BR]]B = A.Selection(length > 100);
 title year length inColor studioName producer 
 Star Wars 1977 124 true Fox 12345 
 Mighty Ducks 1991 104 true Disney 67890 


 '''~+^π^+~'''~title~,~year~'''~+^(σ^+~'''~length>100~'''~+^(movieLog_table)∩σ^+~'''~studioName='Fox'~'''~+^(movieLog_table))^+~'''

 A = Table('movieLog_table'); [[BR]]B = A.Selection(length > 100 AND studioName = 'Fox'); [[BR]]C = B.Projection('year');
 title year 
 Star Wars 1977 
