hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Trivial Update of "Hbase/HbaseShell" by udanax
Date Sun, 19 Aug 2007 00:20:59 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by udanax:
http://wiki.apache.org/lucene-hadoop/Hbase/HbaseShell

------------------------------------------------------------------------------
- [[TableOfContents(4)]]
+ [[TableOfContents(5)]]
+ ----
+ 
+ = Hbase Shell Plan Draft =
+ The plan is to significantly expand the set of shell operators.  Basic data manipulation and data definition operators will be extended and evolved to be more SQL-like ([wiki:Hbase/HbaseShell/HQL HQL]).  More sophisticated manipulations for relational and linear algebra, matrix addition, multiplication, etc., will be added to an Hbase subshell to keep the two operator types -- SQL-like vs. non-SQL -- distinct.
+ 
+  ''-- After the POC (proof of concept) review, many things may change.[[BR]]-- If you have constructive ideas, please let me know. [[MailTo(webmaster AT SPAMFREE udanax DOT org)]]''
+ 
+ This project is currently in the planning stage.  [https://issues.apache.org/jira/browse/HADOOP-1608 HADOOP-1608], which adds "Relational Algebra Operators", is currently in progress.
  
  ----
- = Hbase Shell Introduction =
- Hbase Shell is a basic, command-line, and interactive 'shell' for manipulating tables in
Hbase. It has support for a small set of SQL-inspired operations. Results are presented in
an ASCII-table format.
  
- The Hbase Shell aims to be to Hbase what the mysql client command-line tool is to mysqld,
and what sqlplus to Oracle.
+ == Suggested Hbase Shell altools plans ==
+ I suggest developing the Hbase Shell in an SQL style, and developing '''al'''gebraic '''tools''' as a subshell in an intuitive style, as described below. 
  
- Hbase Shell was first added to TRUNK in July, 2007.
+ {{{
+ HBase > altools;
  
-  * [http://issues.apache.org/jira/browse/hadoop-1720 HADOOP-1720] to update "[wiki:Hbase/HbaseShell/HQL
HQL]" is currently in process. 
-  * See [wiki:Hbase/ShellPlans Hbase Shell plans] page for discussion and description of
future operators. The intent is to add more support for non-interactive usage as well as operators
for algebraic, relational, and matrix manipulations. 
+ Hbase altools, 0.0.1 version
+ Type 'help;' for Hbase altools usage.
  
- == People Involved ==
-  * [:udanax:Edward Yoon] [[MailTo(webmaster AT SPAMFREE udanax DOT org)]] (Research and
Development center, NHN corp.) -- Initial contributor
-  * [:InchulSong:Inchul Song] [[MailTo(icsong AT SPAMFREE gmail DOT com)]] (Database Lab,
KAIST)
+ Hbase.altools > who are you;
+ 
+  Hadoop + Hbase based algebraic manipulation tools
+ 
+ Hbase.altools > exit;
+ Hbase > exit;
+ }}}
+ 
+ Hbase altools is an Hbase Shell sub 'interpreter' (or 'shell') program that provides scalable data processing capabilities such as aggregation and algebraic calculation (groups and sets, commutative rings, algebraic geometry, and linear algebra) on Hadoop + Hbase based parallel machines. In particular, it will focus on storing and manipulating very large sparse matrices on Hbase.
+ 
+  ''-- Altools Matrix operations will show how Google search's LSI, Google Earth's algebraic
topology, Google News' recommendation system are related to Bigtable.''
+ 
+ === Background ===
+ I expect Hadoop + Hbase to handle sparsity and data explosion very well in the near future. Moreover, I believe the multi-dimensional map structure and the 3D space model of the data are well suited to rapid ad-hoc information retrieval in any orientation, as well as to fast, flexible calculation and transformation of raw data based on formulaic relationships. This is advantageous for analytical processing, as it allows users to easily formulate complex queries and to filter or slice data into meaningful subsets, among other things.
  
  ----
- = How to Start a Shell =
- Run the following on the command-line:
  
+ == Suggested Hbase altools Syntax ==
- {{{${HBASE_HOME}/bin/hbase shell}}}
- 
- You will be presented with the following prompt:
- 
- {{{HBase Shell, 0.0.1 version.
- Copyright (c) 2007 by udanax, licensed to Apache Software Foundation.
- Type 'help;' for usage.
- 
- HBase >}}}
- 
- All commands are terminated with a semi-colon: e.g. Type 'help;' to see list of available
commands.
- 
- = Hbase Shell Commands =
  '''Note''' that Data should be located by their row, column, and timestamp.
  
+ === Commands ===
- ||<bgcolor="#ececec">'''Command''' ||<bgcolor="#ececec">'''Explanation''' ||
+ ||<bgcolor="#E5E5E5">'''Command''' ||<bgcolor="#E5E5E5">'''Explanation''' ||
+ ||Table ||<99%>'''Table''' command loads specified table. [[BR]][[BR]]~-''Table('movieLog_table');''-~
||
+ ||Matrix ||<99%>'''Matrix''' command constructs the configuration of the logical matrix.[[BR]]'''Options''' : not yet implemented. [[BR]][[BR]]~-''Matrix(table_name, columnfamily_name[, option]);''-~ ||
+ ||Substitute ||<99%>'''Substitute''' assigns an expression to a variable [A~Z].[[BR]][[BR]]~-''A = Table('movieLog_table');''-~ ||
+ ||IF...ELSE ||<99%>'''IF...ELSE''' imposes conditions on execution. [[BR]][[BR]]~-''IF ( boolean_expression )[[BR]]B = command_statements;[[BR]]ELSE[[BR]]B = command_statements;''-~||
+ ||Store ||<99%>'''Store''' command will store results to specified table. [[BR]][[BR]]~-''A
= Table('movieLog_table'); [[BR]]B = A.Selection(length > 100); [[BR]]Store B TO table('tmp_table')[or
file('backup.dat')];''-~ ||
- ||Help ||<99%>'''Help''' command provides information about the use of shell script.[[BR]][[BR]]~-''HELP
[function_name];''-~ ||
- ||Show ||<99%>'''Show''' command lists tables ''or files (DFS)''.[[BR]][[BR]]~-''SHOW
tables[ or files];''-~ ||
- ||Describe ||'''Describe''' command provides information about the columnfamilies in a table.[[BR]][[BR]]~-''DESC
table_name;''-~ ||
- ||Create ||'''Create''' command creates a new table.[[BR]][[BR]]~-''CREATE table_name[[BR]]COLUMNFAMILIES('columnfamily_name1'[,
'columnfamily_name2', ...])[[BR]][LIMIT=limitNumber_of_Version];''-~ ||
- ||Drop ||'''Drop''' command drops columnfamilies in a table or tables.[[BR]][[BR]]~-''DROP
table_name1[, table_name2, ...] or columnfamily_name1[, columnfamily_name2, ...];''-~ ||
- ||Clear ||<99%>'''Clear''' the screen.[[BR]][[BR]]~-''CLEAR;''-~ ||
- ||Exit ||<99%>'''Exit''' from the current shell script.[[BR]][[BR]]~-''EXIT;''-~ ||
- And, Commands to manually manipulate data on more detailed parts.
- ||<bgcolor="#ececec">'''Command''' ||<bgcolor="#ececec">'''Explanation''' ||
- ||Insert ||<99%>'''Insert''' command inserts one row into the table with a value for
specified column in the table.[[BR]][[BR]]~-''INSERT table_name ('columnfamily_name1:column_key'[,
'columnfamily_name2:column_key', ...])[[BR]] VALUESVALUES ('entry1'[, 'entry2', ...])[[BR]]WHERE
row='row_key';''-~ ||
- ||Delete ||'''Delete''' command deletes specified rows in table. [[BR]][[BR]]~-''DELETE
table_name[[BR]]WHERE row='row_key'[[BR]][AND column='columnfamily_name:column_key'];''-~
||
- ||Select ||<99%>'''Select''' command retrieves rows from a table.[[BR]][[BR]]~-''SELECT
table_name[[BR]][WHERE row='row_key'][[BR]][AND column='columnfamily_name:column_key'];[[BR]][AND
time='Specified_Timestamp'];[[BR]][LIMIT=Number_of_Version];''-~ ||
  
- ----
- = Example Of Hbase Shell Use =
- == Create the table in a HBase ==
+ === Relational Operators ===
+ ||<bgcolor="#E5E5E5">'''Operator''' ||<bgcolor="#E5E5E5">'''Explanation''' ||
+ ||Projection ||<99%>'''Projection''' of a relation ~+R+~ produces a new relation obtained by restricting all tuples (rows) in ~+R+~ to the set {columnfamily,,1,,,...,columnfamily,,n,,}.[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = A.Projection('year','length'); '''//π,,year.length,,(A)''' ''-~ ||
+ ||Selection ||<99%>'''Selection''' of a relation ~+R+~ produces a new relation consisting of the specified tuples (rows) of ~+R+~.[[BR]]'''Set Operations''' : ~-''OR, AND, NOT''-~[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = A.Selection(length > 100 AND studioName = 'Fox'); '''//σ,,length > 100.studioName='Fox',,(A)''' ''-~ ||
+ ||JOINs ||<99%>Table '''JOIN''' operations link and extract data from two different internal sources.[[BR]]'''Operations''' : ~-''naturalJoin(), thetaJoin(), cartesianProduct() ''-~ [[BR]][[BR]]~-''R = Table('movieLog_table');[[BR]]S = Table('movieStar_table');[[BR]]C = R.naturalJoin(S); '''//C = R▷◁S''' ''-~ ||
+ ||Group ||<99%>'''Group''' groups tuples by the value of an attribute and applies an aggregate function independently to each group of tuples.[[BR]]'''Aggregate Functions''' : ~-''AVG( attribute ), SUM( attribute ), COUNT( attribute ), MIN( attribute ), MAX( attribute )''-~[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = A.Group('studioName', MIN('year')); '''//γ,,studioName.MIN( year ),,(A)''' ''-~ ||
+ ||Sort ||<99%>'''Sort''' orders the tuples (rows) of R according to the columnfamilies in the columnfamily list.[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = Sort A by ('length'); '''//τ,,length,,(A)''' ''-~ ||
+ 
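As a rough illustration of what the Group operator computes, here is a minimal Python sketch. It is not altools or Hbase code: `group_min` is a hypothetical helper, the table is modelled as a plain dictionary keyed by row key, and the "Example Film" row is invented so that one group has more than one member. The other rows reuse the movieLog_table sample values from the earlier revision of this page.

```python
def group_min(rows, key, attribute):
    """Group rows by the value of `key` and keep the minimum `attribute` per group."""
    result = {}
    for columns in rows.values():
        group = columns[key]
        value = columns[attribute]
        if group not in result or value < result[group]:
            result[group] = value
    return result


# Row key -> {columnfamily: value}; "Example Film" is a made-up row.
movie_log = {
    "Star Wars": {"year": 1977, "studioName": "Fox"},
    "Example Film": {"year": 1990, "studioName": "Fox"},
    "Mighty Ducks": {"year": 1991, "studioName": "Disney"},
    "Wayne's World": {"year": 1992, "studioName": "Paramount"},
}

# Rough analogue of: B = A.Group('studioName', MIN('year'));
b = group_min(movie_log, "studioName", "year")
print(b)
```

On Hadoop the same idea would presumably map onto a grouping key plus a per-group reduction, but that mapping is an assumption and is not specified on this page.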
+ '''(ex. 1)''' Find the title and year of the movies produced by the 'Fox' studio whose running time is more than 100 minutes.
+ [[BR]]~-''π ,,title.year,, (σ ,,length > 100,, (movieLog_table) ∩ σ ,,studioName = 'Fox',, (movieLog_table))''-~
  
  {{{
- HBase > CREATE movieLog_table
-     --> COLUMNFAMILIES('year', 'length', 'inColor', 'studioName', 'vote', 'producer')
-     --> LIMIT=1; 
+ Hbase.altools > A = Table('movieLog_table'); 
+ Hbase.altools > B = A.Selection(length > 100 AND studioName = 'Fox'); 
+ Hbase.altools > C = B.Projection('year'); 
  
+ Hbase.altools > store C to table('result_table'); 
- HBase > CREATE movieStar_table
-     --> COLUMNFAMILIES('biography', 'filmography', 'gender', 'birthDate')
-     --> LIMIT=1;
  }}}
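The Selection and Projection steps of the example above can be modelled in a few lines of Python. This is only a sketch under stated assumptions: the `Relation` class and its method names are hypothetical, the table lives in memory rather than in Hbase, and the sample rows reuse the movieLog_table values from the earlier revision of this page.

```python
class Relation:
    """Hypothetical in-memory model of a row-keyed table; not an Hbase API."""

    def __init__(self, rows):
        # rows: {row_key: {columnfamily: value}}
        self.rows = rows

    def selection(self, predicate):
        """Keep only the rows whose columns satisfy the predicate."""
        return Relation({k: v for k, v in self.rows.items() if predicate(v)})

    def projection(self, *families):
        """Restrict every row to the named columnfamilies."""
        return Relation({k: {f: v[f] for f in families if f in v}
                         for k, v in self.rows.items()})


a = Relation({
    "Star Wars": {"year": 1977, "length": 124, "studioName": "Fox"},
    "Mighty Ducks": {"year": 1991, "length": 104, "studioName": "Disney"},
    "Wayne's World": {"year": 1992, "length": 95, "studioName": "Paramount"},
})

# Rough analogue of: B = A.Selection(length > 100 AND studioName = 'Fox');
b = a.selection(lambda r: r["length"] > 100 and r["studioName"] == "Fox")
# Rough analogue of: C = B.Projection('year');
c = b.projection("year")
print(c.rows)
```

The title does not need to be projected explicitly in this model because it is the row key, which every result row keeps.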
  
- == Insert data into a table ==
+ '''(ex. 2)''' Theta Join : ▷◁,,C,,
+ [[BR]]~-''movieStars_table▷◁,,actor < year,,movieLog_table''-~
+ 
  {{{
- HBase > INSERT movieLog_table ('year:', 'length:', 'inColor:', 'studioName:', 'vote:user_1',
'producer:')
-     --> VALUES ('1977', '124', 'true', 'Fox', '5', 'George Lucas')
-     --> WHERE row='Star Wars';
+ Hbase.altools > A = Table('movieStars_table'); 
+ Hbase.altools > B = Table('movieLog_table');
+ Hbase.altools > C = A.thetaJoin(B);
  
+ Hbase.altools > store C to table('result_table'); 
- 
- HBase > INSERT movieStar_table ('biography:', 'filmography:Star Wars', 'gender:', 'birthDate:')
-     --> VALUES ('blah~', 'starring', 'male', 'March 31, 1971')
-     --> WHERE row='Ewan Gordon Mc.Gregor'; 
  }}}
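A theta join can be read as a Cartesian product followed by a filter on a condition θ. The following Python sketch shows that reading only. `theta_join`, the `birthYear` column, and the sample rows are assumptions made for illustration, and the way a condition would be passed to `thetaJoin()` in altools is not defined on this page.

```python
from itertools import product


def theta_join(r, s, theta):
    """Merge every pair of rows from r x s for which theta(row_r, row_s) holds."""
    joined = {}
    for (r_key, r_cols), (s_key, s_cols) in product(r.items(), s.items()):
        if theta(r_cols, s_cols):
            joined[(r_key, s_key)] = {**r_cols, **s_cols}
    return joined


# Hypothetical rows; birthYear is an illustrative numeric column.
stars = {
    "Ewan Gordon Mc.Gregor": {"birthYear": 1971},
    "Kenan Thompson": {"birthYear": 1978},
}
movies = {
    "Star Wars": {"year": 1977},
    "Mighty Ducks": {"year": 1991},
}

# Keep the pairs where the star was born before the movie's year.
c = theta_join(stars, movies, lambda a, b: a["birthYear"] < b["year"])
print(sorted(c))
```

A naive product like this touches every pair of rows, so a distributed version would need a smarter strategy; the sketch is meant to pin down the semantics, not the execution plan.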
  
- == Show all data in a table ==
+ === Matrix Arithmetic Operators ===
+ ||<bgcolor="#E5E5E5">'''Operator''' ||<bgcolor="#E5E5E5">'''Explanation''' ||
+ ||Addition ||<99%>'''Adding''' entries with the same indices. [[BR]][[BR]]~-''A =
Matrix('m_table','cf_1');[[BR]]B = Matrix('m_table','cf_2');[[BR]]C = A + B; '''// c,,ij,,
= a,,ij,, + b,,ij,, (i : row key, j : column key)''' ''-~ ||
+ ||Subtraction ||<99%>'''Subtracting''' entries with the same indices.[[BR]][[BR]]~-''A
= Matrix('m_table','cf_1');[[BR]]B = Matrix('m_table','cf_2');[[BR]]C = A - B; '''// c,,ij,,
= a,,ij,, - b,,ij,, (i : row key, j : column key)''' ''-~ ||
+ ||Multiplication ||<99%>'''Multiplication''' computes the product C of two matrices A and B.[[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B = Matrix('m_table','cf_2');[[BR]]C = A * B; '''//C = A · B''' ''-~ ||
+ ||Division ||<99%>'''Division''' is solving the matrix equation AX = B for X.[[BR]][[BR]]~-''A
= Matrix('m_table','cf_1');[[BR]]B = Matrix('m_table','cf_2');[[BR]]C = A /[or \] B; '''//
C = A / B''' ''-~||
+ ||Transpose ||<99%>'''Transpose''' of a matrix: the matrix formed by turning all the rows of a given matrix into columns and vice versa.[[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B = Transpose(A); '''// B = A'''' ''-~||
+ 
+ '''(ex. 1)''' The product C of two matrices A and B
+ [[BR]]~-''C,,ij,, = Σ,,k,, A,,ik,,B,,kj,, (1 ≤ i ≤ m, 1 ≤ j ≤ n)''-~
+ 
  {{{
- HBase > SELECT movieLog_table;
+ Hbase.altools > A = Matrix('m_table','cf_1');
+ Hbase.altools > B = Matrix('m_table','cf_2');
+ Hbase.altools > C = A * B;  
  }}}
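Because the plan emphasises very large sparse matrices, it may help to see the product formula applied to a sparse layout. The Python sketch below stores each matrix as a nested dictionary (row key to column key to value), which loosely mirrors a row/column map but is not Hbase code; `sparse_multiply` and the sample keys are hypothetical.

```python
from collections import defaultdict


def sparse_multiply(a, b):
    """C[i][j] = sum over k of A[i][k] * B[k][j], visiting only stored entries."""
    c = defaultdict(dict)
    for i, row in a.items():
        for k, a_ik in row.items():
            # Missing rows of B are treated as all zeros.
            for j, b_kj in b.get(k, {}).items():
                c[i][j] = c[i].get(j, 0) + a_ik * b_kj
    return dict(c)


a = {"r1": {"c1": 1, "c2": 2}, "r2": {"c2": 3}}
b = {"c1": {"x": 4}, "c2": {"x": 5, "y": 6}}

c = sparse_multiply(a, b)
print(c)
```

Only entries that are actually stored are multiplied, so the work grows with the number of non-zero pairs rather than with m × n.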
  
- ||Row Key ||<-12>Column Families ||
- ||<rowbgcolor="#ececec">title ||<-2> year ||<-2>length ||<-2>inColor
||<-2> studioName ||<-2> vote ||<-2> producer ||
- ||Star Wars ||year: || 1977 ||length: || 124 ||inColor: || true ||studioName: || Fox ||
vote:''user_1'' || 5 || producer: || George Lucas ||
- || || || || || || || || || || vote:''user_2'' || 2 || || ||
- ||Mighty Ducks ||year: || 1991 ||length: || 104 ||inColor: || true ||studioName: || Disney
|| vote:''user_1'' || 2 || producer: || Blair Peters ||
- || || || || || || || || || || vote:''user_3'' || 4 || || ||
- ||Wayne's World ||year: || 1992 ||length: || 95 ||inColor: || true ||studioName: || Paramount
|| vote:''user_2'' || 3 || producer: || Penelope Spheeris ||
- || || || || || || || || || || vote:''user_3'' || 4 || || ||
+ === Factorizations and Decompositions ===
+ 
+ ||<bgcolor="#E5E5E5">'''Function''' ||<bgcolor="#E5E5E5">'''Explanation''' ||
+ ||LU ||<99%>'''LU Decomposition'''[[BR]]A procedure for decomposing an N by N matrix A into a product of a lower triangular matrix L and an upper triangular matrix U, LU = A.[[BR]]'''Functions''' : ~-''getL(), getU(), isSingular(), getPivot()''-~ [[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B = LUDecomposition(A);[[BR]]C = getU(B);[[BR]]D = getL(B);''-~||
+ ||QR ||<99%>'''QR Decomposition'''[[BR]]For an m-by-n matrix A with m >= n, the
QR decomposition is an m-by-n orthogonal matrix Q and an n-by-n upper triangular matrix R
so that A = Q*R.[[BR]]'''Functions''' : ~-''getH(), getQ(), getR()''-~[[BR]][[BR]]~-''A =
Matrix('m_table','cf_1');[[BR]]B = QRDecomposition(A);[[BR]]C = getH(B);''-~||
+ ||Cholesky ||<99%>'''Cholesky Decomposition'''[[BR]]A special case of LU decomposition, applicable only if the matrix to be decomposed is symmetric positive definite.[[BR]]'''Functions''' : ~-''getL(), isSPD()''-~ [[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B = CholeskyDecomposition(A);[[BR]]C = getL(B);''-~||
+ ||SVD ||<99%>'''SV(Singular Value) Decomposition'''[[BR]]For an m-by-n matrix A with
m >= n, the singular value decomposition is an m-by-n orthogonal matrix U, an n-by-n diagonal
matrix S, and an n-by-n orthogonal matrix V so that A = U*S*V'.[[BR]]'''Functions''' : ~-''getS(),
getU(), getV(), getSingularValues()''-~ [[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B
= SVDecomposition(A);[[BR]]C = getU(B);''-~||
+ 
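To make the LU entry above concrete, here is a small dense-matrix sketch in Python. It uses the Doolittle scheme without pivoting, which is a simplification chosen for brevity: the `getPivot()` function listed above suggests the planned operator would use row exchanges, and this sketch does not. `lu_decompose` is a hypothetical name, not an altools function.

```python
def lu_decompose(a):
    """Doolittle LU without pivoting: returns (L, U) with L unit lower triangular."""
    n = len(a)
    l = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    u = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Row i of U.
        for j in range(i, n):
            u[i][j] = a[i][j] - sum(l[i][k] * u[k][j] for k in range(i))
        if u[i][i] == 0 and i < n - 1:
            raise ZeroDivisionError("zero pivot; this sketch does no row exchanges")
        # Column i of L, below the diagonal.
        for j in range(i + 1, n):
            l[j][i] = (a[j][i] - sum(l[j][k] * u[k][i] for k in range(i))) / u[i][i]
    return l, u


a = [[4.0, 3.0], [6.0, 3.0]]
l, u = lu_decompose(a)
print(l)
print(u)
```

Multiplying `l` by `u` gives back `a`, which is the LU = A property stated in the table.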
+ '''(ex. 1)''' To find the Singular Value decomposition in Altools, do the following:
+ [[BR]]~-''M = UΣV*''-~
  
  {{{
- HBase > SELECT movieStar_table;
+ Hbase.altools > M = Matrix('m_table','cf_1'); //Set up the matrix M from mapped matrix in hbase.
+ Hbase.altools > U = M.getU();
+ Hbase.altools > V = M.getV();
  }}}
  
- ||Row Key ||<-8>Column Families ||
- ||<rowbgcolor="#ececec">starName ||<-2> biography ||<-2>filmography ||<-2>gender
||<-2> birthDate ||
- ||Ewan Gordon Mc.Gregor ||biography: ||blah blah ||filmography:Star Wars ||starring ||gender:
||male ||birthDate: ||March 31, 1971 ||
- || || || ||filmography:Emma ||extra || || || || ||
- ||Kenan Thompson ||biography: ||blah blah ||filmography:Mighty Ducks ||starring ||gender:
||male ||birthDate: ||May 10, 1978 ||
- || || || ||filmography:Big Fat Liar  ||cameo || || || || ||
- ||keanu reeves ||biography: ||blah blah ||filmography:Constantine ||starring ||gender: ||male
||birthDate: ||September 2, 1964||
- || || || ||filmography:The Matrix Reloaded ||starring || || || || ||
- 
