hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Trivial Update of "Hbase/ShellPlans" by udanax
Date Mon, 13 Aug 2007 02:57:55 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by udanax:
http://wiki.apache.org/lucene-hadoop/Hbase/ShellPlans

------------------------------------------------------------------------------
  ----
  
  = Hbase Shell Plan Draft =
- Plan is to significantly expand the set of shell operators.  Basic data manipulation and
data definition operators will be extended and evolved to be more SQL-like ([:Hbase/HbaseShell/HQL
HQL]).  More sophisticated manipulations to do relational and linear algebra, matrix additions,
multiplications, etc., will be added to a HBase subshell to keep the two operator types --
SQL-like vs. non-SQL -- distinct.
+ Plan is to significantly expand the set of shell operators.  Basic data manipulation and
data definition operators will be extended and evolved to be more SQL-like ([:Hbase/HbaseShell/HQL]).
 More sophisticated manipulations to do relational and linear algebra, matrix additions, multiplications,
etc., will be added to a HBase subshell to keep the two operator types -- SQL-like vs. non-SQL
-- distinct.
  
  This project is currently in the planning stage.  [https://issues.apache.org/jira/browse/HADOOP-1608
HADOOP-1608] to add "Relational Algebra Operators" is currently in progress.
  
  == People Involved ==
   * '''Syntax definition.'''
    * [:udanax:Edward Yoon], Master.[[BR]]Open Collaboration, NHN corp.
-   * Inchul Song, Ph.D. Candidate[[BR]]Database Lab[[BR]]Division of Computer Science, KAIST
+   * Inchul Song, Ph.D. Candidate[[BR]]Database Lab (Division of Computer Science, KAIST)
  
  If you have constructive ideas, please advise me. webmaster@udanax.org
  
- ''~-This page looks great. I've added comments to the below.  Please remove after you are
done with them. -- St.Ack-~''
+ == Suggested Hbase Query Language plans ==
  
- == Suggested Hbase Shell plans ==
- === Hbase Query Language ===
  I've made some changes to your initial HQL to make it look more like SQL. I borrowed the
syntax definition style from MySQL. 
+ 
   -- [:Hbase/HbaseShell/HQL] by Inchul Song
  
- ''~-if you're ready to implement them, I suggest you open a new issue for "HQL" -- Edward-~''
+ ~-''If you're ready to implement them, I suggest you open a new issue for "HQL" -- Edward''-~
  
  ----
  
@@ -43, +42 @@

  Hbase.altools > exit;
  Hbase > exit;
  }}}
+ 
- Hbase altools is an Hbase Shell sub 'interpreter' (or 'shell') program to provide scalable
data processing capabilities like aggregation and algebraic calculation (groups and sets, commutative
rings, algebraic geometry, and linear algebra) on Hadoop + Hbase based parallel machines.
In particular, it will focus on storing and manipulating sparse matrices on Hbase.
+ Hbase altools is an Hbase Shell sub 'interpreter' (or 'shell') program to provide scalable
data processing capabilities like aggregation and algebraic calculation (groups and sets, commutative
rings, algebraic geometry, and linear algebra) on Hadoop + Hbase based parallel machines.
In particular, it will focus on storing and manipulating '''sparse matrices''' on Hbase.
  
   ''-- Altools Matrix operations will show how Google search's LSI, Google Earth's algebraic
topology, Google News' recommendation system are related to Bigtable. See the HBase Shell
Usage Page. --[:Hbase/HbaseShell/Examples]''
+ 
  
  === Hbase altools Goals ===
   * A Simplified Import/Export/Migrate Functionality Between different data sources (Hadoop,
HBase)
@@ -59, +60 @@

  I expect Hadoop + Hbase to handle sparsity and data explosion very well in the near future.
Moreover, I believe the design of the multi-dimensional map structure and the 3d space model
of the data are optimized for rapid ad-hoc information retrieval in any orientation, as well
as for fast, flexible calculation and transformation of raw data based on formulaic relationships.
It is advantageous with respect to '''Analysis Processing'''  as it allows users to easily
formulate complex queries, and filter or slice data into meaningful subsets, among other things.
  
  === Rationale ===
+ 
  It will probably take a while for Hadoop + HBase to provide reliable real-time service like
other DBMSs.  [[BR]]Also, the multi-dimensional model is commonly accepted for OLAP.
  ||<bgcolor="#E5E5E5">'''System Characteristic''' ||<bgcolor="#E5E5E5">'''RDBMS'''
||<bgcolor="#E5E5E5">'''Multi-Dimensional Model Hbase''' ||
  ||Data Retrieval Performance ||Slow ||Fast ||
@@ -73, +75 @@

  I don't expect it to give us high performance just yet, but it will surely make data management
and development much easier. First, let's take a look at HBase's data model. HBase provides
a unified data model that represents data in three dimensions: Row, Column, and Timestamp.
Also, Row and Column may be extended infinitely.
  
  If we cut the data model at a single time version, we may view the result as a 2D
table. If the index is a string, we may view it as a huge map. If the index is an integer, then it
is one huge 2D array. So each table may hold such data stores in 3D (Columnfamilies). A Locality
Group (Columnfamilies) is a relationship that can occur between multiple references whenever
one reference brings in much of the data used by the other references.
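
A minimal sketch of the "cut in time version" idea above, in plain Python rather than any HBase API. The cell layout, the `slice_at` helper, and the sample rows and `info:` column names are all hypothetical, chosen only to illustrate how a 3D (Row, Column, Timestamp) map collapses to a 2D view:

```python
from typing import Dict, Tuple

# Hypothetical in-memory stand-in for a table: one value per (row, column, timestamp).
Cell = Tuple[str, str, int]


def slice_at(cells: Dict[Cell, str], as_of: int) -> Dict[str, Dict[str, str]]:
    """Collapse the time dimension.

    For every (row, column) pair, keep the newest value whose timestamp
    is <= as_of. The result is a 2D map: row -> column -> value.
    """
    newest: Dict[Tuple[str, str], int] = {}
    view: Dict[str, Dict[str, str]] = {}
    for (row, column, ts), value in cells.items():
        if ts > as_of:
            continue  # written after the requested version
        key = (row, column)
        if key not in newest or ts > newest[key]:
            newest[key] = ts
            view.setdefault(row, {})[column] = value
    return view


# Illustrative data only; not taken from the wiki page.
cells = {
    ("movie1", "info:year", 1): "1997",
    ("movie1", "info:length", 1): "120",
    ("movie1", "info:length", 2): "124",
    ("movie2", "info:year", 2): "2001",
}

view_v1 = slice_at(cells, as_of=1)
view_v2 = slice_at(cells, as_of=2)
```

With string row keys the 2D view behaves like a large map, as the paragraph describes. With integer keys the same structure could be read as a sparse 2D array.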
- 
- ''~-I think people may also start to ask as your operators evolve: 'What is the difference
between HBase Shell and Yahoo! PIG?' -- St.Ack-~''
  
  ----
  
@@ -95, +95 @@

  ||<bgcolor="#E5E5E5">'''Operator''' ||<bgcolor="#E5E5E5">'''Explanation''' ||
  ||Projection ||<99%>'''Projection''' of a relation ~+R+~ makes a new relation
as the set that is obtained when all tuples (rows) in ~+R+~ are restricted to the set {columnfamily,,1,,,...,columnfamily,,n,,}.[[BR]][[BR]]~-''A
= Table('movieLog_table');[[BR]]B = A.Projection('year','length');''-~ ||
  ||Selection ||<99%>'''Selection''' of a relation ~+R+~ makes a new relation as
the set of specified tuples (rows) of the relation ~+R+~.[[BR]]'''Set Operations''' : ~-''OR,
AND, NOT''-~[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = A.Selection(length >
100 AND studioName = 'Fox');''-~ ||
+ ||JOINs ||<99%>Table '''JOIN''' operations, linking and extracting data from two different
internal sources[[BR]]'''Operations''' : ~-''naturalJoin(), thetaJoin(), cartesianProduct()
''-~ [[BR]][[BR]]~-''R = Table('movieLog_table');[[BR]]S = Table('movieStar_table');[[BR]]C
= R.naturalJoin(S); //C = R▷◁S''-~ ||
  ||Group ||<99%>'''Group''' tuples by value of an attribute and apply an aggregate function
independently to each group of tuples.[[BR]]'''Aggregate Functions''' : ~-''AVG( attribute
), SUM( attribute ), COUNT( attribute ), MIN( attribute ), MAX( attribute )''-~[[BR]][[BR]]~-''A
= Table('movieLog_table');[[BR]]B = A.Group('studioName', MIN('year'));''-~ ||
  ||Sort ||<99%>'''Sort''' of tuples(rows) of R, ordered according to columnfamilies
on columnfamily-list[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = Sort A by ('length');''-~
||
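
The semantics of the operators in the table above can be sketched over plain Python lists of dicts. This is not the planned shell syntax or any HBase API. The function names, the row layout, and the sample `movie_log` / `movie_star` data are assumptions made only for illustration:

```python
from collections import defaultdict
from typing import Callable, Dict, List

Row = Dict[str, object]


def projection(rows: List[Row], *columns: str) -> List[Row]:
    """Restrict every row to the named columns."""
    return [{c: r[c] for c in columns} for r in rows]


def selection(rows: List[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    """Keep only the rows for which the predicate holds."""
    return [r for r in rows if predicate(r)]


def natural_join(left: List[Row], right: List[Row]) -> List[Row]:
    """Pair rows that agree on every column name they share."""
    joined = []
    for l in left:
        for r in right:
            shared = l.keys() & r.keys()
            if all(l[k] == r[k] for k in shared):
                joined.append({**l, **r})
    return joined


def group(rows: List[Row], key: str, aggregate: Callable, column: str) -> Dict:
    """Group rows by `key` and apply `aggregate` to each group's `column` values."""
    buckets = defaultdict(list)
    for r in rows:
        buckets[r[key]].append(r[column])
    return {k: aggregate(values) for k, values in buckets.items()}


def sort(rows: List[Row], column: str) -> List[Row]:
    """Order rows by the given column, ascending."""
    return sorted(rows, key=lambda r: r[column])


# Illustrative data only; the real tables are not shown in the source.
movie_log = [
    {"title": "A", "year": 1990, "length": 95, "studioName": "Fox"},
    {"title": "B", "year": 1995, "length": 130, "studioName": "Fox"},
    {"title": "C", "year": 1992, "length": 110, "studioName": "Disney"},
]
movie_star = [{"title": "B", "starName": "X"}]

long_fox = selection(movie_log, lambda r: r["length"] > 100 and r["studioName"] == "Fox")
earliest_by_studio = group(movie_log, "studioName", min, "year")
```

Here `long_fox` mirrors the Selection example and `earliest_by_studio` mirrors the Group example with MIN('year'). A distributed version would need to partition this work across machines, which the sketch does not attempt.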
  
@@ -142, +143 @@

  St.Ack
  }}}
  
+ 
+ 
  ----
  = Example Of Hbase Shell Use =
  
