hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Trivial Update of "Hbase/ShellPlans" by udanax
Date Mon, 13 Aug 2007 10:07:23 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by udanax:
http://wiki.apache.org/lucene-hadoop/Hbase/ShellPlans

------------------------------------------------------------------------------
  [[TableOfContents(5)]]
- 
  ----
  
  = Hbase Shell Plan Draft =
@@ -9, +8 @@

  
  This project is currently in the planning stage.  [https://issues.apache.org/jira/browse/HADOOP-1608
HADOOP-1608] to add "Relational Algebra Operators" is currently in progress.
  
- == People Involved ==
-   * [:udanax:Edward Yoon], (NHN corp)
-   * [:InchulSong: Inchul Song], (Division of Computer Science, KAIST)
- 
  == Suggested Hbase Query Language plans ==
  
-  ''-- I've made some changes to your initial [wiki:Hbase/HbaseShell/HQL HQL] to make it
look more like SQL. I borrowed the syntax definition style from MySQL.[[BR]]by [:InchulSong:
Inchul Song]''
+  * ''I've made some changes to your initial [wiki:Hbase/HbaseShell/HQL HQL] to make it look
more like SQL. I borrowed the syntax definition style from MySQL. by [:InchulSong: Inchul
Song]''
  
  ----
  
@@ -36, +31 @@

  Hbase > exit;
  }}}
  
- Hbase altools is an Hbase Shell sub-'interpreter' (or 'shell') program that provides scalable
data processing capabilities such as aggregation and algebraic calculation (groups and sets, commutative
rings, algebraic geometry, and linear algebra) on Hadoop + Hbase based parallel machines.
In particular, it will focus on storing and manipulating '''sparse matrices''' on Hbase.
+ Hbase altools is an Hbase Shell sub-'interpreter' (or 'shell') program that provides scalable
data processing capabilities such as aggregation and algebraic calculation (groups and sets, commutative
rings, algebraic geometry, and linear algebra) on Hadoop + Hbase based parallel machines.
In particular, it will focus on storing and manipulating sparse matrices on Hbase.
  
-  ''-- Altools Matrix operations will show how Google search's LSI, Google Earth's algebraic
topology, and Google News' recommendation system are related to Bigtable. See the HBase Shell
Usage Page. --[:Hbase/HbaseShell/Examples]''
+  ''-- Altools Matrix operations will show how Google search's LSI, Google Earth's algebraic
topology, and Google News' recommendation system are related to Bigtable.''
  
+ === Background ===
- 
- === Hbase altools Goals ===
-  * Simplified import/export/migrate functionality between different data sources (Hadoop,
HBase)
-  * Simplified processing of a logical data model
-  * Simplified algebraic operations
-  * Simplified parallel numerical analysis by abstracting/numericalizing points, lines,
or plane data across multiple maps in HBase.
- 
-  ''~-Does the import/export above include being able to write HQL/altool scripts and feed
them to the interpreter on stdin, or pass the interpreter a script file? It would be
sweet too if the interpreter could be invoked with a flag which stated how results were to
be output.  ASCII tables could be the default as it is now, but users will likely want output
without formatting or output formatted as XML, etc.  Something to think about.  Also, Edward,
I'd suggest that you would be doing yourself a service if you added citations for concepts
like 'Parallel Numerical Analysis'.  It will help folks like myself who do not know what this
means.  Thanks. -- St.Ack -~''
- 
- === HBase altools Background ===
- I expect Hadoop + Hbase to handle sparsity and data explosion very well in the near future.
Moreover, I believe the design of the multi-dimensional map structure and the 3D space model
of the data are optimized for rapid ad-hoc information retrieval in any orientation, as well
as for fast, flexible calculation and transformation of raw data based on formulaic relationships.
It is advantageous with respect to '''Analysis Processing'''  as it allows users to easily
formulate complex queries, and filter or slice data into meaningful subsets, among other things.
+ I expect Hadoop + Hbase to handle sparsity and data explosion very well in the near future.
Moreover, I believe the design of the multi-dimensional map structure and the 3D space model
of the data are optimized for rapid ad-hoc information retrieval in any orientation, as well
as for fast, flexible calculation and transformation of raw data based on formulaic relationships.
It is advantageous with respect to Analysis Processing as it allows users to easily formulate
complex queries, and filter or slice data into meaningful subsets, among other things.
- 
- === Rationale ===
- 
- It will probably take a while for Hadoop + HBase to provide reliable real-time service like
other DBMS.  [[BR]]Also, Multi Dimensional Model is commonly accepted for OLAP.
- ||<bgcolor="#E5E5E5">'''System Characteristic''' ||<bgcolor="#E5E5E5">'''RDBMS'''
||<bgcolor="#E5E5E5">'''Multi-Dimensional Model Hbase''' ||
- ||Data Retrieval Performance ||Slow ||Fast ||
- ||Calculation Functionality || Limited, in all but one dimension ||Can be very high, all
dimensions ||
- ||Openness to live data access by other applications ||Excellent ||Limited ||
- ||Priorities ||High performance, High availability ||High flexibility, High user autonomy
||
- 
- Thus, I decided to develop a shell for linear algebra computation and large-scale
data processing using Hadoop's parallel processing and HBase storage.
- 
- ''Then you may ask "What is the difference from MapReduce using MapFiles?"''
- 
- I don't expect it to give us high performance just yet, but it will surely make data management
and development much easier. First, let's take a look at HBase's data model. HBase provides
a unified data model that represents data in three dimensions: Row, Column, and Timestamp.
Also, Row and Column may be extended infinitely.
- 
- If we cut the data model at one time version, then we may view the resulting data as a 2D
table. If the index is a string, we may view it as a huge map. If the index is an integer, then it
is one huge 2D array. So each table may hold such data in 3D (Columnfamilies). Locality
Group (Columnfamilies) is a relationship that can occur between multiple references whenever
one reference brings in much of the data used by the other references.
  
  ----
  
@@ -91, +60 @@

  ||Sort ||<99%>'''Sort''' of tuples(rows) of R, ordered according to columnfamilies
on columnfamily-list.[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = Sort A by ('length');
'''//τ,,length,,(A)''' ''-~ ||
  
  '''(ex. 1)''' Find the title and the year of the movies which were produced by the 'Fox'
company and whose running time is more than 100 minutes.
- [[BR]]~-''π ,,title.year,, (σ ,,length > 100,, (movieLog_table)∩σ ,,studioName =
'Fox',, (movieLog_table))''-~
+ [[BR]]~-''π ,,title.year,, (σ ,,length > 100,, (movieLog_table) ∩ σ ,,studioName
= 'Fox',, (movieLog_table))''-~
  
  {{{
  Hbase.altools > A = Table('movieLog_table'); 
@@ -108, +77 @@

  Hbase.altools > A = Table('movieStars_table'); 
  Hbase.altools > B = Table('movieLog_table');
  Hbase.altools > C = A.thetaJoin(B);
+ 
+ Hbase.altools > store C to table('result_table'); 
  }}}
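
As a rough illustration of the relational operators used in these examples (selection σ, projection π, intersection ∩ and theta join), here is a minimal Python sketch that runs on plain in-memory rows. Everything in it is an assumption made for illustration: the sample rows, the column names other than `title`, `year`, `length` and `studioName`, the helper names `select`, `project`, `intersect` and `theta_join`, and the join predicate (the plan's `A.thetaJoin(B)` call does not state one). It is not the altools implementation and does not use any HBase API.

```python
# Hypothetical in-memory stand-ins for the two HBase tables: each row is a dict.
movie_log = [
    {"title": "Movie A", "year": 1990, "length": 120, "studioName": "Fox"},
    {"title": "Movie B", "year": 1995, "length": 90, "studioName": "Fox"},
    {"title": "Movie C", "year": 2001, "length": 130, "studioName": "Other"},
]
movie_stars = [
    {"movieTitle": "Movie A", "starName": "Star X"},
    {"movieTitle": "Movie C", "starName": "Star Y"},
]


def select(rows, predicate):
    """Selection (sigma): keep the rows for which predicate(row) is true."""
    return [row for row in rows if predicate(row)]


def project(rows, columns):
    """Projection (pi): keep only the named columns and drop duplicate rows."""
    seen, result = set(), []
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(columns, key)))
    return result


def intersect(left, right):
    """Intersection: rows of `left` that also appear in `right`."""
    right_keys = {tuple(sorted(r.items())) for r in right}
    return [row for row in left if tuple(sorted(row.items())) in right_keys]


def theta_join(left, right, theta):
    """Theta join: merge every pair of rows (l, r) that satisfies theta(l, r)."""
    return [{**l, **r} for l in left for r in right if theta(l, r)]


# (ex. 1): title and year of Fox movies longer than 100 minutes.
long_movies = select(movie_log, lambda r: r["length"] > 100)
fox_movies = select(movie_log, lambda r: r["studioName"] == "Fox")
ex1 = project(intersect(long_movies, fox_movies), ["title", "year"])

# Theta join of the two tables; the equality predicate is an assumed example.
joined = theta_join(movie_stars, movie_log,
                    lambda s, m: s["movieTitle"] == m["title"])
```

With the sample rows above, `ex1` contains only the 1990 Fox movie, and `joined` pairs each star with the matching movie row.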
  
  ==== Matrix Arithmetic Operators ====
@@ -117, +88 @@

  ||Multiplication ||<99%>'''Multiplication''' of two matrices, Product C of two matrices
A and B.[[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B = Matrix('m_table','cf_2');[[BR]]C
= A * B; '''//C = A · B''' ''-~ ||
  ||Division ||<99%>'''Division''' is solving the matrix equation AX = B for X.[[BR]][[BR]]~-''A
= Matrix('m_table','cf_1');[[BR]]B = Matrix('m_table','cf_2');[[BR]]C = A /[or \] B; '''//
C = A / B''' ''-~||
  ||Transpose ||<99%>'''Transpose''' of a Matrix, A matrix which is formed by turning
all the rows of a given matrix into columns and vice-versa.[[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B
= Transpose(A); '''// B = A'''' ''-~||
+ 
+ '''(ex. 1)''' The product C of two matrices A and B
+ [[BR]]~-''C,,ij,, = Σ,,k,, A,,ik,, B,,kj,, (1 ≤ i ≤ m, 1 ≤ j ≤ n)''-~
+ 
+ {{{
+ Hbase.altools > A = Matrix('m_table','cf_1');
+ Hbase.altools > B = Matrix('m_table','cf_2');
+ Hbase.altools > C = A * B;  
+ }}}
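
To make the product formula concrete for sparse data, the following Python sketch stores each matrix as a nested map (row index to column index to value), which loosely mirrors a row/column map with absent cells left out. The `multiply` and `transpose` function names and the sample matrices are assumptions for illustration only; this is a single-machine sketch, not how altools would distribute the work over Hadoop.

```python
from collections import defaultdict


def multiply(a, b):
    """Return C = A * B, where C[i][j] is the sum over k of A[i][k] * B[k][j].

    Matrices are dicts of dicts; cells that are absent are treated as zero,
    so only the stored (non-zero) entries are ever visited.
    """
    c = defaultdict(dict)
    for i, a_row in a.items():
        for k, a_ik in a_row.items():
            # Only rows of B that line up with a stored A[i][k] contribute.
            for j, b_kj in b.get(k, {}).items():
                c[i][j] = c[i].get(j, 0) + a_ik * b_kj
    return dict(c)


def transpose(a):
    """Return the transpose: the entry at (i, j) moves to (j, i)."""
    t = defaultdict(dict)
    for i, row in a.items():
        for j, value in row.items():
            t[j][i] = value
    return dict(t)


# Hypothetical 2x3 and 3x2 sparse matrices.
A = {0: {0: 1, 2: 2}, 1: {1: 3}}
B = {0: {0: 4}, 1: {1: 5}, 2: {0: 6}}

C = multiply(A, B)   # {0: {0: 16}, 1: {1: 15}}
At = transpose(A)    # {0: {0: 1}, 2: {0: 2}, 1: {1: 3}}
```

Because the loops run only over stored entries, the cost depends on the number of non-zero cells rather than on the full m × n dimensions.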
  
  ==== Factorizations and Decompositions ====
  
@@ -157, +137 @@

  ----
  = Example Of Hbase Shell Use =
  
- See the HBase Shell Usage Page. [:Hbase/HbaseShell/Examples]
+ See the HBase Shell Full Usage Page. 
+  * [:Hbase/HbaseShell/Examples]
  
