hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Update of "Hbase/ShellPlans" by udanax
Date Fri, 03 Aug 2007 07:42:31 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by udanax:
http://wiki.apache.org/lucene-hadoop/Hbase/ShellPlans

------------------------------------------------------------------------------
  [[TableOfContents(4)]]
  
  ----
- = Introduction =
- Hbase Shell is an 'interpreter' (or 'shell)' to provide scalable data processing capabilities like 
- aggregation, algebraic calculation on Hadoop + Hbase. 
  
+ I suggest developing the HBase Shell in SQL style, and developing the algebraic tools as a sub-shell, as described below. 
+ 
+ {{{
+ HBase > altools;
+ 
+ Hbase altools, 0.0.1 version
+ Type 'help;' for Hbase altools usage.
+ 
+ Hbase.altools > who are you;
+ 
+  Hadoop + Hbase based algebraic manipulation tools
+ 
+ Hbase.altools > exit;
+ HBase > exit;
+ }}}
+ 
+ 
+ 
+ 
+ = Altools Introduction =
+ 
+ Hbase altools is a sub-interpreter (or sub-shell) of the Hbase Shell that provides scalable data processing capabilities such as 
+ aggregation and algebraic calculation (groups and sets, commutative rings, algebraic geometry, and linear algebra) on Hadoop + Hbase based parallel machines.
+ 
- = Hbase Shell Goals =
+ = Hbase altools Goals =
   * Simplified import/export/migrate functionality between different data sources (Hadoop, HBase)
   * Simplified processing of a logical data model
   * Simplified algebraic operations
   * Simplified parallel numerical analysis by abstracting/numericalizing points, lines, or plane data across multiple maps in HBase.
  
- == HBase Shell Background ==
+ == HBase altools Background ==
  
  I expect Hadoop + Hbase to handle sparsity and data explosion very well in the near future. Moreover, I believe the design of the multi-dimensional map structure and the 3D space model of the data is optimized for rapid ad-hoc information retrieval in any orientation, as well as for fast, flexible calculation and transformation of raw data based on formulaic relationships. It is advantageous with respect to '''Analysis Processing''' 
  as it allows users to easily formulate complex queries, and filter or slice data into meaningful subsets, among other things.
@@ -52, +73 @@

  If we cut the data model at a given time version, then we may view the resulting data as a 2D table.
  If the index is a string, we may view it as a huge map. If the index is an integer, then it is one huge 2D array.
  
- So each table may have such data storages in 3D (ColumnFamilies)
+ So each table may have such data storages in 3D (Columnfamilies)
  Locality Group(Columnfamilies) is a relationship that can occur between multiple references
  whenever one reference brings in much of the data used by the other references.
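
The time-version cut described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table is modeled as an in-memory dict keyed by (row, column, timestamp), the helper name `slice_at` and the sample cells are hypothetical, and "cutting at a time version" is assumed to mean keeping the newest value at or before the chosen timestamp. None of this is an actual Hbase API.

```python
from typing import Any, Dict, Tuple

# Hypothetical stand-in for a table: (row, column, timestamp) -> value.
Cell = Tuple[str, str, int]


def slice_at(cells: Dict[Cell, Any], ts: int) -> Dict[str, Dict[str, Any]]:
    """Return a 2D view {row: {column: value}} of the 3D data.

    Assumption: for each (row, column) we keep the newest version whose
    timestamp is less than or equal to ts.
    """
    newest: Dict[Tuple[str, str], Tuple[int, Any]] = {}
    for (row, col, t), value in cells.items():
        if t > ts:
            continue  # version is newer than the requested cut
        key = (row, col)
        if key not in newest or t > newest[key][0]:
            newest[key] = (t, value)

    table: Dict[str, Dict[str, Any]] = {}
    for (row, col), (_, value) in newest.items():
        table.setdefault(row, {})[col] = value
    return table


# Made-up sample cells, for illustration only.
cells = {
    ("movie1", "info:year", 1): 1990,
    ("movie1", "info:year", 2): 1991,
    ("movie1", "info:length", 1): 120,
    ("movie2", "info:year", 2): 2001,
}

view_v1 = slice_at(cells, 1)
view_v2 = slice_at(cells, 2)
```

Because the row index here is a string, the 2D view is a map of maps. With integer row and column indices the same view could be stored as a plain 2D array.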
  
  ----
+ 
- = Suggested Future Hbase Shell Operators =
+ = Suggested Hbase altools Operators =
  '''Note''' that Data should be located by their row, column, and timestamp.
  
  == Commands ==
@@ -69, +91 @@

  
  ||<bgcolor="#ececec">'''Operator''' ||<bgcolor="#ececec">'''Explanation''' ||
  ||Projection ||<99%>'''Projection''' of a relation ~+R+~. It makes a new relation as the set that is obtained when all tuples(rows) in ~+R+~ are restricted to the set {columnfamily,,1,,,...,columnfamily,,n,,}.[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = A.Projection('year','length');''-~||
- ||Selection ||<99%>'''Selection''' of a relation ~+R+~, It makes a new relation as the set of specified tuples(rows) of the relation ~+R+~[[BR]]'''Set Operations''' : ~-''OR, AND, NOT''-~[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = A.Selection(length > 100);[[BR]]C = A.Selection(length > 100 AND studioName = 'Fox');''-~||
+ ||Selection ||<99%>'''Selection''' of a relation ~+R+~, It makes a new relation as the set of specified tuples(rows) of the relation ~+R+~[[BR]]'''Set Operations''' : ~-''OR, AND, NOT''-~[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = A.Selection(length > 100 AND studioName = 'Fox');''-~||
  ||Group ||<99%>'''Group''' tuples by the value of an attribute and apply an aggregate function independently to each group of tuples.[[BR]]'''Aggregate Functions''' : ~-''AVG( attribute ), SUM( attribute ), COUNT( attribute ), MIN( attribute ), MAX( attribute )''-~[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = A.Group('studioName', MIN('year'));''-~||
  ||Sort ||<99%>'''Sort''' of tuples(rows) of ~+R+~, ordered according to the columnfamilies in columnfamily-list[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = Sort by ('length');''-~||
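
To make the semantics of the four operators in the table concrete, here is a minimal Python sketch that applies them to a small in-memory list of rows. It is not the proposed shell syntax or any Hbase API: the function names simply mirror the operator names, the rows of `movie_log` are made-up sample data, and a relation is assumed to be a list of dicts keyed by column name.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def projection(rows: List[Row], *columns: str) -> List[Row]:
    """Restrict every row to the named columns."""
    return [{c: r[c] for c in columns if c in r} for r in rows]


def selection(rows: List[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    """Keep only the rows for which the predicate holds."""
    return [r for r in rows if predicate(r)]


def group(rows: List[Row], key: str,
          aggregate: Callable[[List[Any]], Any], attribute: str) -> Dict[Any, Any]:
    """Group rows by `key` and aggregate `attribute` within each group."""
    buckets: Dict[Any, List[Any]] = defaultdict(list)
    for r in rows:
        buckets[r[key]].append(r[attribute])
    return {k: aggregate(values) for k, values in buckets.items()}


def sort(rows: List[Row], column: str) -> List[Row]:
    """Return the rows ordered by `column` (ascending)."""
    return sorted(rows, key=lambda r: r[column])


# Made-up sample rows standing in for 'movieLog_table'.
movie_log = [
    {"title": "A", "year": 1995, "length": 90, "studioName": "Fox"},
    {"title": "B", "year": 1990, "length": 130, "studioName": "Fox"},
    {"title": "C", "year": 2001, "length": 110, "studioName": "Disney"},
]

projected = projection(movie_log, "year", "length")
selected = selection(
    movie_log, lambda r: r["length"] > 100 and r["studioName"] == "Fox"
)
grouped = group(movie_log, "studioName", min, "year")
ordered = sort(movie_log, "length")
```

Each call corresponds to one example in the table: `projected` keeps only 'year' and 'length', `selected` applies the AND condition on length and studio, `grouped` computes MIN('year') per studio, and `ordered` sorts by 'length'. In a Hadoop + Hbase setting these operations would presumably run as distributed jobs over the table rather than over a local list.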
  
