hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Trivial Update of "HbaseShell" by udanax
Date Wed, 27 Jun 2007 09:55:47 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by udanax:
http://wiki.apache.org/lucene-hadoop/HbaseShell

------------------------------------------------------------------------------
- '''work in progress'''
+ '''research/work in progress''' - https://issues.apache.org/jira/browse/HADOOP-1375
  
  [[TableOfContents(4)]]
+ 
  ----
  = Hbase Shell Introduction =
- 
- Hbase Shell is an 'interpreter' (or 'shell)' to provide scalable data processing capabilities
like  
+ Hbase Shell is an 'interpreter' (or 'shell') that provides scalable data processing capabilities
such as [[BR]]aggregation and algebraic calculation on Hadoop + Hbase.
- [[BR]]aggregation, algebraic calculation on Hadoop + Hbase.
  
  == Hbase Shell Goals ==
- 
  HBase Shell is developed to achieve the following goals.
  
-  * Generic Query Model Functions
   * Simplified import/export/migrate functionality between different data sources (Hadoop,
HBase)
   * Simplified processing of a logical data model
   * Simplified algebraic operations
-  * Parallel Numerical Analysis by abstracting/numericalizing points, lines, or plane data
across multiple maps in HBase.
+  * Simplified parallel numerical analysis by abstracting/numericalizing points, lines,
[[BR]]or plane data across multiple maps in HBase.
  
  == Background ==
+ 
  I expect Hadoop + Hbase to handle sparsity and data explosion very well in the near future.
[[BR]]Moreover, I believe the design of the multi-dimensional structure and the 3-dim space
model of the data are [[BR]]optimized for rapid ad-hoc information retrieval in any orientation,
as well as for fast, flexible calculation and transformation of [[BR]]raw data based on formulaic
relationships.
  
- Then, I thought it would require a more user-friendly interface to enable querying the data
interactive. 
+ I therefore thought it would need a more user-friendly interface for querying the data
interactively.
  
  == Rationale ==
- ...
+ 
+ It will probably take a while for Hadoop + HBase to provide a reliable real-time service like
other DBMSs. 
+ [[BR]]Thus, I decided to develop a shell to process linear algebraic computations 
+ [[BR]]and large-scale data using Hadoop's parallel processing and HBase storage. 
+ 
+ ''Then you may ask, "How is this different from MapReduce using MapFiles?"''
+ 
+ I don't expect it to give us high performance just yet, 
+ [[BR]]but it will surely make data management and development much easier. 
+ [[BR]]First, let's take a look at HBase's data model. 
+ 
+ HBase provides a unified data model that represents data in three dimensions: 
+ [[BR]]Row, Column, and Timestamp. Rows and Columns may also be extended without limit. 
+   
+ If we slice the data model at a single time version, we may view the result as a 2D
table. 
+ [[BR]]If the index is a string, we may view it as a huge map. If the index is an integer, then
it is one huge 2D array. 
+ [[BR]]So each table may hold such 3D data stores (ColumnFamilies).
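
The slicing idea above can be sketched in plain Python. This is only an illustration under stated assumptions, not how HBase stores data: the nested-dict layout, the `slice_at` helper, the integer timestamps, and the earlier vote value are all invented for the example, and "newest value at or before the given timestamp" is an assumed reading of "cut in time version".

```python
# Toy model (hypothetical layout): table[row][column][timestamp] -> value.
# Timestamps and the older vote value are invented for illustration.
table = {
    "Star Wars": {
        "year:": {1: "1977"},
        "vote:user_1": {1: "4", 2: "5"},
    },
    "Mighty Ducks": {
        "year:": {1: "1991"},
        "vote:user_1": {2: "2"},
    },
}


def slice_at(table, timestamp):
    """Return a 2D view (row -> column -> value) of the 3D table.

    For every cell, keep the newest version written at or before
    `timestamp`; cells with no such version are left out.
    """
    view = {}
    for row, columns in table.items():
        for column, versions in columns.items():
            visible = [t for t in versions if t <= timestamp]
            if visible:
                view.setdefault(row, {})[column] = versions[max(visible)]
    return view


view_t1 = slice_at(table, 1)
view_t2 = slice_at(table, 2)
print(view_t1)
print(view_t2)
```

With string row and column keys the resulting view behaves like a large map; with integer keys the same structure could be read as a 2D array.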
+ 
  
  ----
  = Hbase Shell Syntax Definition =
- 
  '''Note''' that Data should be located by their row, column, and timestamp.
  
  == Basic Commands ==
- 
- ||<#ececec> '''Command''' ||<#ececec> '''Explanation''' ||
+ ||<bgcolor="#ececec">'''Command''' ||<bgcolor="#ececec">'''Explanation''' ||
  ||HELP ||<99%>'''Help''' command provides information about the use of shell script.[[BR]][[BR]]~-''HELP
[function_name];''-~ ||
  ||SHOW ||<99%>'''Show''' command will list the tables.[[BR]][[BR]]~-''SHOW tables;''-~
||
  ||DESC ||'''Desc''' command provides information about the columnfamilies in a table.[[BR]][[BR]]~-''DESC
table_name;''-~ ||
  ||CREATE ||'''Create''' command will create a new table.[[BR]][[BR]]~-''CREATE table_name[[BR]]COLUMNFAMILIES('columnfamily_name1'[,
'columnfamily_name2', ...])[[BR]]LIMIT=limitNumber_of_Version;''-~ ||
  ||DROP ||'''Drop''' command drops columnfamilies in a table, or whole tables.[[BR]][[BR]]~-''DROP
table_name1[, table_name2, ...] or columnfamily_name1[, columnfamily_name2, ...];''-~ ||
+ ||SUBSTITUTE[[BR]] || '''Substitute''' assigns a query result to a variable named [A~Z].[[BR]][[BR]]~-''X = SELECT table_name;''-~||
- ||PRINT ||'''Print''' command will print a results to the console output. [[BR]][[BR]]~-''A
= array([1, 2, 3]);[[BR]]PRINT A;[[BR]]B = SELECT table_name WHERE row="row_key";[[BR]]PRINT
B;''-~||
+ ||PRINT ||'''Print''' command prints results to the console. [[BR]][[BR]]~-''A
= array([1, 2, 3]);[[BR]]PRINT A;[[BR]]B = SELECT table_name WHERE row='row_key';[[BR]]PRINT
B;''-~ ||
- ||STORE ||'''STORE''' command will store results to specified table. [[BR]][[BR]]~-''M =
matrix('table_name','columnfamily_name');[[BR]]A = array([[1, 2],[3, 4]]);  //In this case,
Key should be an integer index. [[BR]]STORE A TO M run_style;[[BR]]B = SELECT table_name WHERE
row="row_key";[[BR]]STORE B TO ('table_name','columnfamily_name1'[, 'columnfamily_name2'])
run_style;''-~||
+ ||STORE ||'''STORE''' command stores results to the specified table. [[BR]][[BR]]~-''M =
matrix('table_name','columnfamily_name');[[BR]]A = array([[1, 2],[3, 4]]); //In this case,
the key should be an integer index. [[BR]]STORE A TO M run_style;[[BR]]B = SELECT table_name WHERE
row='row_key';[[BR]]STORE B TO ('table_name','columnfamily_name1'[, 'columnfamily_name2'])
run_style;''-~ ||
  ||EXIT ||<99%>'''Exit''' from the current shell script.[[BR]][[BR]]~-''EXIT;''-~ ||
- 
  In addition, the following commands manipulate data manually at a finer level of detail.
- 
- ||<#ececec> '''Command''' ||<#ececec> '''Explanation''' ||
+ ||<bgcolor="#ececec">'''Command''' ||<bgcolor="#ececec">'''Explanation''' ||
- ||INSERT ||<99%>'''Insert''' command will insert one row into the table with a value
for specified column in the table.[[BR]][[BR]]~-''INSERT table_name[[BR]] VALUES('columnfamily_name:column_key','entry')[[BR]]WHERE
row="row_key";''-~ ||
+ ||INSERT ||<99%>'''Insert''' command inserts one row into the table with a value
for the specified column.[[BR]][[BR]]~-''INSERT table_name[[BR]] VALUES('columnfamily_name:column_key','entry')[[BR]]WHERE
row='row_key';''-~ ||
- ||SET ||'''SET''' command will change the values. [[BR]][[BR]]~-''SET table_name[[BR]] VALUES('columnfamily_name:column_key','entry')[[BR]]WHERE
row="row_key" AND time="Specified_Timestamp";''-~||
+ ||SET ||'''SET''' command will change the values. [[BR]][[BR]]~-''SET table_name[[BR]] VALUES('columnfamily_name:column_key','entry')[[BR]]WHERE
row='row_key' AND time='Specified_Timestamp';''-~ ||
- ||DELETE ||'''Delete''' command will delete specified rows in table. [[BR]][[BR]]~-''DELETE
table_name[[BR]]WHERE row="row_key"[[BR]][AND column="columnfamily_name:column_key"];''-~||
+ ||DELETE ||'''Delete''' command deletes the specified rows in a table. [[BR]][[BR]]~-''DELETE
table_name[[BR]]WHERE row='row_key'[[BR]][AND column='columnfamily_name:column_key'];''-~
||
  
  === Relational Algebra Operators ===
- 
- ||<#ececec> '''Command''' ||<#ececec> '''Explanation''' ||
+ ||<bgcolor="#ececec">'''Command''' ||<bgcolor="#ececec">'''Explanation''' ||
- ||SELECT ||<99%>'''Select''' command will retrieves rows from a table.[[BR]][[BR]]~-''SELECT
table_name[[BR]][WHERE row="row_key"][[BR]][AND column="columnfamily_name:column_key"];[[BR]][AND
time="Specified_Timestamp"];[[BR]][LIMIT=Number_of_Version];''-~ ||
+ ||SELECT ||<99%>'''Select''' command retrieves rows from a table.[[BR]][[BR]]~-''SELECT
table_name[[BR]][WHERE row='row_key'][[BR]][AND column='columnfamily_name:column_key'][[BR]][AND
time='Specified_Timestamp'][[BR]][LIMIT=Number_of_Version];''-~ ||
  
  
+ 
+ 
+ 
+ === Aggregation Functions ===
+ Generic one dimensional counting??
+ ||<bgcolor="#ececec">'''Functions''' ||<bgcolor="#ececec">'''Explanation'''
||
+ ||SUM ||<99%>'''SUM''' command will retrieves rows from a table.[[BR]][[BR]]~-''SELECT
table_name[[BR]][WHERE row='row_key'][[BR]][AND column='columnfamily_name:column_key'];[[BR]][AND
time='Specified_Timestamp'];[[BR]][LIMIT=Number_of_Version];''-~ ||
+ 
+ 
+ ...
+ ||<bgcolor="#ececec">'''Function''' ||<bgcolor="#ececec">'''Explanation''' ||
+ ||... ||<99%>... ||
+ 
+ 
+ The Matrix commands are used to store a 2D array of numerical data values. [[BR]]A number
of routines are provided to manipulate the matrix object directly, illustrated below by simple
examples.
+ 
+ '''Note''' that vectors should be defined as two-dimensional matrices to distinguish between
row and column vectors [[BR]]in order to be able to perform matrix operations consistently.
+ 
+ === Matrix Construction Functions ===
+ ..
+ 
+ === Matrix Algebra Functions ===
+ ..
+ 
+ === Special functions ===
+ ..
+ 
+ ----
+ = Example Of Hbase Shell Use =
+ == Basic Usage ==
+ 
+ {{{
+ Hbase > CREATE movieLog_table 
+     --> COLUMNFAMILIES('year','length','inColor','studioName','vote','producer') 
+     --> limit=10;
+ 
+ }}}
  
  '''movieLog_table'''
  ||Row Key ||<-12>Column Families ||
  ||<rowbgcolor="#ececec">title   ||<-2> year ||<-2>length ||<-2>inColor
||<-2> studioName ||<-2> vote ||<-2> producer ||
- ||Star Wars ||year: || 1977 ||length: || 124 ||inColor: || true ||studioName: || Fox ||
vote:''user_1'' || 5 || producer: || Rick McCallum ||
+ ||Star Wars ||year: || 1977 ||length: || 124 ||inColor: || true ||studioName: || Fox ||
vote:''user_1'' || 5 || producer: || George Lucas ||
  || || || || || || || || || || vote:''user_2'' || 2 || || ||
- ||Mighty Ducks ||year: || 1991 ||length: || 104 ||inColor: || true ||studioName: || Disney
|| vote:''user_1'' || 2 || producer: || Doug Claybourne ||
+ ||Mighty Ducks ||year: || 1991 ||length: || 104 ||inColor: || true ||studioName: || Disney
|| vote:''user_1'' || 2 || producer: || Blair Peters ||
  || || || || || || || || || || vote:''user_3'' || 4 || || ||
- ||Wayne's World ||year: || 1992 ||length: || 95 ||inColor: || true ||studioName: || Paramount
|| vote:''user_2'' || 3 || producer: || Tom Keifer ||
+ ||Wayne's World ||year: || 1992 ||length: || 95 ||inColor: || true ||studioName: || Paramount
|| vote:''user_2'' || 3 || producer: || Penelope Spheeris ||
  || || || || || || || || || || vote:''user_3'' || 4 || || ||
  
+ 
+ == Relational Algebra Operations ==
+ 
  '''Projection'''
  
  [http://mirror.udanax.org/~udanax/rsync1/blog_udanax_org/udanax/280/o_ex2.gif]
  
+ {{{
+ Hbase > A = SELECT movieLog_table;
+     --> B = A.Projection('year','length');
+ 
+ Hbase > PRINT B;
+ }}}
+ 
+ ||<rowbgcolor="#ececec">title ||year ||length ||
+ ||Star Wars ||1977 ||124 ||
+ ||Mighty Ducks ||1991 ||104 ||
+ ||Wayne's World ||1992 ||95 ||
+ 
+ 
+ 
  '''Selection'''
  
  [http://mirror.udanax.org/~udanax/rsync1/blog_udanax_org/udanax/280/o_ex3.gif]
  
+ {{{
+ Hbase > A = SELECT movieLog_table 
+     --> WHERE column='studioName:Fox';
+ Hbase > B = A.Filter by "length" > 100;
+ 
+ Hbase > PRINT B;
+ }}}
+ 
+ ||<rowbgcolor="#ececec">title ||year ||length ||inColor ||studioName ||producer ||
+ ||Star Wars ||1977 ||124 ||true ||Fox ||12345 ||
+ ||Mighty Ducks ||1991 ||104 ||true ||Disney ||67890 ||
+ 
+ 
  '''Example'''
  
  [http://mirror.udanax.org/~udanax/rsync1/blog_udanax_org/udanax/280/o_ex4.gif]
  
+ {{{
+ Hbase > A = SELECT movieLog_table 
+     --> WHERE column='studioName:Fox';
+ Hbase > B = A.Filter by "length" > 100;
+ Hbase > C = B.Projection('year');
  
+ Hbase > PRINT C;
+ }}}
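
For readers unfamiliar with the two relational operators, the selection-then-projection chain above can be approximated in plain Python. This is a rough sketch, not the shell's implementation: the `select` and `project` helpers and the dict layout are hypothetical, and only the `length > 100` filter is modelled (the `WHERE column=...` clause is left out).

```python
# Rows of the example movieLog_table, keyed by row key (title).
movie_log = {
    "Star Wars": {"year": 1977, "length": 124, "inColor": True, "studioName": "Fox"},
    "Mighty Ducks": {"year": 1991, "length": 104, "inColor": True, "studioName": "Disney"},
    "Wayne's World": {"year": 1992, "length": 95, "inColor": True, "studioName": "Paramount"},
}


def select(rows, predicate):
    """Selection: keep only the rows for which predicate(row) is true."""
    return {key: row for key, row in rows.items() if predicate(row)}


def project(rows, *columns):
    """Projection: keep only the named columns of every row."""
    return {
        key: {col: row[col] for col in columns if col in row}
        for key, row in rows.items()
    }


b = select(movie_log, lambda row: row["length"] > 100)  # like A.Filter by "length" > 100
c = project(b, "year")                                  # like B.Projection('year')
print(c)
```

Because each operator takes a table and returns a table, the two calls compose in either order as long as the filter column is still present when the selection runs.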
+ 
+ == Matrix Operations ==
+ {{{
- A = matrix(movieLog_table, vote);
+ Hbase > A = matrix('movieLog_table', 'vote');
+ 
+ Hbase > PRINT A;
+ }}}
  
  ||<rowbgcolor="#ececec"> ||user_1 ||user_2 ||user_3 ||
  ||<bgcolor="#ececec">Star Wars || 5 || 2 || 0 ||
  ||<bgcolor="#ececec">Mighty Ducks || 2 || 0 || 4 ||
  ||<bgcolor="#ececec">Wayne's World || 0 || 3 || 4 ||
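
The table above can be reproduced with a small Python sketch that turns the sparse `vote` column family into a dense matrix. The `to_matrix` helper, the dict layout, and the choice to fill missing votes with 0 are assumptions made for illustration, chosen to match the printed table.

```python
# Sparse votes per movie, taken from the vote: column family of movieLog_table.
votes = {
    "Star Wars": {"user_1": 5, "user_2": 2},
    "Mighty Ducks": {"user_1": 2, "user_3": 4},
    "Wayne's World": {"user_2": 3, "user_3": 4},
}


def to_matrix(sparse_rows, fill=0):
    """Build a dense row-by-column matrix from sparse per-row cells.

    Column keys are collected from all rows and sorted; cells that are
    missing in a row are replaced by `fill`.
    """
    columns = sorted({col for cells in sparse_rows.values() for col in cells})
    row_keys = list(sparse_rows)
    matrix = [[sparse_rows[r].get(c, fill) for c in columns] for r in row_keys]
    return row_keys, columns, matrix


row_keys, columns, matrix = to_matrix(votes)
print(columns)
for key, values in zip(row_keys, matrix):
    print(key, values)
```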
  
- 
- writing..
- 
- 
- === Aggregation Functions ===
- 
- Generic one dimensional counting??
- 
- ||<#ececec> '''Functions''' ||<#ececec> '''Explanation''' ||
- ||SUM ||<99%>'''SUM''' command will retrieves rows from a table.[[BR]][[BR]]~-''SELECT
table_name[[BR]][WHERE row="row_key"][[BR]][AND column="columnfamily_name:column_key"];[[BR]][AND
time="Specified_Timestamp"];[[BR]][LIMIT=Number_of_Version];''-~ ||
- 
- ...
- 
- ||<#ececec> '''Function''' ||<#ececec> '''Explanation''' ||
- ||... ||<99%>... ||
- 
- The Matrix commands are used to store a 2D array of numerical data values.
- [[BR]]A number of routines are provided to manipulate the matrix object directly, illustrated
below by simple examples.
- 
- '''Note''' that vectors should be defined as two-dimensional matrices to distinguish between
row and column vectors 
- [[BR]]in order to be able to perform matrix operations consistently. 
- 
- === Matrix Construction Functions ===
- ..
- === Matrix Algebra Functions ===
- ..
- === Special functions ===
- ..
- 
- ----
- = Example Of Hbase Shell Use =
- ..
- == Basic Usage ==
- ..
- == Relation Algebra Operations ==
- ..
- == Matrix Operations ==
- ..
- 
  ----
  = Matrix Extension Example On Hbase Shell =
- ..
  == Latent Semantic Analysis By Singular Value Decomposition ==
- ..
+ '''Motivation'''
+ Lexical matching at the term level is inaccurate (claimed):
+ 
+   * Polysemy: words with a number of ‘meanings’. Term matching returns irrelevant documents, which impacts precision.
+   * Synonymy: a number of words with the same ‘meaning’. Term matching misses relevant documents, which impacts recall.
+ 
+ LSA assumes that there exists a LATENT structure in word usage that is obscured by variability
in word choice. 
+ [[BR]]This is analogous to the signal + additive noise model in signal processing.
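
The two failure modes of plain term matching can be seen in a tiny Python sketch. The documents and the `term_match` helper are made up for illustration; the sketch shows only the problem that motivates LSA, not LSA or SVD itself.

```python
# Toy corpus (invented): d1/d2 are about cars, d3/d4 use two senses of "bank".
docs = {
    "d1": "the car dealer sold a used car",
    "d2": "an automobile needs regular service",
    "d3": "the river bank flooded",
    "d4": "the bank approved the loan",
}


def term_match(query, docs):
    """Return the ids of documents that contain the exact query term."""
    return sorted(doc_id for doc_id, text in docs.items() if query in text.split())


# Synonymy: d2 is about cars but uses "automobile", so it is missed (recall).
cars = term_match("car", docs)

# Polysemy: a query about finance also returns the river-bank document (precision).
banks = term_match("bank", docs)

print(cars)
print(banks)
```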
+ 
+ 
+ 
- == Scalable  Collaborative Filtering With A Large User-By-Item Matrix ==
+ == Scalable Collaborative Filtering With A Large User-By-Item Matrix ==
  ..
+ 
  == Consistency Assessment Of Topological Relationship By Matrix-Union ==
- .. 
+ ..
  ----
  = People Involved =
  
