hadoop-common-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Update of "Hbase/HbaseShell" by udanax
Date Sat, 29 Dec 2007 08:51:50 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by udanax:
http://wiki.apache.org/lucene-hadoop/Hbase/HbaseShell

------------------------------------------------------------------------------
  
  ----
  = Hbase Shell Introduction =
- Hbase Shell is a query language shell for manipulating databases in Hadoop + Hbase. 
- [[BR]][:Hbase/HbaseShell/ShellPlans: Shell Plans] page for discussion and description of
future operators. 
  
- == People Involved ==
+ == Initial Contributor ==
  
   * [:udanax:Edward Yoon] (R&D center, NHN corp.) -- Initial Contributor
-  * [:InchulSong: Inchul Song] (Database Lab, KAIST) 
-  * [:JaesunHan:Jaesun Han] (NexR corp.)
-  * [:YonghoHa:Yongho Ha] (R&D center, TMAXSoft)
   
  ----
  = How to Start a Shell =
@@ -31, +26 @@

  hql >}}}
  
  All commands are terminated with a semi-colon: e.g. Type 'help;' to see list of available
commands.
- 
- == Hbase Shell Configuration ==
- 
- ||<bgcolor="#ececec">'''Property''' ||<bgcolor="#ececec">'''Description''' ||
- ||hbaseshell.jline.bell.enabled ||<99%> If true, enable audible keyboard bells if
an alert is required.||
- ||hbaseshell.formatter||<99%> Name of formatter class to use drawing result set tables.
Default: org.apache.hadoop.hbase.shell.formatter.AsciiTableFormatter||
- 
- 
- == Application Access to Hbase Shell Server ==
-  * [:Hbase/HbaseShell/PHP:PHP HQL] Connect.
-  * [http://shell.hadoop.co.kr/PHPClient.php HQL query test site] on 2 node cluster Hbase
  
  = Hbase Query Language Commands =
  
@@ -91, +75 @@

  ||JOIN ||<99%>Table '''JOIN''' operations, linking and extracting data from two different
internal source.[[BR]][[BR]]~-''R = Table('table_name1');[[BR]]S = Table('table_name2');[[BR]]C
= R.Join(condition_expression) and S;''-~ ||
  ||Group ||<99%>'''Group''' tuples by value of an attribute and apply aggregate function
independently to each group of tuples.[[BR]]'''Aggregate Functions''' : ~-''AVG(attribute),
SUM(attribute), COUNT(attribute), MIN(attribute), MAX(attribute)''-~[[BR]][[BR]]~-''A = Table('table_name');[[BR]]B
= A.Group(column-list);''-~ ||
  
- ----
- = Examples =
- == Example Of Basic Query Command Uses ==
- === Create the table in a Hbase ===
- 
- {{{
- hql > help create;
- CREATE    Create tables
- 
- Syntax:
- CREATE TABLE table_name
-         column_family_definition [, column_family_definition] ...
- 
- column_family_definition:
-         column_family_name
-         [MAX_VERSIONS=n]
-         [MAX_LENGTH=n]
-         [COMPRESSION=NONE|RECORD|BLOCK]
-         [IN_MEMORY]
-         [BLOOMFILTER=NONE|BLOOMFILTER|COUNTING_BLOOMFILTER|RETOUCHED_BLOOMFILTER VECTOR_SIZE=n
NUM_HASH=n]
- }}}
- 
- '''CREATE TABLE''' enables you to create a new table with various options for each column
family.
- 
-  * ~-'''MAX_VERSIONS''' makes a table keep only the recent n versions in a cell. Its default
value is 3, i.e., only the most recent 3 versions of values are stored in a cell.-~
-  * ~-'''MAX_LENGTH''' specifies the maximum size of a column value. By default, the maximum
size is unlimited. It is limited only by Integer.MAX_VALUE.-~
-  * ~-'''COMPRESSION''' specifies which compression technique to use for the column family.
By default, Hbase does not compress any data.-~
-  * ~-'''IN_MEMORY''' specifies whether to keep the values in the column family in-memory,
or not. By default, values are not kept in-memory.-~
-  * ~-'''BLOOMFILTER''' specifies which bloom filter to use for the column family. By default,
none of the bloom filters is used. You can specify the options in two ways: with '''VECTOR_SIZE'''
and '''NUM_HASH''', or with '''NUM_ENTRIES'''. '''VECTOR_SIZE''' specifies the number of elements
in the vector, and '''NUM_HASH''' specifies the number of hash functions to use; With '''NUM_ENTRIES''',
you  specify only the approximated number of entries in the column, and '''VECTOR_SIZE'''
and '''NUM_HASH''' are automatically determined.-~
- 
- {{{
- hql > CREATE TABLE movieLog_table (
-   --> year, length, inColor, studioName, vote, producer, actor)
-   --> NUM_VERSIONS 10; 
- 
- hql > CREATE TABLE webtable (
-   --> contents in_memory max_versions=10 compression=block,
-   --> anchor max_length=256 bloomfilter=counting_bloomfilter
-   --> vector_size=1000000 num_hash=4);
- }}}
- 
- === Select data from a table ===
- {{{
- hql > help select;
- SELECT    Select values from tables
- 
- Syntax:
- SELECT { column_name, [, column_name] ... | *} FROM table_name
-         [WHERE row='row_key' | STARTING FROM 'row-key']
-         [NUM_VERSIONS = version_count]
-         [TIMESTAMP 'timestamp']
-         [LIMIT = row_count]
-         [INTO FILE 'file_name']
- }}}
- 
- '''SELECT''' retrieves a subset of data from the specified table.
- 
-  * ~-'''STARTING FROM''' returns all the rows starting from 'row-key'.-~
-  * ~-'''NUM_VERSIONS''' retrieves only the recent n versions of values in a cell.-~
-  * ~-'''TIMESTAMP''' returns only the values with the specified timestamp.-~
-  * ~-'''LIMIT''' limits the number of rows to be returned.-~ 
- 
- {{{
- hql > SELECT studioName: FROM movieLog_table WHERE row = 'Star Wars';
- 
- +---------------------------------------------+
- | title         studioName                    |
- | =========================================== |
- | Star Wars     Fox Entertainment Group, Inc. |
- +---------------------------------------------+
- 
- Successfully print out the selected data.(0.05 sec)
- 
- hql > SELECT 'studioName:YoungGu Art, Corp.' FROM movieLog_table WHERE row = 'D-War';
- }}}
- 
- === Insert data into a table ===
- {{{
- hql > help insert;
- INSERT    Insert values into tables
- 
- Syntax:
- INSERT INTO table_name
-         (colmn_name, ...) VALUES ('value', ...)
-         WHERE row='row_key'
- 
- column_name:
-           column_family_name
-         | column_family_name:column_label_name
- }}}
- 
- '''INSERT''' inserts a set of values into a table. 
- 
-  * ~-If the specified column already exists, the new value is stored as a new version.-~
-  * ~-If '''TIMESTAMP''' is not specified for the value, the current time is used as its
timestamp.-~
- 
- {{{
- hql > INSERT INTO movieLog_table (year:, length:, inColor:, studioName:, 'vote:user name',
producer:, 'actor:hero')
-   --> VALUES ('1977', '124', 'true', 'Fox', '5', 'George Lucas', 'Mark Hamill')
-   --> WHERE row='Star Wars';
- }}}
- 
- === Delete data in a table ===
- {{{
- hql > help delete;
- DELETE    Delete table data
- 
- Syntax:
- DELETE { column_name, [, column_name] ... | COLUMNFAMILIES(column_family[, column_family]
... | *} 
-         FROM table_name
-         [WHERE row = 'row-key'];
- }}}
- 
-  * Asterisk (*) will be delete the all in table.
-  * COLUMNFAMILIES(column_family list) will be delete the all columns in columnfamily.
-  * Specified column_name will be delete the specified column data.
- 
- {{{
- hql > DELETE actor:hero FROM movieLog_table;
- hql > DELETE actor:hero FROM movieLog_table WHERE row='Star Wars';
- hql > DELETE * FROM movieLog_table;
- }}}
- 
- === How to use external jars in Hbase Shell ===
- 
- {{{
- hql > HELP JAR;
- JAR jarFile [mainClass] args...;
- ...
- 
- hql > JAR ./build/hadoop-examples.jar pi 10 10;
- }}}
- 
- == Example Of Relational Algebraic Query Command Uses ==
- 
- === Relational projection ===
- 
-  * selects a subset of the columnfamilies of a relation
-  * Result = π ,,column_list,, (Relation) 
- {{{
- Relation
- +--------------------------------------------------------------------+
- | title         year   length  inColor studioName                    |
- | ================================================================== |
- | Star Wars     1977   124     true    Fox Entertainment Group, Inc. |
- | Mighty Ducks  1991   104     true    Paramount Pictures Corp.      |
- +--------------------------------------------------------------------+
- 
- hql > Result = Relation.Projection('year','legnth');
- hql > save Result into table('result');
- 
- Result
- +------------------------------+
- | title         year   length  |
- | ============================ |
- | Star Wars     1977   124     |
- | Mighty Ducks  1991   104     |
- +------------------------------+
- }}}
- === Relational selection ===
-  * selects a subset of the rows in a relation that satisfy a selection condition
-  * Result = σ ,,selection_condition,, (Relation) 
- {{{
- Relation
- +--------------------------------------------------------------------+
- | title         year   length  inColor studioName                    |
- | ================================================================== |
- | Star Wars     1977   124     true    Fox Entertainment Group, Inc. |
- | Mighty Ducks  1991   104     true    Paramount Pictures Corp.      |
- +--------------------------------------------------------------------+
- 
- hql > Result = Relation.Selection(length > 100 and studioName = 'Fox Entertainment
Group, Inc.'); 
-    or Result = Relation.Selection(length > 100 
-       and studioName NOT IN ('Fox Entertainment Group, Inc.', 'Paramount Pictures Corp.'));
- hql > save Result into table('result');
- 
- Result
- +--------------------------------------------------------------------+
- | title         year   length  inColor studioName                    |
- | ================================================================== |
- | Star Wars     1977   124     true    Fox Entertainment Group, Inc. |
- +--------------------------------------------------------------------+
- }}}
- 
- === Relational group ===
-  * Aggregation functions on collections of data values: average, minimum, maximum, sum,
count.
-  * Group rows by value of an columnfamily and apply aggregate function independently to
each group of rows.
-  * <Grouping columnfamilies> ƒ ,,function_list,, (Relation)
- 
- {{{
- Hbase > Group Relation by (studioName, SUM('vote:user'));
- }}}
- 
- === Relational join ===
-  * The join of two relations R1(A,,1,, ,A,,2,, ,...,A,,n,,) and R2(B,,1,, ,B,,2,, ,...,B,,m,,)
is a relation with degree k=n+m and attributes (A,,1,, ,A,,2,, ,...,A,,n,, , B,,1,, ,B,,2,,
,...,B,,m,,) that satisfy the join condition
-   * Result = R1 ▷◁ ,,θ join_condition,, R2
- {{{
- R1
- +----------------------------------------------------------------------------------------------+
- | title         producer      actor                              studioName            
       |
- | ============================================================================================
|
- | Star Wars     George Lucas                                     Fox Entertainment Group,
Inc. |
- | Mighty Ducks  Blair Peters  actor:hero      <Charles Adler>    Paramount Pictures
Corp.      |
- |                             actor:cameo     <Someone>                          
             |
- | Bodyguard     Mick Jackson  actor:hero      <Kevin Costner>    Warner Home Video
Inc.        |
- |                             actor:heroine   <Whitney Houston>                  
             |
- +----------------------------------------------------------------------------------------------+
- 
- R2
- +---------------------------------------------------------+
- | name              occupation   birth         knownFor   |
- | ======================================================= |
- | Whitney Houston   Singer       Aug 9, 1963   Diva       |
- +---------------------------------------------------------+
- 
- hql > R1 = table('movieLog_table');
- hql > R2 = table('personInfo_table');
- hql > Result = R1.join(R1.producer: = R2.ROW) and R2; 
-    or Result = R1.join(R1.actor:hero = R2.Row) and R2;
-    or Result = R1.join(R1.actor:heroine = R2.Row and R1.studioName = 'Warner Home Video'
and R2.occupation = 'Singer') and R2;
- 
- Result
- +--------------------------------------------------------------------------------------------------------------------+
- | title     producer      actor                            studioName              occupation
 birth       knownFor  |
- | ==================================================================================================================
|
- | Bodyguard Mick Jackson  actor:heroine <Whitney Houston>  Warner Home Video Inc.
 Singer      Aug 9, 1963 Diva      |
- +--------------------------------------------------------------------------------------------------------------------+
- }}}
- 
-  * If we don't have an index for a domain in the join, we can still improve on the nested-loop
join using sort join. 
- 
- {{{
- R1
- +----------------------------------------------------------------------------------------------+
- | title         producer      actor                              studioName            
       |
- | ============================================================================================
|
- | Star Wars     George Lucas                                     Fox Entertainment Group,
Inc. |
- | Mighty Ducks  Blair Peters  actor:hero      <Charles Adler>    Paramount Pictures
Corp.      |
- |                             actor:cameo     <Someone>                          
             |
- | The Bodyguard Mick Jackson  actor:hero      <Kevin Costner>    Warner Home Video
Inc.        |
- |                             actor:heroine   <Whitney Houston>                  
             |
- +----------------------------------------------------------------------------------------------+
- 
- R2
- +--------------------------------------------------------------------------------------------------+
- | Symbol   corporation                     Market         Management                   
           |
- | ================================================================================================
|
- | CSCO     Cisco Systems, inc.             Market:NASDAQ  Management:CEO       <John
T. Chambers>  |
- | Fox      Fox Entertainment Group, Inc.   Market:NYSE    Management:President <Peter
F. Chernin>  |
- |                                                         Management:SEVP      <David
F. DeVoe>    |
- | ...                                                                                  
           |
- +--------------------------------------------------------------------------------------------------+
- 
- hql > R1 = table('movieLog_table');
- hql > R2 = table('stockCompany_info');
- hql > result = R1.join(R1.studioName = R2.corporation) and R2;
- }}}
- 
- === Query mixture examples ===
- 
- '''(ex. 1)''' Search the subject(row) and the year of the movies which were produced by
'Fox' company and where running time is more than 100 minutes.
- [[BR]]~-''π ,,year,, (σ ,,length > 100,, (movieLog_table) ∩ σ ,,studioName = 'Fox',,
(movieLog_table))''-~
- 
- {{{
- Relation
- +--------------------------------------------------------------------+
- | title         year   length  inColor studioName                    |
- | ================================================================== |
- | Star Wars     1977   124     true    Fox Entertainment Group, Inc. |
- | Mighty Ducks  1991   104     true    Paramount Pictures Corp.      |
- +--------------------------------------------------------------------+
- 
- hql > Proj = Relation.Projection('year');
- hql > Result = Proj.Selection(length > 100 AND studioName = 'Fox Entertainment Group,
Inc.');
- hql > save Result into table('result');
- 
- Result
- +----------------------+
- | title(row)    year   |
- | ==================== |
- | Star Wars     1977   |
- +----------------------+
- }}}
- 
- ----
- = Comments =
- 
- Please add comments related to the project below.
- 

Mime
View raw message