hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Update of "Hbase/HbaseShell/HQL" by InchulSong
Date Thu, 09 Aug 2007 03:25:20 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by InchulSong:
http://wiki.apache.org/lucene-hadoop/Hbase/HbaseShell/HQL

The comment on the change is:
refinements for mainly stack's suggestions.

------------------------------------------------------------------------------
  
  We borrowed the syntax definition style from MySQL.
  
- ''~-This page looks excellent.  I've added a few minor comments.  Please remove when done
with them. -- St.Ack-~''
+ ''Thanks to Edward Yoon for his initial idea of HQL, and to Stack for his valuable suggestions.''
+ 
+ ''Please send any suggestions for HQL to icsong@gmail.com or add them to this wiki page.''
  
  == Data Definition Statements ==
- 
  === CREATE TABLE Syntax ===
  CREATE TABLE enables you to create a new table and set various options for each column family.
  
  {{{
- # Simple version 
- CREATE TABLE table_name
+ CREATE TABLE table_name (
-   (column_family_name MAX_VERSIONS=n [, column_family_name MAX_VERSIONS=n] ...)
+   column_family_name [MAX_VERSIONS=n],
+   ...
+ )
  }}}
  
+  * MAX_VERSIONS is for the management of versioned data. MAX_VERSIONS makes a table keep
only the n most recent versions of a cell under a column family. Its default value is 1, i.e.,
if MAX_VERSIONS is not specified, Hbase keeps only the latest version of the value in a cell.
- MAX_VERSIONS is for the management of versioned data. 
- MAX_VERSIONS makes a table keep only the recent n versions in a cell under a column family.

- Its default value is 1, i.e., if MAX_VERSIONS is not specified, Hbase keeps 
- only the latest version of value in a cell.
  
  {{{
- # Full version
- CREATE TABLE table_name
+ CREATE TABLE table_name (
-   (column_family_spec [, column_family_spec] ...)
+   column_family_spec,
+   ...
+ )
  
  column_family_spec:
-   column_family_name [MAX_VERSIONS=n] [COMPRESSION=no | block | record] 
+   column_family_name [MAX_VERSIONS=n] [LENGTH=n] 
+     [COMPRESSION=no | block | record] [IN_MEMORY] 
-     [IN_MEMORY] [MAX_LENGTH=n] [BLOOMFILTER=bloom| counting | retouched]
+     [BLOOMFILTER=bloom | counting | retouched]
  }}}
  
- See [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/javadoc/org/apache/hadoop/hbase/HColumnDescriptor.html
   HColumnDescriptor API] for more information.
+  * Full version of CREATE TABLE. See [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/javadoc/org/apache/hadoop/hbase/HColumnDescriptor.html
   HColumnDescriptor API] for more information.
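  
  For illustration, a statement following the full syntax above might look like the sketch below. The table name webtable and the column families contents and anchor are hypothetical, and the option values are simply picked from the grammar.
  
  {{{
  CREATE TABLE webtable (
    contents MAX_VERSIONS=3 COMPRESSION=block,
    anchor IN_MEMORY BLOOMFILTER=bloom
  )
  }}}
  
  Under the description of MAX_VERSIONS above, contents would keep the three most recent versions of each cell, while anchor would fall back to the default of 1.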
- 
- === SHOW TABLES Syntax ===
- SHOW TABLES shows all available tables.
- 
- {{{
- SHOW TABLES
- }}}
- 
  
  === DROP TABLE Syntax ===
  DROP TABLE removes one or more tables. 
@@ -65, +58 @@

    alter_spec [, alter_spec] ...
  
  alter_spec: 
-     ADD column_family_name
+     ADD column_family_spec
+   | ADD (column_family_spec, ...)
+   | DROP column_family_name
-   | ADD (column_family_name [, column_family_name] ...)
+   | CHANGE column_family_name column_family_spec
-   | DROP column_family_name # not supported yet
-   | CHANGE old_column_family_name new_column_family_name # not supported yet
  }}}
  
  == Data Manipulation Statements ==
@@ -76, +69 @@

  SELECT enables you to retrieve a subset of data in a table.
  
  {{{
- SELECT { column_name [, column_name] ... | * }
+ SELECT { column_name, ... | * }
    FROM table_name
-   [WHERE row = 'row-key' | STARTING 'row-key']
+   [WHERE row = 'row-key' | STARTING FROM 'row-key']
-   [NUM_VERSIONS=n] [TIMESTAMP 'timestamp']
+   [NUM_VERSIONS = version_count ] 
+   [TIMESTAMP 'timestamp'] 
+   [LIMIT = row_count]
+   [INTO OUTFILE 'file_name' export_options]
  
  column_name: 
      column_family_name:column_label_name
    | column_family_name:
  }}}
  
- You should quote column_name with single quotes if column_name has spaces in it.
+  * You should quote column_name with single quotes if column_name has spaces in it.
+  
+  * If you specify only the column_family_name part for a column, you get values from all the
column_label_names in that column family.
  
- If you specify only column_family_name part for a column, you get values from all the column_label_names
in the column_family_name.
+  * STARTING FROM returns all the rows starting from 'row-key'.
  
- STARTING returns all the rows starting at 'row-key'.
+  * NUM_VERSIONS retrieves only the version_count most recent versions of the values in a cell. 
  
- NUM_VERSIONS retrieves only the recent n versions of values in a cell. 
+  * TIMESTAMP returns only the values with the specified timestamp. 
  
- TIMESTAMP returns only the values with the specified timestamp. 
+  * LIMIT limits the number of rows to be returned. 
  
+  * INTO OUTFILE outputs the returned rows into a specified file. See LOAD DATA INFILE below
for export_options. For inserting into another table, see INSERT INTO SELECT below.
  
- ''~- I'd suggest you add LIMIT as in LIMIT=20 returns twenty rows only.  Otherwise, when
the database has many rows, the users screen will be overwhelmed by returns.  You could use
the new filter mechanism [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/javadoc/org/apache/hadoop/hbase/filter/package-summary.html
filters], in particular the StopRowFilter implementing LIMIT.  How about an INTO so you can
select INTO another table or INTO a file? -- St.Ack-~''
+ ''~- "add some row filtering here, e.g., regex match or upper limit on rows returned." -~''
by Stack. See issue [https://issues.apache.org/jira/browse/HADOOP-1611 HADOOP-1611]. 
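  
  As a sketch of the SELECT syntax above, the two statements below read from a hypothetical table webtable with column families contents and anchor. The row keys are made up, and the second statement assumes that STARTING FROM is written in place of the WHERE clause, which is only one possible reading of the grammar.
  
  {{{
  SELECT contents:, anchor:home
    FROM webtable
    WHERE row = 'com.example.www'
    NUM_VERSIONS = 2
  
  SELECT *
    FROM webtable
    STARTING FROM 'com.example'
    LIMIT = 20
  }}}
  
  The first statement asks for every label under contents plus the single column anchor:home, at most two versions each. The second scans from the given row key and stops after twenty rows.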
  
  === INSERT Syntax ===
  INSERT inserts a set of values into a table. 
  
  {{{
- INSERT INTO table_name (colmn_name [, column_name] ...)
+ INSERT INTO table_name (column_name, ...)
-   VALUES ('value' [, 'value'] ...)
+   VALUES ('value', ...)
    WHERE row = 'row-key'
    [TIMESTAMP 'timestamp']
  }}}
  
- If a specified column already exists, the specified value for the column is stored as a
new version. 
+  * If a specified column already exists, the specified value for the column is stored as
a new version. 
  
- If TIMESTAMP is not specified, the current time is used as the value of the timestamp key.
+  * If TIMESTAMP is not specified, the current time is used as the value of the timestamp
key.
+ 
+ {{{
+ INSERT INTO table_name (column_name, ...)
+   [TIMESTAMP 'timestamp']
+   SELECT ...
+ }}}
+ 
+  * TIMESTAMP inserts the rows selected by SELECT ... with the specified timestamp.
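  
  A minimal sketch of the first INSERT form, assuming a hypothetical table webtable with a column family anchor; the row key and values are made up for illustration.
  
  {{{
  INSERT INTO webtable (anchor:home, anchor:about)
    VALUES ('Home', 'About us')
    WHERE row = 'com.example.www'
  }}}
  
  Because TIMESTAMP is omitted, both values would be stored with the current time. Running the same statement again would, per the notes above, store the values as new versions rather than overwrite the old ones.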
  
  === DELETE Syntax ===
  DELETE removes a subset of data from a table. 
@@ -122, +129 @@

    WHERE row = 'row-key'
  }}}
  
+ === LOAD DATA INFILE Syntax ===
+ LOAD DATA INFILE reads rows from a file and inserts these rows into a table. 
+ 
+ {{{
+ LOAD DATA INFILE 'file_name'
+   INTO TABLE table_name
+   [export_options]
+ 
+ export_options:
+   [FIELDS
+     [TERMINATED BY 'string']
+     [ENCLOSED BY 'char']
+   ]
+   [LINES
+     [STARTING BY 'string']
+     [TERMINATED BY 'string']
+   ]
+ }}}
+ 
+  * LOAD DATA INFILE is the complement of SELECT ... INTO OUTFILE.
+ 
+  * export_options tell LOAD DATA INFILE how to parse the input file, for example how fields
and lines are delimited. 
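  
  To illustrate how the two statements complement each other, the sketch below exports a hypothetical table webtable to a file and loads it back with the same export_options. The file name is made up, and how columns are laid out within each line of the file is not specified here.
  
  {{{
  SELECT *
    FROM webtable
    INTO OUTFILE 'webtable.csv'
    FIELDS TERMINATED BY ',' ENCLOSED BY '"'
  
  LOAD DATA INFILE 'webtable.csv'
    INTO TABLE webtable
    FIELDS TERMINATED BY ',' ENCLOSED BY '"'
  }}}
  
  Using identical FIELDS options on both sides is what lets the loader split each line the same way the export wrote it.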
+ 
  === START TRANSACTION, COMMIT, and ROLLBACK Syntax ===
  You can group together a sequence of data manipulation statements in a single-row transaction.
  
@@ -131, +161 @@

  ROLLBACK 
  }}}
  
- The START TRANSACTION and BEGIN statements begin a new single-row transaction
+  * The START TRANSACTION and BEGIN statements begin a new single-row transaction under a
'row-key' of table_name. 
- under a 'row-key' of table_name. 
  
+  * COMMIT commits the current transaction, making its changes permanent. If timestamp is
specified on commit, all the modifications under a single-row transaction are stored with
the specified timestamp. If not, they are stored with the current time.
- COMMIT commits the current transaction, making its changes permanent. 
- If timestamp is specified on commit, all the modifications under a single-row transaction
- are stored with the specified timestamp. If not, they are stored with the current time.
  
- ROLLBACK rolls back the current transaction, canceling its changes. 
+  * ROLLBACK rolls back the current transaction, canceling its changes. 
  
+  * By default, for every statement execution that updates a table, Hbase immediately stores
the update on disk.
- By default, for every statement execution that updates a table, 
- Hbase immediately stores the update on disk.
- 
  
  ''~- TRANSACTION on a row-level only -- and this is all you could guarantee in HBase --
may be a bit over the top and require more effort than it's worth.  How about implementing
this one last, if it is needed at all? -- St.Ack-~''
  
+ == Other Statements ==
+ SHOW TABLES shows all available tables.
+ 
+ {{{
+ SHOW TABLES 
+ }}}
+ 
+ DESCRIBE shows the structure of a table, including available column families and their settings
such as compression options, bloom filters, and so on. 
+ 
+ {{{
+ DESCRIBE table_name 
+ }}}
+ 
+ {{{
+ ENABLE | DISABLE table_name
+ }}}
+ 
