hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Trivial Update of "Hbase/Shell/Replacement" by stack
Date Thu, 22 May 2008 23:57:11 GMT
The following page has been changed by stack:

The comment on the change is:
Start on some shell replacement notes

New page:
Notes on the HQL replacement

== Required ==

At least the admin (definitional, DDL) functionality currently in HQL: SHOW (tables), DROP,
CREATE, ALTER.  We don't need JAR (running an MR job jar from the HQL cmdline), FS (hadoop fs
operations from the HQL cmdline), or CLEAR (clear terminal).

At least the manipulative functionality (DML) currently in HQL: SELECT, INSERT, UPDATE, DELETE.

Output formatters.  At least ascii (table) and xhtml.  JSON would be a nice-to-have.
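
A minimal sketch of how pluggable output formatters could look, assuming a result set is handed
over as a list of column headers plus a list of rows.  The function names and the FORMATTERS
registry are hypothetical and are not part of HQL or of any HBase API.

```python
import json
from html import escape


def format_ascii(headers, rows):
    """Render rows as a plain-text table with columns sized to fit the data."""
    cells = [[str(c) for c in row] for row in rows]
    widths = [len(h) for h in headers]
    for row in cells:
        for i, cell in enumerate(row):
            widths[i] = max(widths[i], len(cell))
    rule = "+" + "+".join("-" * (w + 2) for w in widths) + "+"

    def line(values):
        return "|" + "|".join(" " + v.ljust(w) + " " for v, w in zip(values, widths)) + "|"

    out = [rule, line(headers), rule]
    out.extend(line(row) for row in cells)
    out.append(rule)
    return "\n".join(out)


def format_xhtml(headers, rows):
    """Render rows as an XHTML table fragment, escaping cell contents."""
    head = "".join("<th>%s</th>" % escape(h) for h in headers)
    body = "".join(
        "<tr>%s</tr>" % "".join("<td>%s</td>" % escape(str(c)) for c in row)
        for row in rows
    )
    return "<table><tr>%s</tr>%s</table>" % (head, body)


def format_json(headers, rows):
    """Render rows as a JSON array with one object per row."""
    return json.dumps([dict(zip(headers, row)) for row in rows])


# Hypothetical registry: a new format is added by registering one more function.
FORMATTERS = {"ascii": format_ascii, "xhtml": format_xhtml, "json": format_json}
```

With this shape, the shell would pick a formatter by name, e.g. FORMATTERS["json"](headers, rows),
and the command code would not need to know which output format is in use.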

User-friendly: 'obvious', 'natural', and lots of help (Hard to have 'fit' criteria for 'user-friendly',
but HQL being SQL-like is an example of this requirement's intent).

Read commands from STDIN, dump on STDOUT.
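
A rough sketch of the STDIN/STDOUT requirement, assuming one command per line.  The execute
callback stands in for whatever ends up interpreting a command; it and the run_script helper
are hypothetical names used only for illustration.

```python
import io
import sys


def run_shell(execute, stdin=sys.stdin, stdout=sys.stdout):
    """Read one command per line from stdin and write each result to stdout."""
    for raw in stdin:
        command = raw.strip()
        if not command:
            continue  # ignore blank lines
        try:
            result = execute(command)
        except Exception as exc:
            # Report the failure and keep reading, so a piped script is not cut short.
            result = "ERROR: %s" % exc
        stdout.write("%s\n" % result)
    stdout.flush()


def run_script(text, execute):
    """Run a multi-line script held in a string and return everything written out."""
    out = io.StringIO()
    run_shell(execute, stdin=io.StringIO(text), stdout=out)
    return out.getvalue()
```

Because the loop only touches the two streams it is given, the same code would serve an
interactive terminal, a pipe, or a test that feeds it a string.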

Dynamic language -- python, ruby, etc. -- access to full HBase API as a tool for debugging
horked hbase clusters.

== Nice to Haves ==

HBase-specific operators: ONLINE/OFFLINE/MERGE

Our replacement should map closely to the current client API

Easy to maintain/extend (Hard to have 'fit' criteria for the notion 'easy')

== Some Discussion ==

We might take on SQL's DDL/DML distinction (this was raised when it was suggested that DELETE
could operate on a cell, column, column family, row, or table depending on context).
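
To make the context-dependent DELETE concrete, here is one possible reading, sketched under the
assumption that the target is named by an optional row, column family and column qualifier.
The function and its argument names are hypothetical, and the combination of a row with a
family but no qualifier is deliberately left undefined.

```python
def delete_scope(table, row=None, family=None, qualifier=None):
    """Return which kind of object a DELETE would act on, given what was named."""
    if not table:
        raise ValueError("DELETE needs a table name")
    if qualifier is not None and family is None:
        raise ValueError("a column qualifier was given without its column family")
    if row is None:
        if family is None:
            return "table"
        return "column family" if qualifier is None else "column"
    if family is None:
        return "row"
    if qualifier is None:
        # Not one of the five scopes in the notes; this sketch refuses to guess.
        raise ValueError("row plus column family without a qualifier is not defined here")
    return "cell"
```

Under this reading, naming only the table removes the table (a DDL-like action) while naming a
row and a column removes a cell (a DML-like action), which is where the distinction starts to matter.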

Create table needs to take table name, table attributes -- e.g. table regionsize -- and column
families and their definitions which will include maximum versions, etc.  Attributes on tables
and column families are many and will likely evolve over time.  We shouldn't have to rev the
shell parser for every attribute change.  Building these lengthy DDL statements can be involved
and error-prone.  Parse failures need to be non-cryptic.  The same table and column family
descriptors will be used when altering tables and column families.
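
One way to avoid revving the parser per attribute is to parse attributes generically as
NAME=value pairs and check the names in a separate step.  The sketch below assumes that syntax;
the attribute names REGIONSIZE and MAX_VERSIONS are illustrative only, not actual descriptor keys.

```python
def parse_attributes(text):
    """Parse 'NAME=value NAME=value' into a dict without knowing the names in advance."""
    attrs = {}
    for position, token in enumerate(text.split(), start=1):
        name, sep, value = token.partition("=")
        if not sep or not name or not value:
            raise ValueError(
                "attribute %d (%r) is not in NAME=value form" % (position, token)
            )
        key = name.upper()
        if key in attrs:
            raise ValueError("attribute %s is given more than once" % key)
        attrs[key] = value
    return attrs


def check_known(attrs, known):
    """Reject unknown names with a message that lists what is accepted."""
    unknown = sorted(set(attrs) - set(known))
    if unknown:
        raise ValueError(
            "unknown attribute(s): %s; known attributes are: %s"
            % (", ".join(unknown), ", ".join(sorted(known)))
        )
    return attrs


# Hypothetical attribute sets; adding an attribute means extending these, not the parser.
TABLE_ATTRIBUTES = {"REGIONSIZE"}
FAMILY_ATTRIBUTES = {"MAX_VERSIONS"}
```

The same two functions could back both CREATE and ALTER, and the error messages say which
token was wrong and what would have been accepted.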

Typing 'help', you should get a dump of all that's possible in the hbase shell.  You should also
be able to get help per command and, depending on how we implement it, do help or describe on
an object to learn what the object exposes.
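
A small sketch of help that stays in step with the command set, assuming commands are plain
functions registered under a name.  The COMMANDS registry, the command decorator and the two
sample commands are hypothetical; their bodies are placeholders.

```python
COMMANDS = {}


def command(name, summary):
    """Register a function as a shell command along with a one-line summary."""
    def register(func):
        COMMANDS[name] = (summary, func)
        return func
    return register


@command("show", "List tables.")
def show(args):
    """Usage: show tables"""
    return []  # placeholder body


@command("drop", "Drop a table.")
def drop(args):
    """Usage: drop TABLE"""
    return None  # placeholder body


def help_text(name=None):
    """Return the full command list, or the details for a single command."""
    if name is None:
        return "\n".join("%-8s %s" % (n, COMMANDS[n][0]) for n in sorted(COMMANDS))
    key = name.lower()
    if key not in COMMANDS:
        return "Unknown command %r. Type 'help' for the full list." % name
    summary, func = COMMANDS[key]
    detail = (func.__doc__ or "").strip()
    return summary if not detail else summary + "\n" + detail
```

Since the help output is built from the registry and the docstrings, a newly added command
shows up in 'help' without any separate help text to maintain.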

It's OK that a user might mistakenly run 'select * from TABLE_WITH_1B_ROWS'.  They won't do
it a second time.  A simple search should turn up pointers out of the shell to tools of our
manufacture -- MR tools -- or to PIG/JAQL/Cascading.
