hadoop-common-dev mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1608) [Hbase Shell] Relational Algrebra Operators
Date Mon, 05 Nov 2007 22:04:51 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12540287 ]

stack commented on HADOOP-1608:


I do not understand what you mean by the following in your 02/Nov/07 06:09 PM comment: "I'll
implement relational algebra operators to the tentative language and HQL by Sub-Tasks List

In org.apache.hadoop.hbase.shell.algebra, the test for output table presence, and its creation
if missing, is duplicated in Selection, Projection, DuplicateTable, etc. I tried to move this
duplicated code up into the RelationalOperation class as a utility, but then noticed that if the
table already exists, we don't call initJob because we return early (see the end of getConf in
DuplicateTable for example). Is running one of these operators a second time, after the table
has been created, a problem? Have you tried it?
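
A minimal sketch of how the hoisted check could avoid the early return is below. It is plain
Java with an in-memory stand-in for the table catalog, so TableCatalog, OperatorSetup and
prepare are hypothetical names and not the code in the patch or any HBase API. The only point
is that create-if-missing should not skip the job setup.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory stand-in for the table catalog; not an HBase API.
class TableCatalog {
    private final Set<String> tables = new HashSet<>();

    boolean exists(String name) { return tables.contains(name); }

    void create(String name) { tables.add(name); }
}

// Hypothetical shared helper of the kind that could live in a common base class.
class OperatorSetup {
    // Records the output table of every job that was set up, for illustration only.
    final List<String> initialisedJobs = new ArrayList<>();
    private final TableCatalog catalog;

    OperatorSetup(TableCatalog catalog) { this.catalog = catalog; }

    // Creates the output table only if it is missing, then always sets up the job.
    // Returns true if the table had to be created by this call.
    boolean prepare(String outputTable) {
        boolean created = false;
        if (!catalog.exists(outputTable)) {
            catalog.create(outputTable);
            created = true;
        }
        initJob(outputTable); // deliberately outside the if: no early return
        return created;
    }

    // Stand-in for the real job configuration step.
    private void initJob(String outputTable) {
        initialisedJobs.add(outputTable);
    }
}
```

With this shape, a second run against an existing table still goes through the job setup;
whether re-running into an existing table is desirable is a separate question.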

I did not have a mapreduce cluster running, so the last line of the session below hung forever
(you should mention in the help that a mapreduce cluster is needed).
Hbase> aaa = table('y');
Hbase> aaa;
Syntax error : Type 'help;' for usage.
Message : Encountered ";" at line 1, column 5.
Hbase> show aaa;
Missing parameters. Please check 'Show' syntax
Hbase> bbb = group aaa by ('x');
Hbase> save bbb into table ('a');

Note, it would be nice if entering just a variable name printed a description of the
variable's content, or if 'show VARIABLE_NAME' output some kind of description.

I then started up a cluster and did the simplest of operations:
Hbase> aaa = table('x');
Hbase> save aaa into table(aaaaaa);
07/11/05 21:26:57 WARN mapred.JobClient: No job jar file set.  User classes may not be found.
See JobConf(Class) or JobConf#setJar(String).
Job job_200711052125_0001 is still running........Job

The job failed because of 'Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.mapred.TableSplit'.
How do you run on a cluster, Edward?  Do you copy the hbase jar into the hadoop lib dir all
over the cluster?  Does HADOOP-1622 help here?

On job failure, should the table be removed?  Currently, I have a new table 'aaaaaa' with
nothing in it.
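
One possible policy, sketched below in plain Java, is to drop the output table on failure only
when the failing run created it. SaveWithCleanup and its in-memory table set are hypothetical
stand-ins for illustration, not shell or HBase code.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.BooleanSupplier;

// Hypothetical sketch: the set of names stands in for the tables that exist.
class SaveWithCleanup {
    final Set<String> tables = new HashSet<>();

    // Runs the job and reports success. If the job fails, the output table is
    // removed only when this call created it, so pre-existing tables survive.
    boolean save(String outputTable, BooleanSupplier job) {
        boolean createdHere = tables.add(outputTable); // true if the name was new
        boolean succeeded;
        try {
            succeeded = job.getAsBoolean();
        } catch (RuntimeException e) {
            succeeded = false; // treat an exception as a failed job
        }
        if (!succeeded && createdHere) {
            tables.remove(outputTable); // do not leave an empty table behind
        }
        return succeeded;
    }
}
```

Under this policy the failed save above would not have left the empty 'aaaaaa' table behind.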

Is the output of Group, Selection, etc. saved to an 'output' table?

It would be good if the help listing showed somewhere the operators and types of
conditions allowed on selection (>, <, AND, OR, etc.).

One last thing: hbase shell (relational) operators running mapreduce jobs begin to impinge
on PIG territory.  We should be careful to avoid overlapping or duplicating work.  Would it
make sense to do further operators as PIG user defined functions?  (I suppose we'll be able
to tell better after PIG-6, the hbase load/store, is done.  Smile).

Otherwise, the patch looks good, Edward.


> [Hbase Shell] Relational Algrebra Operators
> -------------------------------------------
>                 Key: HADOOP-1608
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1608
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: contrib/hbase
>    Affects Versions: 0.14.1
>         Environment: All environments 
>            Reporter: Edward Yoon
>            Priority: Minor
>             Fix For: 0.16.0
>         Attachments: patch_v01.txt, patch_v02.txt, patch_v03.txt, patch_v04.txt, patch_v05.txt,
> patch_v06.txt, patch_v07.txt, patch_v08.txt, patch_v09.txt, patch_v10.txt, patch_v11.txt,
> patch_v12.txt, patch_v13.txt, patch_v14.txt, patch_v15.txt
>
> Development of relational algebra operators has begun.
>  * Projection 
>   ** selects a subset of the columnfamilies of a relation  
>   ** Result = π ~column_list~ (Relation) 
> {code}
> Hbase > Result = Relation.Projection('year','length');
> Hbase > save Result into table('result');
> {code}
>  * Selection
>   ** See : HADOOP-2003 issue's description
> {code}
> Hbase > Result = Relation.Selection(length > 100 and studioName = 'Fox'); 
> Hbase > save Result into table('result');
> {code}
>  * Group
>   ** more details about 'GROUP' operation will be handled in HADOOP-1658 issue. 
>  * θ Join
>   ** The join of two relations R1(A ~1~,A ~2~,...,A ~n~) and R2(B ~1~,B ~2~,...,B ~m~)
> is a relation with degree k=n+m and attributes (A ~1~,A ~2~,...,A ~n~, B ~1~,B ~2~,...,B ~m~)
> that satisfy the join condition
> {code}
> Hbase > R1 = table('movieLog_table');
> Hbase > R2 = table('personInfo_table');
> Hbase > Result = R1.join(R1.producer: = R2.ROW) and R2; 
>      or Result = R1.join(R1.actor:hero = R2.Row) and R2;
>      or Result = R1.join(R1.actor:hero = R2.Row and R1.studioName = 'Fox' and R2.occupation = 'singer') and R2;
> {code}
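
For reference, the θ join definition quoted above can be read as a filtered cross product. The
sketch below is a plain in-memory nested-loop version in Java, assuming rows are maps from
attribute name to value and that attribute names are distinct across the two relations. The
ThetaJoin class is hypothetical and says nothing about how the mapreduce-based operator in the
patch is implemented.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// Hypothetical illustration of the θ join definition; not code from the patch.
class ThetaJoin {
    // Nested-loop θ join: keeps every pair (a, b) that satisfies the condition and
    // concatenates the two rows, so each result row has n + m attributes.
    // Assumes attribute names do not collide between r1 and r2.
    static List<Map<String, String>> join(
            List<Map<String, String>> r1,
            List<Map<String, String>> r2,
            BiPredicate<Map<String, String>, Map<String, String>> condition) {
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, String> a : r1) {
            for (Map<String, String> b : r2) {
                if (condition.test(a, b)) {
                    Map<String, String> row = new LinkedHashMap<>(a);
                    row.putAll(b); // append the attributes of the second relation
                    result.add(row);
                }
            }
        }
        return result;
    }
}
```

A condition such as `(a, b) -> a.get("R1.actor:hero").equals(b.get("R2.ROW"))` would mirror the
second join example in the description.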

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
