hadoop-pig-dev mailing list archives

From "Dmitriy V. Ryaboy (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-759) HBaseStorage scheme for Load/Slice function
Date Fri, 14 May 2010 16:52:44 GMT

    [ https://issues.apache.org/jira/browse/PIG-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867558#action_12867558 ]

Dmitriy V. Ryaboy commented on PIG-759:
---------------------------------------

Alex,
Check out the modified HBase loader I have in ElephantBird, which does this and more:
http://github.com/kevinweil/elephant-bird/tree/master/src/java/com/twitter/elephantbird/pig/load/

(You want HBaseLoader and HBaseSlice.)

It works with 0.6. One major backwards-incompatible change is that it doesn't expect the HBase
table to contain string representations of everything, and tries to work at the byte level
instead.
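
To make the string vs. byte-level distinction concrete, here is a minimal sketch. It is not code from ElephantBird: the function names are hypothetical, and the 4-byte big-endian integer layout is an assumption chosen to match what HBase's Bytes.toBytes(int) produces.

```python
import struct


def encode_as_string(value: int) -> bytes:
    """Old approach (assumed): store the decimal digits of the number as text."""
    return str(value).encode("utf-8")


def decode_from_string(raw: bytes) -> int:
    """Parse a cell that holds the textual form of an integer."""
    return int(raw.decode("utf-8"))


def encode_as_binary(value: int) -> bytes:
    """Byte-level approach (assumed layout): fixed-width 4-byte big-endian int."""
    return struct.pack(">i", value)


def decode_from_binary(raw: bytes) -> int:
    """Read a 4-byte big-endian integer straight from the cell bytes."""
    (value,) = struct.unpack(">i", raw)
    return value
```

The two encodings are not interchangeable: a loader that expects one will misread cells written with the other, which is why the change is backwards-incompatible.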

Porting this to 0.7 might involve allowing the Caster interface to be user-specified, in
which case it would be trivial to use either the old String approach or the new Binary approach.
Feel free to take on porting this to 0.7; I probably won't get to it for at least a month.

We are completely open to putting this back into Pig; the only reason it's in EB is that 0.6
was frozen when this was created, and we don't yet run 0.7 at Twitter :).

-D

> HBaseStorage scheme for Load/Slice function
> -------------------------------------------
>
>                 Key: PIG-759
>                 URL: https://issues.apache.org/jira/browse/PIG-759
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Gunther Hagleitner
>             Fix For: 0.7.0
>
>         Attachments: patch.p1
>
>
> We would like to change the HBaseStorage function to use a scheme when loading a table
> in pig. The scheme we are thinking of is: "hbase". So in order to load an hbase table in a
> pig script the statement should read:
> {noformat}
> table = load 'hbase://<tablename>' using HBaseStorage();
> {noformat}
> If the scheme is omitted pig would assume the tablename to be an hdfs path and the storage
> function would use the last component of the path as a table name and output a warning.
> For details on why see jira issue: PIG-758
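
The location rule described in the issue can be sketched roughly as follows. This is an illustration only, not the attached patch: the function name and the warning text are hypothetical, and the handling of trailing slashes is an assumption.

```python
import warnings

HBASE_SCHEME = "hbase://"


def resolve_table_name(location: str) -> str:
    """Return the HBase table name for a load location (hypothetical helper)."""
    if location.startswith(HBASE_SCHEME):
        # Scheme present: everything after 'hbase://' is the table name.
        return location[len(HBASE_SCHEME):]
    # Scheme omitted: treat the location as an HDFS path, use its last
    # component as the table name, and warn the user.
    table = location.rstrip("/").rsplit("/", 1)[-1]
    warnings.warn(
        f"no hbase:// scheme in '{location}'; using '{table}' as the table name"
    )
    return table
```

With this sketch, both 'hbase://users' and '/some/hdfs/path/users' resolve to the table name users, and only the second emits a warning.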

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.