hadoop-pig-dev mailing list archives

From "Alan Gates (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-6) Addition of Hbase Storage Option In Load/Store Statement
Date Wed, 19 Nov 2008 01:37:44 GMT

    [ https://issues.apache.org/jira/browse/PIG-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12648874#action_12648874 ]

Alan Gates commented on PIG-6:

A couple of rudimentary comments.  My knowledge of hbase is limited, so please feel free to
correct assumptions I have about hbase or point me to appropriate documentation.

1.  I'd like to avoid specialized syntax for hbase type queries.  Why do we need a special
load and store syntax?  Is it not possible to fit the necessary information into a combination
of the loader constructor arguments and the filename string?  Roughly like:

    A = load 'hbase query' using HBaseLoader("Hbase connection info");

2.  I'd like to avoid adding new operators to the logical plans, adding new DataStorage implementations,
etc.  I agree that a new slicer and loader will be needed.  I was thinking that the loader
and slicer could handle turning the results of the hbase query into records that could be
passed to the rest of the pig pipeline as is, and the inverse for storage functions.  Past
that, why does anything else in pig need to understand hbase?  Am I glossing over important details?
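
To make the two points above concrete, here is a minimal Java sketch of a loader that takes the connection info as its constructor argument, takes the query as the load string, and flattens each fetched row into a plain record. Everything in it is an assumption made for illustration: the class name HBaseLoaderSketch, the "<table>/<column>,<column>" load-string format, and the Map-based row are hypothetical, and the class implements neither Pig's loader interface nor any HBase client API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: not Pig's loader interface, not an HBase client.
public class HBaseLoaderSketch {
    private final String connectionInfo;            // from the constructor argument
    private String table;                           // parsed from the load string
    private List<String> columns = new ArrayList<>(); // parsed from the load string

    // Corresponds to: using HBaseLoader("Hbase connection info")
    public HBaseLoaderSketch(String connectionInfo) {
        this.connectionInfo = connectionInfo;
    }

    // Corresponds to: load 'hbase query'
    // Assumed format: "<table>/<column>,<column>,..." (an illustration, not a real syntax).
    public void setLocation(String location) {
        int slash = location.indexOf('/');
        if (slash < 0) {
            // No column list given: only the table name is known.
            table = location;
            columns = new ArrayList<>();
        } else {
            table = location.substring(0, slash);
            columns = Arrays.asList(location.substring(slash + 1).split(","));
        }
    }

    // Turn one fetched row (column name -> value) into a flat record whose
    // fields follow the requested column order. Missing columns become null.
    public List<String> toRecord(Map<String, String> row) {
        List<String> record = new ArrayList<>();
        for (String column : columns) {
            record.add(row.get(column));
        }
        return record;
    }

    public String getConnectionInfo() { return connectionInfo; }
    public String getTable() { return table; }
    public List<String> getColumns() { return columns; }
}
```

In this sketch nothing outside the loader sees hbase: the rest of the pipeline would only receive the flat records returned by toRecord, and a store function would do the inverse mapping.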

> Addition of Hbase Storage Option In Load/Store Statement
> --------------------------------------------------------
>                 Key: PIG-6
>                 URL: https://issues.apache.org/jira/browse/PIG-6
>             Project: Pig
>          Issue Type: New Feature
>         Environment: all environments
>            Reporter: Edward J. Yoon
> It needs to be able to load full table in hbase.  (maybe ... difficult? I'm not sure)
> Also, as described below, 
> It needs to compose an abstract 2d-table only with certain data filtered from hbase array
> structure using arbitrary query-delimited.
> {code}
> A = LOAD table('hbase_table');
> or
> B = LOAD table('hbase_table') Using HbaseQuery('Query-delimited by attributes & timestamp') as (f1, f2[, f3]);
> {code}
> Once test is done on my local machines, 
> I will clarify the grammars and give you more examples to help you explain more storage
> Any advice welcome.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
