pig-dev mailing list archives

From "Bill Graham (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-1782) Add ability to load data by column family in HBaseStorage
Date Thu, 24 Feb 2011 23:20:39 GMT

     [ https://issues.apache.org/jira/browse/PIG-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bill Graham updated PIG-1782:
-----------------------------

    Attachment: PIG_1782_2.patch

Attached is a second patch. This one is built to be applied on top of the PIG_1680.3.patch.

From the Javadocs:

An HBase implementation of LoadFunc and StoreFunc.

Below is an example showing how to load data from HBase:

{code}
raw = LOAD 'hbase://SampleTable'
      USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
      'info:first_name info:last_name friends:* info:*', '-loadKey true -limit 5')
      AS (id:bytearray, first_name:chararray, last_name:chararray, friends_map:map[], info_map:map[]);
{code}
This example loads data redundantly from the info column family just to illustrate usage.
Note that the row key is inserted first in the result schema. To load only column names that
start with a given prefix, specify the column prefix with a trailing \*. For example, passing
{{friends:bob_*}} to the constructor in the above example would cause only columns that start
with _bob__ to be loaded.
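
As a rough sketch of the prefix form described above (the alias {{bobs}} and the trimmed column list are made up for illustration, and the table is assumed to be the same {{SampleTable}}), a load restricted to the _bob__ columns might look like this:
{code}
-- Hypothetical variant of the load example above: only friends columns
-- whose names start with bob_ should end up in friends_map.
bobs = LOAD 'hbase://SampleTable'
       USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
       'info:first_name friends:bob_*', '-loadKey true')
       AS (id:bytearray, first_name:chararray, friends_map:map[]);
{code}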

Below is an example showing how to store data into HBase:
{code}
 STORE raw INTO 'hbase://SampleTableCopy'
       USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
       'info:first_name info:last_name buddies:* info:*');
{code}
Note that {{STORE}} will expect the first value in the tuple to be the row key. Scalar values
need to map to an explicit column descriptor and maps need to map to a column family name.
In the above examples, the {{friends}} column family data from {{SampleTable}} will be written
to a {{buddies}} column family in the {{SampleTableCopy}} table.
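
To illustrate the mapping rule with a smaller tuple, here is a minimal sketch that assumes the {{raw}} relation from the load example; the alias {{slim}} is hypothetical. The row key comes first, the scalar maps to an explicit column descriptor, and the map maps to a column family:
{code}
-- Hypothetical: keep the row key, one scalar and the friends map.
slim = FOREACH raw GENERATE id, first_name, friends_map;

-- first_name goes to info:first_name; friends_map goes to the buddies family.
STORE slim INTO 'hbase://SampleTableCopy'
      USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
      'info:first_name buddies:*');
{code}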
 

> Add ability to load data by column family in HBaseStorage
> ---------------------------------------------------------
>
>                 Key: PIG-1782
>                 URL: https://issues.apache.org/jira/browse/PIG-1782
>             Project: Pig
>          Issue Type: New Feature
>         Environment: Java 6, Mac OS X 10.6
>            Reporter: Eric Yang
>            Assignee: Bill Graham
>         Attachments: PIG-1782_1.patch, PIG_1782_2.patch, apply-PIG-1782-patch.sh
>
>
> It would be nice to load all columns in the column family by using shorthand syntax like:
> {noformat}
> CpuMetrics = load 'hbase://SystemMetrics' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cpu:','-loadKey');
> {noformat}
> Assuming there are columns cpu:sys.0, cpu:sys.1, cpu:user.0, cpu:user.1 in the cpu column family.
> CpuMetrics would contain something like:
> {noformat}
> (rowKey, cpu:sys.0, cpu:sys.1, cpu:user.0, cpu:user.1)
> {noformat}

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
