pig-dev mailing list archives

From "Mike Welch (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (PIG-3961) Adding HBaseStorage cell value filters
Date Sun, 25 May 2014 23:01:02 GMT

     [ https://issues.apache.org/jira/browse/PIG-3961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Welch updated PIG-3961:
----------------------------

    Description: 
Adding three additional server side filtering options when loading data with HBaseStorage:

# specified cf:col does not exist
{{-null cf:col}}
# specified cf:col must exist
{{-notnull cf:col}}
# specified cf:col contains the given value
{{-val cf:col=value}}

These are meant to replace (and optimize, by reducing data transfer) the common Pig pattern
of loading data and immediately filtering for a specific condition.  For example:

data = load 'hbase://mytable' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:*')
as (cf:map[]) ;
data_with_value = filter data by cf#'col' == 'value' ;

Can be replaced with:

data_with_value = load 'hbase://mytable' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:*',
'-val cf:col=value') as (cf:map[]) ;
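
As a rough sketch, the three proposed options would presumably pair up with load-then-filter statements as shown below. The pairings are inferred from the one-line descriptions above and are an assumption; in particular, -val is treated here as an exact match. The first half uses only stock Pig Latin; the second half would only run on a Pig build with the PIG-3961 patch applied. The relation names are made up for illustration.

```pig
-- Client-side pattern: load every row, then filter in Pig.
data = load 'hbase://mytable'
       using org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:*')
       as (cf:map[]);
without_col = filter data by cf#'col' is null;      -- assumed equivalent of '-null cf:col'
with_col    = filter data by cf#'col' is not null;  -- assumed equivalent of '-notnull cf:col'
with_value  = filter data by cf#'col' == 'value';   -- assumed equivalent of '-val cf:col=value'

-- Server-side pattern: options proposed in PIG-3961 (requires the patch).
without_col_ss = load 'hbase://mytable'
       using org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:*', '-null cf:col')
       as (cf:map[]);
with_col_ss = load 'hbase://mytable'
       using org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:*', '-notnull cf:col')
       as (cf:map[]);
with_value_ss = load 'hbase://mytable'
       using org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:*', '-val cf:col=value')
       as (cf:map[]);
```

In the server-side form, rows that fail the condition should never leave the region servers, which is where the reduction in data transfer would come from.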


  was:
Adding three additional server side filtering options when loading data with HBaseStorage:

# specified cf:col does not exist
{{-null cf:col}}
# specified cf:col must exist
{{-notnull cf:col}}
# specified cf:col contains the given value
{{-val cf:col=value}}

These are meant to replace (and optimize, by reducing data transfer) the common Pig pattern
of loading data and immediately filtering for a specific condition.  For example:

data = load 'hbase://mytable' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:*')
as (cf:map[]) ;
data_with_value = filter data by cf#'col' == 'value' ;

Can be replaced with:

data_with_value = load 'hbase://mytable' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:*',
'cf:col=value') as (cf:map[]) ;



> Adding HBaseStorage cell value filters
> --------------------------------------
>
>                 Key: PIG-3961
>                 URL: https://issues.apache.org/jira/browse/PIG-3961
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Mike Welch
>            Assignee: Mike Welch
>            Priority: Minor
>             Fix For: 0.14.0
>
>         Attachments: filters-patch.diff
>
>
> Adding three additional server side filtering options when loading data with HBaseStorage:
> # specified cf:col does not exist
> {{-null cf:col}}
> # specified cf:col must exist
> {{-notnull cf:col}}
> # specified cf:col contains the given value
> {{-val cf:col=value}}
> These are meant to replace (and optimize, by reducing data transfer) the common Pig pattern
> of loading data and immediately filtering for a specific condition.  For example:
> data = load 'hbase://mytable' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:*')
> as (cf:map[]) ;
> data_with_value = filter data by cf#'col' == 'value' ;
> Can be replaced with:
> data_with_value = load 'hbase://mytable' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:*',
> '-val cf:col=value') as (cf:map[]) ;



--
This message was sent by Atlassian JIRA
(v6.2#6252)
