pig-dev mailing list archives

From "Bill Graham (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-3108) HBaseStorage returns empty maps when mixing wildcard- with other columns
Date Mon, 07 Jan 2013 05:18:13 GMT

    [ https://issues.apache.org/jira/browse/PIG-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13545642#comment-13545642 ]

Bill Graham commented on PIG-3108:
----------------------------------

Got it, thanks for the clarification. A few comments then:

1. Yes, we should rename {{addFiltersWithoutColumnPrefix}} to {{addScans}}, since the filter
part is misleading and it seems we can always call that method as implemented.
2. Instead of calling {{addFiltersWithoutColumnPrefix}} from {{setLocation}} let's just remove
the addFamily/addColumn block from {{setLocation}}. Then in {{initScan()}} we can handle
both scans and filters in one place with something like this (to replace the existing conditional):

{noformat}
addScans(columnInfo_);

if (!columnPrefixExists) {
  addFiltersWithoutColumnPrefix(columnInfo_);
}
{noformat}

3. Would you please update the javadocs in {{addFiltersWithoutColumnPrefix}} (or {{addScans}})
to describe the new logic as best you can? This section of the filter/scan code has been particularly
nasty in the past, so we should be as clear as possible about what's happening here.
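
To make the "one place" idea in point 2 concrete, here is a rough sketch of how the scan calls could be decided per family before any of them are issued, so that the order of the column descriptors stops mattering. This is not the attached patch: {{ScanPlanSketch}} and {{plan}} are hypothetical names, the calls are returned as strings instead of being made on a real scan, and narrowing the rows down to the prefixed columns afterwards is left out entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical helper: plans scan calls so that descriptor order is irrelevant. */
public class ScanPlanSketch {

    /**
     * Takes space-separated descriptors such as "pig:name pig:has*" and returns
     * the scan calls to issue, rendered as strings for illustration only.
     */
    public static List<String> plan(String columnSpec) {
        Set<String> wholeFamilies = new TreeSet<>();               // families with a wildcard/prefix
        Map<String, SortedSet<String>> exact = new TreeMap<>();    // family -> exact qualifiers

        for (String descriptor : columnSpec.trim().split("\\s+")) {
            String[] parts = descriptor.split(":", 2);
            String family = parts[0];
            String qualifier = parts.length > 1 ? parts[1] : "";
            if (qualifier.isEmpty() || qualifier.endsWith("*")) {
                // A prefix cannot be expressed as a single column, so take the family.
                wholeFamilies.add(family);
            } else {
                exact.computeIfAbsent(family, f -> new TreeSet<>()).add(qualifier);
            }
        }

        List<String> calls = new ArrayList<>();
        for (String family : wholeFamilies) {
            calls.add("addFamily(" + family + ")");
        }
        for (Map.Entry<String, SortedSet<String>> entry : exact.entrySet()) {
            if (wholeFamilies.contains(entry.getKey())) {
                continue; // the family-wide call already covers these columns
            }
            for (String qualifier : entry.getValue()) {
                calls.add("addColumn(" + entry.getKey() + "," + qualifier + ")");
            }
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(plan("pig:name pig:has*")); // [addFamily(pig)]
        System.out.println(plan("pig:has* pig:name")); // [addFamily(pig)]
    }
}
```

With this shape, both descriptor orders from the issue description produce the same plan, and a family is never given both a family-wide and a per-column call.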


                
> HBaseStorage returns empty maps when mixing wildcard- with other columns
> ------------------------------------------------------------------------
>
>                 Key: PIG-3108
>                 URL: https://issues.apache.org/jira/browse/PIG-3108
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.9.0, 0.9.1, 0.9.2, 0.10.0, 0.11, 0.10.1, 0.12
>            Reporter: Christoph Bauer
>             Fix For: 0.12
>
>         Attachments: PIG-3108.patch
>
>
> Consider the following:
> A and B should be the same (with different order, of course).
> {code}
> /*
> in hbase shell:
> create 'pigtest', 'pig'
> put 'pigtest' , '1', 'pig:name', 'A'
> put 'pigtest' , '1', 'pig:has_legs', 'true'
> put 'pigtest' , '1', 'pig:has_ribs', 'true'
> */
> A = LOAD 'hbase://pigtest' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('pig:name pig:has*') AS (name:chararray,parts);
> B = LOAD 'hbase://pigtest' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('pig:has* pig:name') AS (parts,name:chararray);
> dump A;
> dump B;
> {code}
> This is due to a bug in setLocation and initScan.
> For _A_ 
> # scan.addColumn(pig,name); // for 'pig:name'
> # scan.addFamily(pig); // for the 'pig:has*'
> So that's silently right.
> But for _B_
> # scan.addFamily(pig)
> # scan.addColumn(pig,name)
> will override the first call to addFamily, because you cannot mix them on the same family.
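
The order dependence described above can be reproduced with a toy model of the scan's per-family bookkeeping. This is a simplified stand-in and not the real HBase {{Scan}} class: {{ScanOrderSketch}} and {{selection}} are made-up names, and the model assumes that {{addFamily}} resets a family to "all columns" while {{addColumn}} starts a fresh qualifier set when none exists yet.

```java
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy stand-in for a scan's family map; NOT the real HBase class. */
public class ScanOrderSketch {
    // family -> qualifiers; a null value stands for "every column in the family"
    private final Map<String, NavigableSet<String>> familyMap = new TreeMap<>();

    public ScanOrderSketch addFamily(String family) {
        familyMap.put(family, null); // drops any qualifiers added earlier
        return this;
    }

    public ScanOrderSketch addColumn(String family, String qualifier) {
        NavigableSet<String> qualifiers = familyMap.get(family);
        if (qualifiers == null) {
            qualifiers = new TreeSet<>(); // an earlier addFamily is forgotten here
        }
        qualifiers.add(qualifier);
        familyMap.put(family, qualifiers);
        return this;
    }

    /** Describes what this model would select for the given family. */
    public String selection(String family) {
        if (!familyMap.containsKey(family)) {
            return "nothing";
        }
        NavigableSet<String> qualifiers = familyMap.get(family);
        return qualifiers == null ? "whole family" : "only " + qualifiers;
    }

    public static void main(String[] args) {
        // Case A: 'pig:name pig:has*'
        ScanOrderSketch a = new ScanOrderSketch().addColumn("pig", "name").addFamily("pig");
        // Case B: 'pig:has* pig:name'
        ScanOrderSketch b = new ScanOrderSketch().addFamily("pig").addColumn("pig", "name");
        System.out.println("A: " + a.selection("pig")); // A: whole family
        System.out.println("B: " + b.selection("pig")); // B: only [name]
    }
}
```

In this model, case B ends up selecting only {{pig:name}}, so no {{pig:has*}} columns are available to fill the map.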

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
