From "Dmitriy V. Ryaboy (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-1797) Problems when applying FOREACH ... GENERATE on data loaded from HBase
Date Sun, 13 Nov 2011 02:37:51 GMT

    [ https://issues.apache.org/jira/browse/PIG-1797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149209#comment-13149209 ]

Dmitriy V. Ryaboy commented on PIG-1797:
----------------------------------------

Eduardo, is this still an issue for you with Pig 0.9 / 0.9.1?
                
> Problems when applying FOREACH ... GENERATE on data loaded from HBase
> ---------------------------------------------------------------------
>
>                 Key: PIG-1797
>                 URL: https://issues.apache.org/jira/browse/PIG-1797
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.0
>         Environment: Our environment consists of Hadoop 0.20.2, HBase 0.20.6, ZooKeeper 3.3.2 and Pig 0.8.0. They are configured to run as a pseudo-distributed system.
>            Reporter: Eduardo Galán Herrero
>            Assignee: Dmitriy V. Ryaboy
>         Attachments: pig-error2017.log.txt
>
>
> We defined a table in HBase and populated it with some data:
> create 'tests', {NAME => 'age'}, {NAME => 'colour'}
> put 'tests', 'one', 'age', '22'
> put 'tests', 'one', 'colour', 'green'
> put 'tests', 'another', 'age', '439'
> put 'tests', 'another', 'colour', 'red'
> put 'tests', 'more', 'colour', 'grey'
> scan 'tests'
> ROW                          COLUMN+CELL
>  another                     column=age:, timestamp=1294745175613, value=439
>  another                     column=colour:, timestamp=1294745155873, value=red
>  more                        column=colour:, timestamp=1294745185331, value=grey
>  one                         column=age:, timestamp=1294745127129, value=22
>  one                         column=colour:, timestamp=1294745144160, value=green
> We are using Pig in mapreduce mode to load the data from HBase (also retrieving the row key):
> > DATA = LOAD 'hbase://tests' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('age: colour:', '-loadKey') AS (row:chararray,age:int,colour:chararray);
> We make sure that the data has been loaded correctly:
> > dump DATA;
> (another,439,red)
> (more,,grey)
> (one,22,green)
> > describe DATA;
> DATA: {row: chararray,age: int,colour: chararray}
> We get correct results when the "FOREACH ... GENERATE" statement includes all the columns ($0, $1 and $2) that were loaded before:
> > b= FOREACH DATA GENERATE $0, $1, $2;
> > dump b;
> (another,439,red)
> (more,,grey)
> (one,22,green)
> The order of the columns does not matter:
> > c= FOREACH DATA GENERATE $2, $0, $1;
> > dump c;
> (red,another,439)
> (grey,more,)
> (green,one,22)
> But if we omit a column from the "FOREACH ... GENERATE" statement (in this example, $2), we get the following wrong result:
> > d= FOREACH DATA GENERATE $0, $1;
> > dump d;
> (another,)
> (more,)
> (one,)
> > describe d;
> d: {row: chararray,age: int}
> Here is another example of the bug:
> > e= FOREACH DATA GENERATE $1, $2;
> > dump e;
> (,439)
> (,)
> (,22)
> > describe e;
> e: {age: int,colour: chararray}
> Here is one more example of the bug:
> > f= FOREACH DATA GENERATE $0, $2;
> > dump f;
> (another,another)
> (more,more)
> (one,one)
> > describe f;
> f: {row: chararray,colour: chararray}
> Regards,
> Eduardo Galan Herrero
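
For comparison with the outputs of d, e and f quoted above, here is a minimal Python sketch of what a positional projection would be expected to return for the three rows shown by `dump DATA`. This is not Pig code and says nothing about the cause of the problem; the helper names `project` and `render` are hypothetical and exist only for this illustration.

```python
# Rows exactly as shown by `dump DATA` in the report.
# None stands for the missing age cell of row 'more'.
DATA = [
    ("another", 439, "red"),
    ("more", None, "grey"),
    ("one", 22, "green"),
]


def project(rows, positions):
    """Keep only the fields at the given positions ($0, $1, ...) of each row."""
    return [tuple(row[i] for i in positions) for row in rows]


def render(rows):
    """Format tuples the way the dumps in the report look; nulls print as empty."""
    return [
        "(" + ",".join("" if value is None else str(value) for value in row) + ")"
        for row in rows
    ]


# Expected results for the three projections that misbehave in the report.
expected_d = render(project(DATA, [0, 1]))  # FOREACH DATA GENERATE $0, $1
expected_e = render(project(DATA, [1, 2]))  # FOREACH DATA GENERATE $1, $2
expected_f = render(project(DATA, [0, 2]))  # FOREACH DATA GENERATE $0, $2

if __name__ == "__main__":
    for name, lines in (("d", expected_d), ("e", expected_e), ("f", expected_f)):
        print(name)
        for line in lines:
            print(line)
```

Under these assumptions, d would be expected to list each row key with its age, e each age with its colour, and f each row key with its colour, which differs from every one of the three dumps reported above.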


       
