pig-dev mailing list archives

From "Dmitriy V. Ryaboy (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-2193) Problem with HBase loader 0.90.3 and PIG 0.8.1
Date Mon, 01 Aug 2011 21:54:48 GMT

    [ https://issues.apache.org/jira/browse/PIG-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13073808#comment-13073808 ]

Dmitriy V. Ryaboy commented on PIG-2193:
----------------------------------------

I will review.

> Problem with HBase loader 0.90.3 and PIG 0.8.1
> ----------------------------------------------
>
>                 Key: PIG-2193
>                 URL: https://issues.apache.org/jira/browse/PIG-2193
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.1
>         Environment: HBase 0.90.3, Hadoop 0.20-append
>            Reporter: Vincent BARAT
>            Assignee: Raghu Angadi
>             Fix For: 0.9.1, 0.10
>
>         Attachments: PIG-2193.patch, PIG-2193.patch
>
>
> I have some data in HBase 0.90.3 and I run a simple script on it.
> This script incorrectly returns 0 records. From time to time, under conditions not yet identified, the same script on the same data works (it returns the correct data).
> When the data is loaded from HDFS instead of HBase, the script runs perfectly.
> Here is the script loading from HDFS (works): 
> start_sessions = LOAD 'start_sessions'
>     AS (sid:chararray, infoid:chararray, imei:chararray, start:long);
> end_sessions = LOAD 'end_sessions' AS (sid:chararray, end:long, locid:chararray);
> infos = LOAD 'infos'
>     AS (infoid:chararray, network_type:chararray, network_subtype:chararray, locale:chararray,
>         version_name:chararray, carrier_country:chararray, carrier_name:chararray,
>         phone_manufacturer:chararray, phone_model:chararray, firmware_version:chararray,
>         firmware_name:chararray);
> sessions = JOIN start_sessions BY sid, end_sessions BY sid;
> sessions = FILTER sessions BY end > start AND end - start < 86400000L;
> sessions = JOIN sessions BY infoid, infos BY infoid;
> sessions = LIMIT sessions 100;
> dump sessions;
> The same script loading from HBase (does not work):
> start_sessions = LOAD 'startSession'
>     USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:sid meta:infoid meta:imei meta:timestamp')
>     AS (sid:chararray, infoid:chararray, imei:chararray, start:long);
> end_sessions = LOAD 'endSession'
>     USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:sid meta:timestamp meta:locid')
>     AS (sid:chararray, end:long, locid:chararray);
> infos = LOAD 'info'
>     USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:infoid data:networkType data:networkSubtype data:locale data:applicationVersionName data:carrierCountry data:carrierName data:phoneManufacturer data:phoneModel data:firmwareVersion data:firmwareName')
>     AS (infoid:chararray, network_type:chararray, network_subtype:chararray, locale:chararray,
>         version_name:chararray, carrier_country:chararray, carrier_name:chararray,
>         phone_manufacturer:chararray, phone_model:chararray, firmware_version:chararray,
>         firmware_name:chararray);
> sessions = JOIN start_sessions BY sid, end_sessions BY sid;
> sessions = FILTER sessions BY end > start AND end - start < 86400000L;
> sessions = JOIN sessions BY infoid, infos BY infoid;
> sessions = LIMIT sessions 100;
> dump sessions;
> I guess this definitely means there is a nasty bug in the HBase loader.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
