hive-dev mailing list archives

From "Thejas M Nair (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-4969) HCatalog HBaseHCatStorageHandler is not returning all the data
Date Mon, 07 Oct 2013 02:24:47 GMT

     [ https://issues.apache.org/jira/browse/HIVE-4969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thejas M Nair updated HIVE-4969:
--------------------------------

    Fix Version/s:     (was: 0.11.1)
                       (was: 0.12.0)

Preparing for the 0.12 release. Removing the 0.12 fix version from issues that are not in the 0.12 branch.


> HCatalog HBaseHCatStorageHandler is not returning all the data
> --------------------------------------------------------------
>
>                 Key: HIVE-4969
>                 URL: https://issues.apache.org/jira/browse/HIVE-4969
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog
>    Affects Versions: 0.11.0
>            Reporter: Venki Korukanti
>            Priority: Critical
>         Attachments: HIVE-4969-1.patch
>
>
> Repro steps:
> 1) Create an HCatalog table mapped to an HBase table.
> hcat -e "CREATE TABLE studentHCat(rownum int, name string, age int, gpa float)
>          STORED BY 'org.apache.hcatalog.hbase.HBaseHCatStorageHandler'
>          TBLPROPERTIES('hbase.table.name' ='studentHBase',  
>                        'hbase.columns.mapping' =                
>                             ':key,onecf:name,twocf:age,threecf:gpa')";
> 2) Load the following data from Pig.
> cat student_data
> 1^Asarah laertes^A23^A2.40
> 2^Atom allen^A72^A1.57
> 3^Abob ovid^A61^A2.67
> 4^Aethan nixon^A38^A2.15
> 5^Acalvin robinson^A28^A2.53
> 6^Airene ovid^A65^A2.56
> 7^Ayuri garcia^A36^A1.65
> 8^Acalvin nixon^A41^A1.04
> 9^Ajessica davidson^A48^A2.11
> 10^Akatie king^A39^A1.05
> grunt> A = LOAD 'student_data' AS (rownum:int,name:chararray,age:int,gpa:float);
> grunt> STORE A INTO 'studentHCat' USING org.apache.hcatalog.pig.HCatStorer();
> 3) From HBase, scan the studentHBase table.
> hbase(main):026:0> scan 'studentHBase', {LIMIT => 5}
> 4) From Pig, read the data in the table.
> grunt> A = LOAD 'studentHCat' USING org.apache.hcatalog.pig.HCatLoader();
> grunt> STORE A INTO '/user/root/studentPig';
> 5) Verify the output written to studentPig.
> hadoop fs -cat /user/root/studentPig/part-r-00000
> 1              23
> 2              72
> 3              61
> 4              38
> 5              28
> 6              65
> 7              36
> 8              41
> 9              48
> 10             39
> The data returned has only two fields (rownum and age).
> Problem:
> While reading data from the HBase table, HbaseSnapshotRecordReader receives each row as a
> Result (org.apache.hadoop.hbase.client.Result) object and processes the KeyValue fields in it.
> After processing, it creates another Result object from the processed KeyValue array. The
> problem is that this KeyValue array is not sorted, whereas Result expects its input KeyValue
> array to be sorted. When we call Result.getValue(), it returns no value for some of the fields
> because it does a binary search on the unordered array.
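
The failure mode described above can be reproduced without HBase. The following is a minimal sketch, not the HIVE-4969 patch: plain strings stand in for KeyValue objects, and the class name SortedLookupSketch and the helper contains() are hypothetical names introduced for illustration. It shows that a binary search over an unsorted array can miss an element that is present, and that sorting the array first restores the lookup.

```java
import java.util.Arrays;

// Illustration only: plain strings stand in for HBase KeyValue objects.
// SortedLookupSketch and contains() are hypothetical names, not HCatalog code.
class SortedLookupSketch {

    // Looks up a column with binary search, which is only correct
    // when the input array is sorted.
    static boolean contains(String[] columns, String wanted) {
        return Arrays.binarySearch(columns, wanted) >= 0;
    }

    public static void main(String[] args) {
        // Columns in mapping order, which is not lexicographic order.
        String[] processed = {"onecf:name", "twocf:age", "threecf:gpa"};

        // The element is present, but binary search does not find it.
        System.out.println(contains(processed, "threecf:gpa")); // false

        // Sorting before the lookup restores correct behaviour.
        String[] sorted = processed.clone();
        Arrays.sort(sorted);
        System.out.println(contains(sorted, "threecf:gpa")); // true
    }
}
```

In this sketch only one column goes missing; which columns are lost depends on the order of the unsorted array, so the exact set of empty fields in the report above may differ.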



--
This message was sent by Atlassian JIRA
(v6.1#6144)
