hive-dev mailing list archives

From "Venki Korukanti (JIRA)" <>
Subject [jira] [Created] (HIVE-4969) HCatalog HBaseHCatStorageHandler is not returning all the data
Date Wed, 31 Jul 2013 22:49:49 GMT
Venki Korukanti created HIVE-4969:

             Summary: HCatalog HBaseHCatStorageHandler is not returning all the data
                 Key: HIVE-4969
             Project: Hive
          Issue Type: Bug
          Components: HCatalog
    Affects Versions: 0.11.0
            Reporter: Venki Korukanti
            Priority: Critical

Repro steps:
1) Create an HCatalog table mapped to an HBase table.

hcat -e "CREATE TABLE studentHCat(rownum int, name string, age int, gpa float)
         STORED BY 'org.apache.hcatalog.hbase.HBaseHCatStorageHandler'
         TBLPROPERTIES('' ='studentHBase',  
                       'hbase.columns.mapping' =                

2) Load the following data from Pig.

cat student_data
1^Asarah laertes^A23^A2.40
2^Atom allen^A72^A1.57
3^Abob ovid^A61^A2.67
4^Aethan nixon^A38^A2.15
5^Acalvin robinson^A28^A2.53
6^Airene ovid^A65^A2.56
7^Ayuri garcia^A36^A1.65
8^Acalvin nixon^A41^A1.04
9^Ajessica davidson^A48^A2.11
10^Akatie king^A39^A1.05

grunt> A = LOAD 'student_data' AS (rownum:int,name:chararray,age:int,gpa:float);

grunt> STORE A INTO 'studentHCat' USING org.apache.hcatalog.pig.HCatStorer();

3) From the HBase shell, scan the studentHBase table:
hbase(main):026:0> scan 'studentHBase', {LIMIT => 5}

4) From Pig, read the data back from the table and store it in HDFS:
grunt> A = LOAD 'studentHCat' USING org.apache.hcatalog.pig.HCatLoader();
grunt> STORE A INTO '/user/root/studentPig';

5) Check the output written to /user/root/studentPig:
hadoop fs -cat /user/root/studentPig/part-r-00000
1              23
2              72
3              61
4              38
5              28
6              65
7              36
8              41
9              48
10             39

The output contains only two of the four fields (rownum and age); the name and gpa values are missing.

While reading rows from the HBase table, HbaseSnapshotRecordReader receives each row as a Result
(org.apache.hadoop.hbase.client.Result) object and processes the KeyValue entries in it. After
processing, it builds a new Result object from the processed KeyValue array. The problem is that
this array is not sorted, while Result expects its input KeyValue array to be in sorted order.
Result.getValue() performs a binary search over the array, so on an unsorted array it fails to
find some of the columns and returns no value for them.
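The failure mode described above can be reproduced without HBase. The following is a minimal, self-contained Java sketch, assuming (as the description states) that the lookup is a binary search over the cell array. `Cell`, `ORDER`, `getValue`, `unsortedRow`, `sortedRow` and the `cf` column family are hypothetical stand-ins, not HBase's `KeyValue` or `Result` API, and the ordering is simplified to family-then-qualifier.

```java
import java.util.Arrays;
import java.util.Comparator;

public class UnsortedResultDemo {
    // Hypothetical, simplified stand-in for an HBase KeyValue (family, qualifier, value only).
    static final class Cell {
        final String family;
        final String qualifier;
        final String value;

        Cell(String family, String qualifier, String value) {
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    // Orders cells by family, then qualifier; a simplification of HBase's KeyValue ordering.
    static final Comparator<Cell> ORDER =
            Comparator.comparing((Cell c) -> c.family).thenComparing((Cell c) -> c.qualifier);

    // getValue()-style lookup via binary search, which is only correct on sorted input.
    static String getValue(Cell[] cells, String family, String qualifier) {
        int idx = Arrays.binarySearch(cells, new Cell(family, qualifier, null), ORDER);
        return idx >= 0 ? cells[idx].value : null;
    }

    // Hypothetical row from student_data, deliberately left unsorted (name, age, gpa).
    static Cell[] unsortedRow() {
        return new Cell[] {
            new Cell("cf", "name", "sarah laertes"),
            new Cell("cf", "age", "23"),
            new Cell("cf", "gpa", "2.40"),
        };
    }

    // Presumed fix: sort the cells before any binary-search lookup.
    static Cell[] sortedRow() {
        Cell[] cells = unsortedRow();
        Arrays.sort(cells, ORDER);
        return cells;
    }

    public static void main(String[] args) {
        Cell[] unsorted = unsortedRow();
        Cell[] sorted = sortedRow();
        for (String q : new String[] {"name", "age", "gpa"}) {
            System.out.println(q + ": unsorted=" + getValue(unsorted, "cf", q)
                    + ", sorted=" + getValue(sorted, "cf", q));
        }
    }
}
```

With this particular order, the unsorted lookup misses `name` (it prints `name: unsorted=null, sorted=sarah laertes`) while `age` and `gpa` happen to be found. Which fields go missing depends on how the array is ordered, so the real reader may lose a different set, as in the report above. A corresponding fix in HbaseSnapshotRecordReader would presumably sort the processed KeyValue array with HBase's KeyValue comparator before constructing the new Result; that is an assumption, not a description of the committed patch.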
