From: "Sushanth Sowmyan (JIRA)"
To: hive-dev@hadoop.apache.org
Reply-To: dev@hive.apache.org
Date: Fri, 30 Aug 2013 23:27:51 +0000 (UTC)
Subject: [jira] [Commented] (HIVE-4969) HCatalog HBaseHCatStorageHandler is not returning all the data
Mailing-List: contact dev-help@hive.apache.org; run by ezmlm

    [ https://issues.apache.org/jira/browse/HIVE-4969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755256#comment-13755256 ]

Sushanth Sowmyan commented on HIVE-4969:
----------------------------------------

Hi, could you please attach a test case that covers this as well? That way, the tests (including yours) fail without your fix and succeed with it.

Also, as a general note, the HBaseHCatStorageHandler is about to be deprecated in favour of Hive's HBaseStorageHandler with HIVE-4331.

> HCatalog HBaseHCatStorageHandler is not returning all the data
> --------------------------------------------------------------
>
> Key: HIVE-4969
> URL: https://issues.apache.org/jira/browse/HIVE-4969
> Project: Hive
> Issue Type: Bug
> Components: HCatalog
> Affects Versions: 0.11.0
> Reporter: Venki Korukanti
> Priority: Critical
> Fix For: 0.11.1, 0.12.0
>
> Attachments: HIVE-4969-1.patch
>
>
> Repro steps:
> 1) Create an HCatalog table mapped to an HBase table.
> hcat -e "CREATE TABLE studentHCat(rownum int, name string, age int, gpa float)
> STORED BY 'org.apache.hcatalog.hbase.HBaseHCatStorageHandler'
> TBLPROPERTIES('hbase.table.name' ='studentHBase',
> 'hbase.columns.mapping' =
> ':key,onecf:name,twocf:age,threecf:gpa')";
> 2) Load the following data from Pig.
> cat student_data
> 1^Asarah laertes^A23^A2.40
> 2^Atom allen^A72^A1.57
> 3^Abob ovid^A61^A2.67
> 4^Aethan nixon^A38^A2.15
> 5^Acalvin robinson^A28^A2.53
> 6^Airene ovid^A65^A2.56
> 7^Ayuri garcia^A36^A1.65
> 8^Acalvin nixon^A41^A1.04
> 9^Ajessica davidson^A48^A2.11
> 10^Akatie king^A39^A1.05
> grunt> A = LOAD 'student_data' AS (rownum:int,name:chararray,age:int,gpa:float);
> grunt> STORE A INTO 'studentHCat' USING org.apache.hcatalog.pig.HCatStorer();
> 3) Now from HBase do a scan on the studentHBase table
> hbase(main):026:0> scan 'studentPig', {LIMIT => 5}
> 4) From Pig, access the data in the table
> grunt> A = LOAD 'studentHCat' USING org.apache.hcatalog.pig.HCatLoader();
> grunt> STORE A INTO '/user/root/studentPig';
> 5) Verify the output written in StudentPig
> hadoop fs -cat /user/root/studentPig/part-r-00000
> 1 23
> 2 72
> 3 61
> 4 38
> 5 28
> 6 65
> 7 36
> 8 41
> 9 48
> 10 39
> The data returned has only two fields (rownum and age).
> Problem:
> While reading data from the HBase table, HbaseSnapshotRecordReader gets each data row as a Result (org.apache.hadoop.hbase.client.Result) object and processes the KeyValue fields in it.
> After processing, it creates another Result object from the processed KeyValue array. The problem is that this KeyValue array is not sorted, while Result expects its input KeyValue array to be sorted. When we call Result.getValue(), it returns no value for some of the fields because it does a binary search on an unordered array.
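
The failure mode described in the report can be illustrated with a minimal, standalone sketch. It uses plain Java strings in place of HBase KeyValue objects, which is an assumption made purely for illustration; it is not the HBaseHCatStorageHandler code or the attached patch, and the class and method names are hypothetical. The column names are taken from the 'hbase.columns.mapping' in the repro steps.

```java
import java.util.Arrays;

// Hypothetical stand-in for the lookup problem: strings play the role of
// KeyValue entries, and Arrays.binarySearch plays the role of the binary
// search that a sorted-array lookup relies on.
final class UnsortedLookupDemo {

    // Returns true if binary search locates the key. The result is only
    // reliable when the array is sorted in natural order.
    static boolean find(String[] columns, String key) {
        return Arrays.binarySearch(columns, key) >= 0;
    }

    // Returns a sorted copy, leaving the caller's array untouched.
    static String[] sortedCopy(String[] columns) {
        String[] copy = Arrays.copyOf(columns, columns.length);
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        // Columns in an order that is NOT lexicographically sorted.
        String[] unsorted = {"threecf:gpa", "twocf:age", "onecf:name"};

        // Binary search over an unsorted array may miss entries that are present.
        System.out.println("unsorted lookup of onecf:name: "
                + find(unsorted, "onecf:name"));

        // Sorting before the lookup restores the precondition of binary search.
        String[] sorted = sortedCopy(unsorted);
        System.out.println("sorted lookup of onecf:name:   "
                + find(sorted, "onecf:name"));
    }
}
```

In the sketch, sorting the array before any lookup is what makes the search dependable. The analogous step for the issue above would presumably be to sort the processed KeyValue array before constructing the new Result, though how the patch actually does this is not shown here.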