Date: Wed, 29 May 2013 17:04:20 +0000 (UTC)
From: "Nick Dimiduk (JIRA)"
To: hive-dev@hadoop.apache.org
Subject: [jira] [Updated] (HIVE-4627) Total ordering of Hive output

     [ https://issues.apache.org/jira/browse/HIVE-4627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick Dimiduk updated HIVE-4627:
-------------------------------

    Attachment: 02_hfiles.hql
                01_sample.hql
                00_tables.ddl

These are my steps to reproduce:

{noformat}
## load the input data
$ wget http://dumps.wikimedia.org/other/pagecounts-raw/2008/2008-10/pagecounts-20081001-000000.gz
$ hadoop fs -mkdir /tmp/wikistats
$ hadoop fs -put pagecounts-20081001-000000.gz /tmp/wikistats/

## create the necessary tables
$ hcat -f /tmp/00_tables.ddl
OK
Time taken: 1.886 seconds
OK
Time taken: 0.654 seconds
OK
Time taken: 0.047 seconds
OK
Time taken: 0.115 seconds

## verify
$ hive -e "select * from pagecounts limit 10;"
...
OK
aa	Main_Page	4	41431
aa	Special:ListUsers	1	5555
aa	Special:Listusers	1	1052
...
$ hive -e "select * from pgc limit 10;"
...
OK
aa/Main_Page/20081001-000000	4	41431
aa/Special:ListUsers/20081001-000000	1	5555
aa/Special:Listusers/20081001-000000	1	1052
...

## produce the hfile splits file
$ hive -f /tmp/01_sample.hql
...
OK
Time taken: 54.681 seconds
[hrt_qa] $ hadoop fs -ls /tmp/hbase_splits
Found 1 items
-rwx------   3 hrt_qa hdfs        270 2013-05-17 19:05 /tmp/hbase_splits

## verify
$ hadoop jar /usr/lib/hadoop/contrib/streaming/hadoop-streaming-1.2.0.1.3.0.0-104.jar -libjars /usr/lib/hive/lib/hive-exec-0.11.0.1.3.0.0-104.jar -input /tmp/hbase_splits -output /tmp/hbase_splits_txt -inputformat SequenceFileAsTextInputFormat
...
13/05/17 19:08:38 INFO streaming.StreamJob: Output: /tmp/hbase_splits_txt
$ hadoop fs -cat /tmp/hbase_splits_txt/*
01 61 66 2e 71 2f 4d 61 69 6e 5f 50 61 67 65 2f 32 30 30 38 31 30 30 31 2d 30 30 30 30 30 30 00	(null)
01 61 66 2f 31 35 35 30 2f 32 30 30 38 31 30 30 31 2d 30 30 30 30 30 30 00	(null)
01 61 66 2f 32 38 5f 4d 61 61 72 74 2f 32 30 30 38 31 30 30 31 2d 30 30 30 30 30 30 00	(null)
01 61 66 2f 42 65 65 6c 64 3a 31 30 30 5f 31 38 33 30 2e 4a 50 47 2f 32 30 30 38 31 30 30 31 2d 30 30 30 30 30 30 00	(null)
## decoding the first line from utf8 bytes to String yields "af.q/Main_Page/20081001-000000", which is correct

## generate the hfiles
$ HADOOP_CLASSPATH=/usr/lib/hbase/hbase-0.94.6.1.3.0.0-104-security.jar hive -f /tmp/02_hfiles.hql
{noformat}
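
Editor's note: the attached 00_tables.ddl, 01_sample.hql, and 02_hfiles.hql are not reproduced in this message. For orientation only, the sketch below shows the general shape of the table layer the transcript implies: a pagecounts table over the raw dump and a pgc view that concatenates project, page, and timestamp into a single row key. The column names and the view definition are assumptions inferred from the verification output above, not the contents of the attachments.

{noformat}
-- Illustrative sketch only (assumed names and types); the actual DDL is in 00_tables.ddl.

-- Raw wikistats dump: space-delimited project, page, views, bytes.
CREATE EXTERNAL TABLE IF NOT EXISTS pagecounts (
  projectcode STRING,
  pagename    STRING,
  pageviews   STRING,
  bytes       STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '
STORED AS TEXTFILE
LOCATION '/tmp/wikistats';

-- Row-key view matching the pgc rows shown above: projectcode/pagename/timestamp.
CREATE VIEW IF NOT EXISTS pgc (rowkey, pageviews, bytes) AS
SELECT concat_ws('/', projectcode, pagename, '20081001-000000'),
       pageviews,
       bytes
FROM pagecounts;
{noformat}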

> Total ordering of Hive output
> -----------------------------
>
>                 Key: HIVE-4627
>                 URL: https://issues.apache.org/jira/browse/HIVE-4627
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.11.0
>            Reporter: Nick Dimiduk
>         Attachments: 00_tables.ddl, 01_sample.hql, 02_hfiles.hql, hive-partitioner.patch
>
>
> I'd like to use Hive to generate HFiles for HBase. I started off by following the instructions on the [wiki|https://cwiki.apache.org/Hive/hbasebulkload.html], but that only took me so far: TotalOrderPartitioning didn't work. That led me to this [post|http://stackoverflow.com/questions/13715044/hive-cluster-by-vs-order-by-vs-sort-by], which points out that Hive partitions on value instead of key. A patched TOP brings me to this error:
> {noformat}
> 2013-05-17 21:00:47,781 WARN org.apache.hadoop.mapred.Child: Error running child
> java.lang.RuntimeException: Hive Runtime Error while closing operators: java.io.IOException: No files found in hdfs://ip-10-191-3-134.ec2.internal:8020/tmp/hive-hrt_qa/hive_2013-05-17_20-58-58_357_6896546413926013201/_task_tmp.-ext-10000/_tmp.000000_0
> 	at org.apache.hadoop.hive.ql.exec.ExecReducer.close(ExecReducer.java:317)
> 	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:532)
> 	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:396)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: No files found in hdfs://ip-10-191-3-134.ec2.internal:8020/tmp/hive-hrt_qa/hive_2013-05-17_20-58-58_357_6896546413926013201/_task_tmp.-ext-10000/_tmp.000000_0
> 	at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:183)
> 	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:865)
> 	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
> 	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:597)
> 	at org.apache.hadoop.hive.ql.exec.ExecReducer.close(ExecReducer.java:309)
> 	... 7 more
> Caused by: java.io.IOException: No files found in hdfs://ip-10-191-3-134.ec2.internal:8020/tmp/hive-hrt_qa/hive_2013-05-17_20-58-58_357_6896546413926013201/_task_tmp.-ext-10000/_tmp.000000_0
> 	at org.apache.hadoop.hive.hbase.HiveHFileOutputFormat$1.close(HiveHFileOutputFormat.java:142)
> 	at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:180)
> 	... 11 more
> {noformat}
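
Editor's note: for context on where the stack trace points, the wiki recipe the description links to drives a TotalOrderPartitioner from the sampled splits file and writes through HiveHFileOutputFormat. Below is a hedged sketch of that final step; the table name, reducer count, column family path, and property values are placeholders following the wiki's pattern, not the contents of 02_hfiles.hql.

{noformat}
-- Rough sketch of the HFile-generation step; names, paths, and the reducer
-- count are placeholders, and the real statements live in 02_hfiles.hql.

-- Target table whose output format writes HFiles for one column family.
CREATE TABLE hbase_hfiles (rowkey STRING, pageviews STRING, bytes STRING)
STORED AS
  INPUTFORMAT  'org.apache.hadoop.mapred.TextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.hbase.HiveHFileOutputFormat'
TBLPROPERTIES ('hfile.family.path' = '/tmp/hbsort/cf');

-- Route the reduce-side rows through a total-order partition built from the
-- sampled splits file produced by 01_sample.hql.
SET mapred.reduce.tasks=12;
SET hive.mapred.partitioner=org.apache.hadoop.mapred.lib.TotalOrderPartitioner;
SET total.order.partitioner.path=/tmp/hbase_splits;

-- Each reducer should then receive a contiguous, globally ordered key range;
-- closing the resulting writers (HiveHFileOutputFormat$1.close) is where the
-- "No files found" IOException in the trace above is raised.
INSERT OVERWRITE TABLE hbase_hfiles
SELECT * FROM pgc CLUSTER BY rowkey;
{noformat}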

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira