Return-Path: Delivered-To: apmail-hadoop-hive-dev-archive@minotaur.apache.org Received: (qmail 39643 invoked from network); 11 Mar 2009 00:27:14 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.2) by minotaur.apache.org with SMTP; 11 Mar 2009 00:27:14 -0000 Received: (qmail 72953 invoked by uid 500); 11 Mar 2009 00:27:13 -0000 Delivered-To: apmail-hadoop-hive-dev-archive@hadoop.apache.org Received: (qmail 72896 invoked by uid 500); 11 Mar 2009 00:27:13 -0000 Mailing-List: contact hive-dev-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: hive-dev@hadoop.apache.org Delivered-To: mailing list hive-dev@hadoop.apache.org Received: (qmail 72197 invoked by uid 99); 11 Mar 2009 00:27:13 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 10 Mar 2009 17:27:11 -0700 X-ASF-Spam-Status: No, hits=-2000.0 required=10.0 tests=ALL_TRUSTED X-Spam-Check-By: apache.org Received: from [140.211.11.140] (HELO brutus.apache.org) (140.211.11.140) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 11 Mar 2009 00:27:10 +0000 Received: from brutus (localhost [127.0.0.1]) by brutus.apache.org (Postfix) with ESMTP id 78173234C003 for ; Tue, 10 Mar 2009 17:26:50 -0700 (PDT) Message-ID: <1468281321.1236731210477.JavaMail.jira@brutus> Date: Tue, 10 Mar 2009 17:26:50 -0700 (PDT) From: "Ashish Thusoo (JIRA)" To: hive-dev@hadoop.apache.org Subject: [jira] Updated: (HIVE-233) Inserting into a table from a map/reduce transform results in no data In-Reply-To: <1720140370.1231963139608.JavaMail.jira@brutus> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 X-Virus-Checked: Checked by ClamAV on apache.org [ https://issues.apache.org/jira/browse/HIVE-233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Thusoo updated HIVE-233: ------------------------------- Priority: Blocker (was: Major) Affects Version/s: 0.3.0 Marking this as a blocker for 0.3 as we need to understand this better and see if there is a workaround. > Inserting into a table from a map/reduce transform results in no data > --------------------------------------------------------------------- > > Key: HIVE-233 > URL: https://issues.apache.org/jira/browse/HIVE-233 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor > Affects Versions: 0.3.0 > Reporter: Josh Ferguson > Priority: Blocker > > When attempting to run a query of the form > INSERT OVERWRITE TABLE table_name PARTITION ( ... ) SELECT TRANSFORM ... > The table 'table_name' ends up empty even when the SELECT statement returns valid results that can be loaded by hand. > *My target table* > CREATE TABLE percentiles > (actor_id STRING, percentile INT, count INT) > PARTITIONED BY (account STRING, application STRING, dataset STRING, hour INT, span INT) > CLUSTERED BY (actor_id) INTO 32 BUCKETS > ROW FORMAT DELIMITED > COLLECTION ITEMS TERMINATED BY '44' > MAP KEYS TERMINATED BY '58' > STORED AS TEXTFILE; > *The attempted query* > INSERT OVERWRITE TABLE percentiles PARTITION ( account='cUU5T7y6DmdzMJFcFt3JDe', application='test', dataset='test', hour=341976, span=168 ) SELECT TRANSFORM(actor_id) USING '/Users/Josh/cluster/bin/percentiles.rb' AS (actor_id, percentile, count) FROM ( SELECT actor_id FROM activities CLUSTER BY actor_id ) actors; > *The result in hadoop (which is 0 bytes)* > $ hadoop fs -ls /user/hive/warehouse/percentiles/*/*/*/*/* > -rw-r--r-- 1 Josh supergroup 0 2009-01-14 11:19 /user/hive/warehouse/percentiles/account=cUU5T7y6DmdzMJFcFt3JDe/application=test/dataset=test/hour=341976/span=168/attempt_200901131908_0009_r_000000_0 > *Inserting into a temporary directory first* > hive> INSERT OVERWRITE DIRECTORY 'hdfs://localhost:9000/tmp/hdfs_out' SELECT TRANSFORM(actor_id) USING '/Users/Josh/cluster/bin/percentiles.rb' AS (actor_id, percentile, count) FROM ( SELECT actor_id FROM activities CLUSTER BY actor_id ) actors; > *The results in hadoop (8600 bytes)* > $ hadoop fs -ls /tmp/hdfs_out > Found 1 items > -rw-r--r-- 1 Josh supergroup 8600 2009-01-14 11:27 /tmp/hdfs_out/attempt_200901131908_0010_r_000000_0 > *Loading from the temporary directory into percentiles* > hive> LOAD DATA INPATH 'hdfs://localhost:9000/tmp/hdfs_out' INTO TABLE percentiles PARTITION ( account='cUU5T7y6DmdzMJFcFt3JDe', application='test', dataset='test', hour=341976, span=168 ); > Loading data to table percentiles partition {account=cUU5T7y6DmdzMJFcFt3JDe, application=test, dataset=test, hour=341976, span=168} > OK > *The results in hadoop (8600 bytes)* > $ hadoop fs -ls /user/hive/warehouse/percentiles/*/*/*/*/* > -rw-r--r-- 1 Josh supergroup 8600 2009-01-14 11:27 /user/hive/warehouse/percentiles/account=cUU5T7y6DmdzMJFcFt3JDe/application=test/dataset=test/hour=341976/span=168/attempt_200901131908_0010_r_000000_0 > So it works fine when loading via a LOAD DATA statement from a temporary directory but not in a query with an INSERT OVERWRITE statement. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.