From: Stephen Watt <swatt@us.ibm.com>
To: hive-user@hadoop.apache.org
Date: Fri, 16 Jul 2010 11:17:11 -0500
Subject: "Select count(1) from Table" Failing with class cast exception

Hi Folks

This issue occurs on Hive 0.4 and 0.5. I wanted to run it by the community before opening a JIRA ticket.

I'm testing Hive 0.5 on Apache Hadoop 0.20.2 running IBM Java 6 (32-bit x86, SR8, available at https://www.ibm.com/developerworks/java/jdk/linux/download.html).

To recreate this, I use the pokes table loaded with data from the examples directory, per the tutorial, and run the following in the Hive CLI (bin/hive): select count(1) from pokes;
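
For reference, the table setup is straight from the Getting Started tutorial. A sketch of the whole repro, assuming the standard pokes schema (foo INT, bar STRING) and the bundled kv1.txt sample file:

    -- Standard pokes table from the Hive Getting Started tutorial.
    -- Adjust the sample-file path if your checkout differs.
    CREATE TABLE pokes (foo INT, bar STRING);
    LOAD DATA LOCAL INPATH './examples/files/kv1.txt' OVERWRITE INTO TABLE pokes;

    -- The failing query:
    SELECT count(1) FROM pokes;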

This works just fine on Sun/Oracle Java 6, but when I change the hadoop-env to point to IBM Java 6, it fails in the map with the following exception:

Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector incompatible with org.apache.hadoop.hive.serde2.objectinspector.primitive.LongObjectInspector
        at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFCount$GenericUDAFCountEvaluator.merge(GenericUDAFCount.java:104)
        at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:113)
        at org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:451)
        at org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:591)
        at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:500)
        ... 14 more

Note that the line number in GenericUDAFCount above is off by 4, because of a couple of LOG.info calls I added for debugging. The net of it is that it fails when it attempts the following cast in the merge method:
(LongObjectInspector)inputOI
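
For context, merge() looks approximately like this (my paraphrase of the 0.5 source, not a verbatim quote, so details may differ slightly):

    // Sketch of GenericUDAFCountEvaluator.merge(), paraphrased.
    @Override
    public void merge(AggregationBuffer agg, Object partial)
        throws HiveException {
      if (partial != null) {
        // merge() expects the partial count produced upstream, which is
        // a bigint (long) -- hence the LongObjectInspector cast. Under
        // IBM Java, inputOI turns out to be a WritableIntObjectInspector
        // for an int table column, and the cast throws.
        long p = ((LongObjectInspector) inputOI).get(partial);
        ((CountAgg) agg).value += p;
      }
    }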

This is where it gets weird. In Sun Java, this method gets called in the reducer; in IBM Java, it gets called in the mapper. If I use EXPLAIN in the Hive CLI, the execution plans are identical regardless of which JRE Hadoop is using. In Sun Java, the type for inputOI is a bigint derived from a single-column schema called _col0_ in the reducer (likely the output tuple of the count result), and it casts to a Long with no problem. In IBM Java, the call happens in the map, and inputOI is derived from what appears to be the first column of the pokes table schema, which is an int, so it fails when cast to a Long. The cast appears to be merely symptomatic of a difference in how the plan is executed.
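
One diagnostic that might narrow this down (a suggestion only; I haven't run it yet) is to disable map-side hash aggregation and see whether the IBM JRE still routes merge() through the mapper:

    -- hive.map.aggr is a standard Hive setting; turning it off forces
    -- the aggregation into the reducer. Worth comparing on both JREs.
    set hive.map.aggr=false;
    select count(1) from pokes;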

Debugging from this point really requires someone who understands Hive execution plans better than I do. Can anyone help with this issue? It is really easy to replicate: download the IBM JDK, point your hadoop env at the extracted directory (see the snippet below), and do a select count from any table.
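
Concretely, the only change needed is in conf/hadoop-env.sh; the install path below is just an example, use wherever you extracted the JDK:

    # In conf/hadoop-env.sh -- point Hadoop at the IBM JDK.
    # Example path only; substitute your extraction directory.
    export JAVA_HOME=/opt/ibm-java-i386-60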

Regards
Steve Watt