From: Stephen Watt <swatt@us.ibm.com>
To: hive-user@hadoop.apache.org
Date: Fri, 16 Jul 2010 11:17:11 -0500
Subject: "Select count(1) from Table" Failing with class cast exception

Hi Folks

This issue occurs on Hive 0.4 and 0.5. I wanted to run it by the community before opening a JIRA ticket.

I'm testing Hive 0.5 on Apache Hadoop 0.20.2 running IBM Java 6 (32-bit x86, SR8, available at https://www.ibm.com/developerworks/java/jdk/linux/download.html).

To recreate this, I use the pokes table loaded with data from the examples directory, per the tutorial, and run the following in the Hive CLI (bin/hive): select count(1) from pokes;
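
For reference, the table setup is straight from the Getting Started tutorial. A sketch of the whole repro, assuming the standard pokes schema (foo INT, bar STRING) and the bundled kv1.txt sample file:

    -- Standard pokes table from the Hive Getting Started tutorial.
    -- Adjust the sample-file path if your checkout differs.
    CREATE TABLE pokes (foo INT, bar STRING);
    LOAD DATA LOCAL INPATH './examples/files/kv1.txt' OVERWRITE INTO TABLE pokes;

    -- The failing query:
    SELECT count(1) FROM pokes;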

This works just fine on Sun/Oracle Java 6, but when I change the hadoop-env to point to IBM Java 6, it fails in the map with the following exception:

Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector incompatible with org.apache.hadoop.hive.serde2.objectinspector.primitive.LongObjectInspector
        at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFCount$GenericUDAFCountEvaluator.merge(GenericUDAFCount.java:104)
        at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:113)
        at org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:451)
        at org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:591)
        at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:500)
        ... 14 more

Note that the line number in GenericUDAFCount above is off by 4, because of a couple of LOG.info calls I added for debugging. The net of it is that it fails when it attempts the following cast in the merge method:
(LongObjectInspector)inputOI
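
For context, merge() looks approximately like this (my paraphrase of the 0.5 source, not a verbatim quote, so details may differ slightly):

    // Sketch of GenericUDAFCountEvaluator.merge(), paraphrased.
    @Override
    public void merge(AggregationBuffer agg, Object partial)
        throws HiveException {
      if (partial != null) {
        // merge() expects the partial count produced upstream, which is
        // a bigint (long) -- hence the LongObjectInspector cast. Under
        // IBM Java, inputOI turns out to be a WritableIntObjectInspector
        // for an int table column, and the cast throws.
        long p = ((LongObjectInspector) inputOI).get(partial);
        ((CountAgg) agg).value += p;
      }
    }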

This is where it gets weird. In Sun Java, this method gets called in the reducer; in IBM Java, it gets called in the mapper. If I use EXPLAIN in the Hive CLI, the execution plans are identical regardless of which JRE Hadoop is using. In Sun Java, the type for inputOI is a bigint derived from a single-column schema called _col0_ in the reducer (likely the output tuple of the count result), and it casts to a Long with no problem. In IBM Java, the call happens in the map, and inputOI is derived from what appears to be the first column of the pokes table schema, which is an int, so it fails when cast to a Long. The cast appears to be merely symptomatic of a difference in how the plan is executed.
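
One diagnostic that might narrow this down (a suggestion only; I haven't run it yet) is to disable map-side hash aggregation and see whether the IBM JRE still routes merge() through the mapper:

    -- hive.map.aggr is a standard Hive setting; turning it off forces
    -- the aggregation into the reducer. Worth comparing on both JREs.
    set hive.map.aggr=false;
    select count(1) from pokes;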

Debugging from this point really requires someone who understands Hive execution plans better than I do. Can anyone help with this issue? It is really easy to replicate: download the IBM JDK, point your hadoop env at the extracted directory (see the snippet below), and do a select count from any table.
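
Concretely, the only change needed is in conf/hadoop-env.sh; the install path below is just an example, use wherever you extracted the JDK:

    # In conf/hadoop-env.sh -- point Hadoop at the IBM JDK.
    # Example path only; substitute your extraction directory.
    export JAVA_HOME=/opt/ibm-java-i386-60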

Regards
Steve Watt