hive-user mailing list archives

From Guodong Wang <wangg...@gmail.com>
Subject NegativeArraySizeException in table join
Date Thu, 15 Jan 2015 09:53:10 GMT
Hi,

I am using Hive 0.13.1 and am currently blocked by a bug when joining two
tables. Here is a sample query.

INSERT OVERWRITE TABLE test_archive PARTITION(data='2015-01-17', name, type)
SELECT COALESCE(b.resource_id, a.id) AS id,
       a.timestamp,
       a.payload,
       a.name,
       a.type
FROM test_data a LEFT OUTER JOIN id_mapping b on a.id = b.id
WHERE a.date='2015-01-17'
    AND a.name IN ('a', 'b', 'c')
    AND a.type <= 14;

It turns out that when there are more than 25000 joined rows for a specific
id, the Hive MR job fails, throwing a NegativeArraySizeException.
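
For what it's worth, 25000 seems to match the default of hive.join.cache.size,
which is the number of rows of the non-streaming side that the join operator
keeps in memory before spilling them to a RowContainer on local disk. If that
is indeed the threshold involved, lowering it (e.g. SET hive.join.cache.size=1000;)
should make the failure reproduce with far fewer rows per id.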

Here is the stack trace

2015-01-15 14:38:42,693 ERROR
org.apache.hadoop.hive.ql.exec.persistence.RowContainer:
java.lang.NegativeArraySizeException
	at org.apache.hadoop.io.BytesWritable.setCapacity(BytesWritable.java:144)
	at org.apache.hadoop.io.BytesWritable.setSize(BytesWritable.java:123)
	at org.apache.hadoop.io.BytesWritable.readFields(BytesWritable.java:179)
	at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71)
	at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42)
	at org.apache.hadoop.io.SequenceFile$Reader.deserializeValue(SequenceFile.java:2244)
	at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:2228)
	at org.apache.hadoop.mapred.SequenceFileRecordReader.getCurrentValue(SequenceFileRecordReader.java:103)
	at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:78)
	at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.nextBlock(RowContainer.java:360)
	at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:230)
	at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:74)
	at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:740)
	at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
	at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:216)
	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
	at org.apache.hadoop.mapred.Child.main(Child.java:262)
2015-01-15 14:38:42,707 FATAL ExecReducer:
org.apache.hadoop.hive.ql.metadata.HiveException:
org.apache.hadoop.hive.ql.metadata.HiveException:
java.lang.NegativeArraySizeException
	at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:237)
	at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:74)
	at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:740)
	at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
	at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:216)
	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
	at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException:
java.lang.NegativeArraySizeException
	at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.nextBlock(RowContainer.java:385)
	at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:230)
	... 11 more
Caused by: java.lang.NegativeArraySizeException
	at org.apache.hadoop.io.BytesWritable.setCapacity(BytesWritable.java:144)
	at org.apache.hadoop.io.BytesWritable.setSize(BytesWritable.java:123)
	at org.apache.hadoop.io.BytesWritable.readFields(BytesWritable.java:179)
	at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71)
	at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42)
	at org.apache.hadoop.io.SequenceFile$Reader.deserializeValue(SequenceFile.java:2244)
	at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:2228)
	at org.apache.hadoop.mapred.SequenceFileRecordReader.getCurrentValue(SequenceFileRecordReader.java:103)
	at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:78)
	at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.nextBlock(RowContainer.java:360)
	... 12 more


I found that when the exception is thrown, there is a log entry like this:

2015-01-15 14:38:42,045 INFO
org.apache.hadoop.hive.ql.exec.persistence.RowContainer: RowContainer
created temp file
/local/data0/mapred/taskTracker/ubuntu/jobcache/job_201412171918_0957/attempt_201412171918_0957_r_000000_0/work/tmp/hive-rowcontainer5023288010679723993/RowContainer5093924743042924240.tmp


It looks like when the RowContainer collects more than 25000 row records,
it flushes the block out to local disk, but it then cannot read these
blocks back.
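
Digging into the trace, the allocation failure happens in
BytesWritable.setCapacity() while deserializing a value from the spilled
block. Below is a minimal, self-contained sketch (not the actual Hadoop
code; the class and the sizes are made up for illustration) of how I
understand that path can go negative, assuming the usual size * 3 / 2
growth in BytesWritable.setSize(): if the length field read back from the
spilled SequenceFile is corrupted or simply very large, the int arithmetic
overflows and new byte[] is asked for a negative size.

// Illustrative sketch only -- the class and numbers below are made up to
// show the int overflow; they are not copied from the Hadoop source.
public class NegativeCapacityDemo {

    private byte[] bytes = new byte[0];

    void setCapacity(int newCap) {
        // new byte[newCap] throws NegativeArraySizeException when newCap < 0
        bytes = new byte[newCap];
    }

    void setSize(int size) {
        // Mirrors the size * 3 / 2 growth visible at BytesWritable.java:123/144
        // in the trace: for sizes roughly between 716M and 1431M, size * 3
        // wraps around and the computed capacity turns negative.
        if (size > bytes.length) {
            setCapacity(size * 3 / 2);
        }
    }

    public static void main(String[] args) {
        NegativeCapacityDemo d = new NegativeCapacityDemo();
        // 800,000,000 * 3 = 2,400,000,000, which wraps to a negative int
        d.setSize(800000000);  // throws NegativeArraySizeException
    }
}

If that reading is right, either the spilled rows for these hot ids are
enormous, or the length that comes back from the temp file is bogus.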

Any help is really appreciated!



Guodong
