hive-dev mailing list archives

From "Amareshwari Sriramadasu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-9324) Reduce side joins failing with IOException from RowContainer.nextBlock
Date Fri, 09 Jan 2015 08:52:35 GMT

    [ https://issues.apache.org/jira/browse/HIVE-9324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270779#comment-14270779 ]

Amareshwari Sriramadasu commented on HIVE-9324:
-----------------------------------------------

After walking through the code, here is what I found.

In JoinOperator, whenever a key has more values than BLOCKSIZE (hardcoded to 25000), the
values are spilled to a file on disk, and the spill uses the SequenceFile format.
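
The spill behaviour described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Hive's RowContainer: the class name SpillingRowBuffer, its methods, and the plain DataOutputStream spill file are all hypothetical, whereas Hive writes the spill as a SequenceFile through the table description.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified stand-in for a spilling row container (not Hive code). */
class SpillingRowBuffer {
    private final int blockSize;                  // plays the role of BLOCKSIZE
    private final List<String> currentBlock = new ArrayList<>();
    private Path spillFile;                       // created lazily on the first spill
    private DataOutputStream spillOut;
    private int spilledRows = 0;

    SpillingRowBuffer(int blockSize) {
        this.blockSize = blockSize;
    }

    /** Adds a row; a full in-memory block is written to disk before the new row is kept. */
    void add(String row) throws IOException {
        if (currentBlock.size() == blockSize) {
            spillCurrentBlock();
        }
        currentBlock.add(row);
    }

    private void spillCurrentBlock() throws IOException {
        if (spillOut == null) {
            spillFile = Files.createTempFile("join-spill", ".bin");
            spillFile.toFile().deleteOnExit();
            spillOut = new DataOutputStream(
                    new BufferedOutputStream(Files.newOutputStream(spillFile)));
        }
        for (String row : currentBlock) {
            spillOut.writeUTF(row);
        }
        spillOut.flush();
        spilledRows += currentBlock.size();
        currentBlock.clear();
    }

    int spilledRows() {
        return spilledRows;
    }

    int inMemoryRows() {
        return currentBlock.size();
    }

    /** Returns the spilled rows (read back from disk) followed by the in-memory rows. */
    List<String> readAll() throws IOException {
        List<String> rows = new ArrayList<>();
        if (spillOut != null) {
            spillOut.flush();
            try (DataInputStream in = new DataInputStream(
                    new BufferedInputStream(Files.newInputStream(spillFile)))) {
                for (int i = 0; i < spilledRows; i++) {
                    rows.add(in.readUTF());
                }
            }
        }
        rows.addAll(currentBlock);
        return rows;
    }

    public static void main(String[] args) throws IOException {
        SpillingRowBuffer buffer = new SpillingRowBuffer(3); // tiny block size for the demo
        for (int i = 0; i < 7; i++) {
            buffer.add("row" + i);
        }
        System.out.println(buffer.spilledRows() + " spilled, "
                + buffer.inMemoryRows() + " in memory");
    }
}
```

With a block size of 3 and 7 rows, two full blocks go to disk and one row stays in memory. Rows are read back in the same order they were written, which is the part that fails in this issue.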

Here is the table description for the spill (from org.apache.hadoop.hive.ql.exec.JoinUtil.java):
{noformat}
      TableDesc tblDesc = new TableDesc(
          SequenceFileInputFormat.class, HiveSequenceFileOutputFormat.class,
          Utilities.makeProperties(
          org.apache.hadoop.hive.serde.serdeConstants.SERIALIZATION_FORMAT, ""
          + Utilities.ctrlaCode,
          org.apache.hadoop.hive.serde.serdeConstants.LIST_COLUMNS, colNames
          .toString(),
          org.apache.hadoop.hive.serde.serdeConstants.LIST_COLUMN_TYPES,
          colTypes.toString(),
          serdeConstants.SERIALIZATION_LIB,LazyBinarySerDe.class.getName()));
      spillTableDesc[tag] = tblDesc;
{noformat}
From the exception:
{noformat}
Caused by: java.io.IOException: org.apache.hadoop.hive.ql.io.RCFile$KeyBuffer@42610e8 read 1 bytes, should read 27264
	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2435)
	at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:76)
	at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.nextBlock(RowContainer.java:360)
	... 13 more
{noformat}

I see that the key read from the SequenceFile is an RCFile$KeyBuffer (it is the object printed
at the start of the exception message), and I don't know why. I also couldn't figure out why
the read went wrong.

The following snippet from SequenceFile.java shows where the exception we are hitting is thrown:
{noformat}
2417     public synchronized Object next(Object key) throws IOException {
2418       if (key != null && key.getClass() != getKeyClass()) {
2419         throw new IOException("wrong key class: "+key.getClass().getName()
2420                               +" is not "+keyClass);
2421       }
2422 
2423       if (!blockCompressed) {
2424         outBuf.reset();
2425 
2426         keyLength = next(outBuf);
2427         if (keyLength < 0)
2428           return null;
2429 
2430         valBuffer.reset(outBuf.getData(), outBuf.getLength());
2431 
2432         key = deserializeKey(key);
2433         valBuffer.mark(0);
2434         if (valBuffer.getPosition() != keyLength)
2435           throw new IOException(key + " read " + valBuffer.getPosition()
2436                                 + " bytes, should read " + keyLength);
{noformat}
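
To make the failing check concrete, here is a minimal sketch of the same idea: deserialize a key from a record buffer, then compare the number of bytes consumed with the key length stored for that record. The names KeyLengthCheckSketch, KeyReader, readKey and sampleRecord are hypothetical and this is not Hadoop code. It shows only one way such a mismatch can arise (a key reader that consumes a different number of bytes than the writer produced), not a confirmed cause of this issue.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical illustration of a "bytes consumed == key length" check (not Hadoop code). */
class KeyLengthCheckSketch {

    /** Hypothetical key deserializer: consumes bytes from the stream to rebuild a key. */
    interface KeyReader {
        Object read(DataInputStream in) throws IOException;
    }

    /** Builds a record laid out as a 4-byte int key followed by a UTF-encoded value. */
    static byte[] sampleRecord(int key, String value) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DataOutputStream data = new DataOutputStream(out);
        data.writeInt(key);
        data.writeUTF(value);
        data.flush();
        return out.toByteArray();
    }

    /** Deserializes the key and verifies that exactly keyLength bytes were consumed. */
    static Object readKey(byte[] record, int keyLength, KeyReader reader) throws IOException {
        ByteArrayInputStream bytes = new ByteArrayInputStream(record);
        Object key = reader.read(new DataInputStream(bytes));
        int consumed = record.length - bytes.available();
        if (consumed != keyLength) {
            throw new IOException(key + " read " + consumed
                    + " bytes, should read " + keyLength);
        }
        return key;
    }

    public static void main(String[] args) throws IOException {
        byte[] record = sampleRecord(42, "some value");
        // A reader that matches the writer consumes exactly the 4 key bytes.
        System.out.println(readKey(record, 4, in -> in.readInt()));
        try {
            // A mismatched reader consumes only 1 byte, so the check fails.
            readKey(record, 4, in -> in.readByte());
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In the sketch, the second call fails with a message of the same shape as the one in the stack trace, because the reader stopped after one byte while the record says the key is four bytes long.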

> Reduce side joins failing with IOException from RowContainer.nextBlock
> ----------------------------------------------------------------------
>
>                 Key: HIVE-9324
>                 URL: https://issues.apache.org/jira/browse/HIVE-9324
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.13.1
>            Reporter: Amareshwari Sriramadasu
>
> We are seeing some reduce-side join MapReduce jobs failing with the following exception:
> {noformat}
> 2014-12-14 16:58:51,296 ERROR org.apache.hadoop.hive.ql.exec.persistence.RowContainer: org.apache.hadoop.hive.ql.io.RCFile$KeyBuffer@42610e8 read 1 bytes, should read 27264
> java.io.IOException: org.apache.hadoop.hive.ql.io.RCFile$KeyBuffer@42610e8 read 1 bytes, should read 27264
> 	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2435)
> 	at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:76)
> 	at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.nextBlock(RowContainer.java:360)
> 	at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:230)
> 	at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:74)
> 	at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:644)
> 	at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:758)
> 	at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
> 	at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:216)
> 	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
> 	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:416)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:262)
> 2014-12-14 16:58:51,334 FATAL ExecReducer: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: org.apache.hadoop.hive.ql.io.RCFile$KeyBuffer@42610e8 read 1 bytes, should read 27264
> 	at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:237)
> 	at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:74)
> 	at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:644)
> 	at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:758)
> 	at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
> 	at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:216)
> 	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
> 	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:416)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:262)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: org.apache.hadoop.hive.ql.io.RCFile$KeyBuffer@42610e8 read 1 bytes, should read 27264
> 	at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.nextBlock(RowContainer.java:385)
> 	at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:230)
> 	... 12 more
> Caused by: java.io.IOException: org.apache.hadoop.hive.ql.io.RCFile$KeyBuffer@42610e8 read 1 bytes, should read 27264
> 	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2435)
> 	at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:76)
> 	at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.nextBlock(RowContainer.java:360)
> 	... 13 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
