hive-user mailing list archives

From Prasanth Jayachandran <>
Subject Re: Java Heap Memory OOM when using ORCNewInputFormat in MR
Date Tue, 30 Aug 2016 22:12:58 GMT
Under memory pressure, the stack trace of an OOM can differ depending on which component is requesting
more memory at the moment the heap is already full. That is why you are seeing the OOM in writeMetadata
(it can happen in other places as well). When dealing with thousands of columns it is better
to set hive.exec.orc.default.buffer.size to a lower value until the OOM goes away. Depending
on the version of Hive you are using, this may be done automatically for you. In older Hive
versions, if the number of columns is >1000 a smaller buffer size is chosen automatically. In newer
versions, this limit is removed and the ORC writer figures out an optimal buffer size based
on the stripe size, available memory and the number of columns.
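For example, something along these lines lowers the buffer size for a single MR job. This is an untested sketch; whether OrcNewOutputFormat picks the property up from the job Configuration depends on your Hive version, and the 32 KB value is only illustrative:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hive.ql.io.orc.OrcNewOutputFormat;
    import org.apache.hadoop.mapreduce.Job;

    public class CsvToOrcDriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            // Rough heap arithmetic for a wide table: ~800 columns, a few streams
            // per column, 256 KB default buffer per stream works out to several
            // hundred MB of buffers per open writer, before dictionaries, indexes
            // and the reducer's merge buffers are counted. Dropping the per-stream
            // buffer to 32 KB cuts that by roughly 8x.
            conf.setInt("hive.exec.orc.default.buffer.size", 32 * 1024);

            Job job = Job.getInstance(conf, "csv-to-orc");
            job.setOutputFormatClass(OrcNewOutputFormat.class);
            // ... mapper, reducer, partitioner and output path setup omitted ...
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }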


On Aug 30, 2016, at 3:04 PM, Hank baker <> wrote:

Hi all,

I'm trying to run a MapReduce job to convert CSV data into ORC using OrcNewOutputFormat
(the reduce phase is required to satisfy some partitioning logic), but I'm getting an OOM error
in the reduce phase (during the merge, to be exact) with the stack trace attached below. It happens
for one particular table which has about 800 columns, and the error is common across all reducers
(minimum reducer input is about 20 records, maximum is about 100 million). I am trying to figure
out the exact cause of the error, since I have used the same job to convert tables with 100-10000
columns without any memory or config changes.
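For context, the reduce side of the job looks roughly like this (a simplified sketch; the three-column schema, the column names and the CSV parsing here are illustrative, the real job builds the ObjectInspector from the table schema):

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.hive.ql.io.orc.OrcSerde;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
    import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.mapreduce.Reducer;

    // Simplified sketch of the reduce side: each reducer serializes the parsed
    // CSV fields with OrcSerde and hands them to OrcNewOutputFormat's writer.
    public class CsvToOrcReducer extends Reducer<Text, Text, NullWritable, Writable> {

        private final OrcSerde serde = new OrcSerde();
        private ObjectInspector inspector;

        @Override
        protected void setup(Context context) {
            // Illustrative 3-column schema; the real job builds this list from
            // the table definition (hundreds of columns).
            List<String> names = new ArrayList<String>();
            List<ObjectInspector> fieldInspectors = new ArrayList<ObjectInspector>();
            for (String col : new String[]{"id", "name", "amount"}) {
                names.add(col);
                fieldInspectors.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
            }
            inspector = ObjectInspectorFactory.getStandardStructObjectInspector(names, fieldInspectors);
        }

        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            for (Text line : values) {
                // One CSV record per value; split it into the struct's fields.
                List<Object> row = new ArrayList<Object>();
                for (String field : line.toString().split(",", -1)) {
                    row.add(field);
                }
                context.write(NullWritable.get(), serde.serialize(row, inspector));
            }
        }
    }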

What concerns me in the stack trace is this line:


Why is it going OOM while trying to write metadata?

I originally believed this was simply due to the number of open buffers (as mentioned in
), so I wrote a bit of code to reproduce the error on my local setup by creating an instance of
OrcRecordWriter and writing a large number of columns (a simplified sketch of that repro is included
further below). I did get a similar heap space error; however, it was going OOM while trying to
flush the stripes, with this in the stacktrace:


This issue on the dev environment got resolved by setting


Will the same setting work for the original error?
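For reference, the local repro was along these lines (a simplified sketch using the public OrcFile.createWriter API rather than constructing OrcRecordWriter directly; the column count, row count and buffer/stripe sizes are illustrative):

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hive.ql.io.orc.OrcFile;
    import org.apache.hadoop.hive.ql.io.orc.Writer;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
    import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

    // Standalone repro sketch: open one ORC writer over a very wide schema and
    // keep adding rows until the per-column stream buffers exhaust a small heap.
    public class WideOrcWriterRepro {
        public static void main(String[] args) throws Exception {
            int numColumns = 800; // illustrative; matches the width of the problem table
            List<String> names = new ArrayList<String>();
            List<ObjectInspector> fieldInspectors = new ArrayList<ObjectInspector>();
            for (int i = 0; i < numColumns; i++) {
                names.add("col" + i);
                fieldInspectors.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
            }
            ObjectInspector inspector =
                ObjectInspectorFactory.getStandardStructObjectInspector(names, fieldInspectors);

            Configuration conf = new Configuration();
            Writer writer = OrcFile.createWriter(new Path("/tmp/wide.orc"),
                OrcFile.writerOptions(conf)
                       .inspector(inspector)
                       .stripeSize(64L * 1024 * 1024)
                       .bufferSize(256 * 1024)); // default-sized buffers to provoke the OOM

            List<Object> row = new ArrayList<Object>();
            for (int i = 0; i < numColumns; i++) {
                row.add("some-moderately-long-string-value-" + i);
            }
            for (int r = 0; r < 1000000; r++) {
                writer.addRow(row);
            }
            writer.close();
        }
    }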

For different reasons I cannot change the reducer memory or lower the buffer size even at
a job level. For now, I am just trying to understand the source of this error. Can anyone
please help?

Original OOM stacktrace:

FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError:
Java heap space
        at java.nio.HeapByteBuffer.<init>(...)
        at java.nio.ByteBuffer.allocate(...)
        at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.close(...)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(...)
        at org.apache.hadoop.mapred.YarnChild$...
        at ...(Native Method)
        at org.apache.hadoop.mapred.YarnChild.main(...)
