hive-user mailing list archives

From Prasanth Jayachandran <pjayachand...@hortonworks.com>
Subject Re: wide ORC partitioned table insert causes "GC Overhead Limit Exceeded" error
Date Thu, 25 May 2017 06:08:51 GMT
Did you try with hive.optimize.sort.dynamic.partition set to true?
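When that is enabled, Hive adds a reduce stage that sorts rows by the dynamic partition key, so each task keeps only one ORC writer (and its column buffers) open at a time instead of one per partition. A minimal sketch of the session setup, assuming the usual dynamic-partition settings are also needed (the insert itself is yours, unchanged):

-- sketch only: enable dynamic partitioning plus the sorted dynamic-partition optimization
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.optimize.sort.dynamic.partition=true;

INSERT INTO TABLE table_partitioned PARTITION (partition_key)
  SELECT
   999999
  ,*
  ,IF(Date<>'None', SUBSTR(Date,1,10), '1970-01-01') AS partition_key
  FROM stg_table;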

Thanks
Prasanth



On Wed, May 24, 2017 at 11:07 PM -0700, "Revin Chalil" <rchalil@expedia.com> wrote:

Forgot to mention that we use Hive 1.2.

From: Revin Chalil [mailto:rchalil@expedia.com]
Sent: Wednesday, May 24, 2017 10:53 PM
To: user@hive.apache.org
Subject: wide ORC partitioned table insert causes "GC Overhead Limit Exceeded" error

I get "GC Overhead Limit Exceeded" and "Java heap space" errors when trying to insert
from a staging table into a wide (~2500 columns) ORC partitioned table with dynamic partitioning.
It consistently fails with Tez but succeeds with less data on the MR execution engine.
We have been trying to adjust the settings below with MR, but it still fails if there is more
than one day of data.

set mapreduce.map.memory.mb=16288;
set mapreduce.map.java.opts=-Xmx10192m;
set mapreduce.reduce.memory.mb=16288;
set mapreduce.reduce.java.opts=-Xmx10192m;

The insert statement uses a very simple select, as shown below.

INSERT INTO TABLE table_partitioned PARTITION (partition_key)
  SELECT
   999999
  ,*
  ,IF(Date<>'None', SUBSTR(Date,1,10), '1970-01-01') AS partition_key
  FROM stg_table

The ORC table uses Snappy compression. This cluster does not have anything else running. Are
there settings to reduce the GC / memory footprint of this insert, so that it can succeed with
more data?

Thanks,
Revin