hive-user mailing list archives

From Prasanth Jayachandran <pjayachand...@hortonworks.com>
Subject Re: Container out of memory: ORC format with many dynamic partitions
Date Mon, 02 May 2016 19:48:27 GMT
Hi

Can you please post the explain plan for your insert query? I suspect the sorted dynamic partition
optimization is bailing out because of the constant value for the 'dt' column. If you are not
seeing a reducer, then it is likely not using the sorted dynamic partition optimization.
You are probably hitting this bug: https://issues.apache.org/jira/browse/HIVE-12893
I can confirm whether that's the case by looking at the explain plan.
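For example, something like the following (reusing the table names from your query below) should
print the plan. A reducer stage feeding the file sink indicates the optimization is in effect; a
map-only plan suggests it bailed out:

EXPLAIN
INSERT OVERWRITE TABLE dynamic_partition_table PARTITION (dt="2016-04-05", title, title_type)
SELECT field1, field2, ..., title, title_type
FROM original_table
WHERE dt = "2016-04-05";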

Thanks
Prasanth

On May 2, 2016, at 2:24 PM, Ryan Harris <Ryan.Harris@zionsbancorp.com> wrote:

Reading this:
"but when I add 2000 new titles with 300 rows each"
I'm thinking that you are over-partitioning your data. I'm not sure exactly how that relates to
the OOM error you are getting (it may not), but I'd test things out partitioning by date only,
or maybe date + title_type. Adding 2000+ dynamic partitions that each hold only 300 rows of
data is asking for problems in Hive, IMO.
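Roughly something like this (untested, table name and location made up for illustration), keeping
title as an ordinary column instead of a partition key:

CREATE EXTERNAL TABLE IF NOT EXISTS coarser_partition_table (
field1 string,
field2 string,
...
field26 string,
title string
)
PARTITIONED BY (dt string, title_type string)
STORED AS ORC
LOCATION '/hive/warehouse/coarser_partition_table';

INSERT OVERWRITE TABLE coarser_partition_table PARTITION (dt="2016-04-05", title_type)
SELECT
field1,
field2,
...
field26,
title,
title_type
FROM original_table
WHERE dt = "2016-04-05";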


From: Matt Olson [mailto:maolson42@gmail.com]
Sent: Friday, April 29, 2016 7:50 PM
To: user@hive.apache.org
Subject: Container out of memory: ORC format with many dynamic partitions

Hi all,

I am using Hive 1.0.1 and trying to do a simple insert into an ORC table, creating dynamic
partitions. I am selecting from a table partitioned by dt and category, and inserting into
a table partitioned by dt, title, and title_type. Other than the partitioning, the tables
have the same schemas. Both title and title_type are fields in the first table, and when I
insert into the second table, I am using them to create dynamic partitions. The .q file with
the CREATE and INSERT statements is copied below.

SET hive.optimize.sort.dynamic.partition=true;
SET hive.exec.orc.memory.pool=1.0;
SET hive.exec.max.dynamic.partitions = 5000;
SET hive.exec.max.dynamic.partitions.pernode = 5000;
SET hive.merge.mapfiles = true;
SET mapred.min.split.size=134217728;
SET mapred.min.split.size.per.node=134217728;
SET mapred.min.split.size.per.rack=134217728;
SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzoCodec;
SET mapred.map.output.compression.codec=com.hadoop.compression.lzo.LzoCodec;
SET mapred.max.split.size=134217728;
SET hive.map.aggr.hash.percentmemory=0.125;
SET hive.exec.parallel=true;
SET hive.exec.compress.intermediate=true;
SET hive.exec.compress.output=true;
SET mapred.map.child.java.opts=-Xmx2048M;
SET mapred.child.java.opts=-Xmx2048M;
SET mapred.task.profile=false;

CREATE EXTERNAL TABLE IF NOT EXISTS dynamic_partition_table (
field1 string,
field2 string,
...
field26 string
)
PARTITIONED BY (dt string, title string, title_type string)
STORED AS ORC
LOCATION '/hive/warehouse/partitioned_table'
TBLPROPERTIES ("orc.compress.size"="16000");

INSERT OVERWRITE TABLE dynamic_partition_table PARTITION (dt="2016-04-05", title, title_type)
SELECT
field1,
field2,
...
title,
title_type
FROM original_table
WHERE dt = "2016-04-05";

The original table has about 250 GB of data for 2016-04-05, and about 260 different titles
(some titles have very little data, some have ~20 GB). There is generally only one title_type
per title. The INSERT action succeeds on that data set, but when I add 2000 new titles with
300 rows each to the original table, I get the following error during the INSERT:


Container [pid=6278,containerID=container_e26_1460661845156_49295_01_000244] is running beyond
physical memory limits. Current usage: 2.2 GB of 2 GB physical memory used; 2.7 GB of 4.2
GB virtual memory used. Killing container.


I've found a couple questions online about this same error message for ORC files with lots
of dynamic partitions, on an older version of Hive:
https://qnalist.com/questions/4836037/hive-0-12-orc-heap-issues-on-write

Based on that and the information about configuration properties at https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-ORCFileFormat,
I have tried setting hive.exec.orc.memory.pool=1.0 in order to give as much heap space as
possible to the ORC file writers. As you can see from the CREATE TABLE statement, I also decreased
orc.compress.size from the default 256 KB to 16 KB. After making these changes, the INSERT
is still failing with the "beyond physical memory limits" error.
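My rough (and possibly wrong) back-of-envelope guess at why those settings matter: if the sorted
dynamic partition optimization is not kicking in, each task keeps one ORC writer open per dynamic
partition it encounters, and each writer appears to hold on the order of one compression buffer
per stream per column, so with my data something like

2000 open writers x 26 string columns x ~3 streams per column x 16 KB per buffer ~= 2.4 GB

which would already be over the 2 GB container limit, and roughly 16x more at the default 256 KB
compress size. As far as I can tell, hive.exec.orc.memory.pool=1.0 only changes how the writers
share the heap, not the physical memory the container is allowed to use.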

I've tried inserting into a table stored as RCFile rather than ORC, and in that case the action
succeeds even with the additional 2000 titles.

Can anyone explain how exactly the two ORC parameters above affect the writing of dynamic
partitions in ORC files, and why I'm not getting the OOM error when I use the RCFile format
instead?  I'd also appreciate any suggestions for other tuning I could do to fix the memory
management when using ORC.

Thanks for any help,
Matt
