hive-dev mailing list archives

From "Evan Pollan (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-2563) OutOfMemory errors when using dynamic partition inserts with large number of partitions
Date Wed, 09 Nov 2011 01:53:51 GMT

    [ https://issues.apache.org/jira/browse/HIVE-2563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13146723#comment-13146723 ]

Evan Pollan commented on HIVE-2563:
-----------------------------------

By the way, the total number of records in this table (if I were able to insert it successfully
:) is just over 5 million.
                
> OutOfMemory errors when using dynamic partition inserts with large number of partitions
> ---------------------------------------------------------------------------------------
>
>                 Key: HIVE-2563
>                 URL: https://issues.apache.org/jira/browse/HIVE-2563
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.7.1
>         Environment: Cloudera CDH3 Update 2 distro on Ubuntu 10.04 64 bit cluster nodes
>            Reporter: Evan Pollan
>
> I'm trying to use dynamic partition inserts to mimic a legacy file generation process that creates a single file per combination of two record attributes, one with low cardinality and one with high cardinality. With a small data set, I can do this successfully. With a larger data set on the same 11-node cluster, where the combined cardinality yields ~1600 partitions, I get out-of-memory errors in the reduce phase 100% of the time.
> I'm running with the following settings, writing to a textfile-backed table with two partition columns of type string:
> SET hive.exec.compress.output=true; 
> SET io.seqfile.compression.type=BLOCK;
> SET mapred.max.map.failures.percent=100;
> SET hive.exec.dynamic.partition=true;
> SET hive.exec.dynamic.partition.mode=nonstrict;
> SET hive.exec.max.dynamic.partitions=10000;
> SET hive.exec.max.dynamic.partitions.pernode=10000;
> (I've also tried gzip compression with the same result)
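
For concreteness, a minimal HiveQL sketch of the kind of statement being described, under the settings above; only the partition columns requestday and clientname appear in the report, so the table name, source table, and select list are hypothetical stand-ins:

  -- Sketch only: "legacy_files", "source_table", and the non-partition columns are
  -- hypothetical; the report does not show the actual schema or query.
  INSERT OVERWRITE TABLE legacy_files PARTITION (requestday, clientname)
  SELECT payload_col1, payload_col2, requestday, clientname
  FROM source_table;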
> Here's an example of the error:
> 2011-11-09 00:51:52,425 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: New Final Path: FS hdfs://ec2-50-19-131-121.compute-1.amazonaws.com/tmp/hive-hdfs/hive_2011-11-09_00-48-57_840_6003656718210084497/_tmp.-ext-10000/requestday=2011-09-29/clientname=XXXX-JA/000008_0.deflate
> 2011-11-09 00:51:52,461 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
> 2011-11-09 00:51:52,464 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.OutOfMemoryError: unable to create new native thread
> 	at java.lang.Thread.start0(Native Method)
> 	at java.lang.Thread.start(Thread.java:640)
> 	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:2931)
> 	at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:544)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:219)
> 	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:584)
> 	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:565)
> 	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:472)
> 	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:464)
> 	at org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat.getHiveRecordWriter(HiveIgnoreKeyTextOutputFormat.java:80)
> 	at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getRecordWriter(HiveFileFormatUtils.java:247)
> 	at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:235)
> 	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:458)
> 	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynOutWriters(FileSinkOperator.java:599)
> 	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:539)
> 	at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
> 	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:744)
> 	at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
> 	at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
> 	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:744)
> 	at org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:959)
> 	at org.apache.hadoop.hive.ql.exec.GroupByOperator.processAggr(GroupByOperator.java:798)
> 	at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:724)
> 	at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
> 	at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:247)
> 	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:469)
> 	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:396)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:264)
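
The trace shows where the threads come from: each dynamic partition written by a reducer gets its own FileSinkOperator record writer, and each writer opens a DFSOutputStream whose constructor starts a background streamer thread (Thread.start called from DFSClient$DFSOutputStream.<init>). With ~1600 partitions spread over relatively few reduce tasks, a single task can hold a very large number of open writers, and the JVM exhausts native threads rather than heap. A commonly suggested mitigation, not part of this report, is to distribute rows by the partition columns so each reducer only ever writes a small subset of partitions; a hedged sketch, reusing the hypothetical names from the sketch above:

  -- Sketch only: DISTRIBUTE BY routes all rows for a given (requestday, clientname)
  -- pair to one reducer, so each task keeps far fewer files (and threads) open at once.
  INSERT OVERWRITE TABLE legacy_files PARTITION (requestday, clientname)
  SELECT payload_col1, payload_col2, requestday, clientname
  FROM source_table
  DISTRIBUTE BY requestday, clientname;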

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
