hadoop-mapreduce-user mailing list archives

From Felix.徐 <ygnhz...@gmail.com>
Subject CombineFileInputFormat ran out of memory while making splits
Date Mon, 16 Jul 2012 14:22:19 GMT
Hi all,
I have written a MyCombineFileInputFormat that extends CombineFileInputFormat; it packs
multiple files into the same InputSplit, and it works fine for a small number of files.
But when I try to process 100,000 small files, CombineFileInputFormat runs out of memory
while computing the splits.
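
For reference, the class is roughly like the sketch below. It is a simplified placeholder
rather than my exact code: the LongWritable/Text key and value types are just an example,
and MyRecordReader simply wraps a LineRecordReader around one file of the combined split.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.LineRecordReader;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.CombineFileInputFormat;
import org.apache.hadoop.mapred.lib.CombineFileRecordReader;
import org.apache.hadoop.mapred.lib.CombineFileSplit;

// Packs many small files into one CombineFileSplit and reads each file
// with a per-file reader (simplified example, not my exact code).
public class MyCombineFileInputFormat
    extends CombineFileInputFormat<LongWritable, Text> {

  @Override
  @SuppressWarnings({"unchecked", "rawtypes"})
  public RecordReader<LongWritable, Text> getRecordReader(
      InputSplit split, JobConf job, Reporter reporter) throws IOException {
    // CombineFileRecordReader instantiates one MyRecordReader per file
    // contained in the combined split.
    return new CombineFileRecordReader<LongWritable, Text>(
        job, (CombineFileSplit) split, reporter, (Class) MyRecordReader.class);
  }

  // Reads one file of the combined split by delegating to LineRecordReader.
  public static class MyRecordReader implements RecordReader<LongWritable, Text> {
    private final LineRecordReader delegate;

    // CombineFileRecordReader requires exactly this constructor signature.
    public MyRecordReader(CombineFileSplit split, Configuration conf,
                          Reporter reporter, Integer index) throws IOException {
      FileSplit fileSplit = new FileSplit(split.getPath(index),
          split.getOffset(index), split.getLength(index), split.getLocations());
      delegate = new LineRecordReader(conf, fileSplit);
    }

    public boolean next(LongWritable key, Text value) throws IOException {
      return delegate.next(key, value);
    }
    public LongWritable createKey() { return delegate.createKey(); }
    public Text createValue() { return delegate.createValue(); }
    public long getPos() throws IOException { return delegate.getPos(); }
    public float getProgress() throws IOException { return delegate.getProgress(); }
    public void close() throws IOException { delegate.close(); }
  }
}

The job simply uses this class as its input format via
conf.setInputFormat(MyCombineFileInputFormat.class).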

The stack trace is:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2245)
at java.util.Arrays.copyOf(Arrays.java:2219)
at java.util.ArrayList.grow(ArrayList.java:213)
at java.util.ArrayList.ensureCapacityInternal(ArrayList.java:187)
at java.util.ArrayList.addAll(ArrayList.java:532)
at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getHosts(CombineFileInputFormat.java:568)
at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getMoreSplits(CombineFileInputFormat.java:410)
at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:217)
at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:989)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:981)
at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:897)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:824)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1261)
at ac.ict.mapreduce.test.MR.main(MR.java:114)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

It seems that the rack-to-nodes mapping becomes too large? How can I solve this
problem? Thanks!
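
Would it help to cap the combined split size, so that getSplits() works with smaller
pools of blocks instead of considering everything at once? I am only guessing here, but
CombineFileInputFormat exposes protected setters for this, so something like the
following constructor could be added to the class sketched above (the 256 MB / 128 MB
values are arbitrary examples):

  public MyCombineFileInputFormat() {
    // Guess: limit how much data one combined split may hold so that the
    // split computation deals with smaller groups of blocks.
    // All three values are arbitrary example sizes.
    setMaxSplitSize(256L * 1024 * 1024);      // 256 MB per combined split
    setMinSplitSizeNode(128L * 1024 * 1024);  // 128 MB minimum per node
    setMinSplitSizeRack(128L * 1024 * 1024);  // 128 MB minimum per rack
  }

I believe the same limits can also be set through the job configuration keys
mapred.max.split.size, mapred.min.split.size.per.node and mapred.min.split.size.per.rack,
but I am not sure this is the right direction at all.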
