hadoop-common-user mailing list archives

From Alan Miller <someb...@squareplanet.de>
Subject Re: Killed : GC overhead limit exceeded
Date Sun, 18 Jul 2010 11:11:24 GMT
Thanks Ted,

One or both of your suggestions fixed the problem; I'm no longer seeing that
error.

In my Driver class I added:
    config.set("mapred.child.java.opts", "-Xmx2048m -Xincgc");
and I also changed my mapred-site.xml to set:
     io.file.buffer.size 65536
     io.sort.factor 32
     io.sort.mb 320
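One way to confirm that the -Xmx2048m from mapred.child.java.opts actually reached the child task JVM is to log the heap ceiling from inside the task, for example from the mapper's setup(). What follows is a minimal sketch; the HeapCheck class and its maxHeapMb() method are hypothetical names, not part of Hadoop. On HotSpot the reported value is usually close to, and often slightly below, the -Xmx setting. Note that -Xincgc was removed in JDK 9, so newer JVMs will not start with that option.

```java
/**
 * Hypothetical helper: reports the heap ceiling of the JVM it runs in.
 * Called from inside a task, it shows whether the -Xmx value passed via
 * mapred.child.java.opts reached the child JVM.
 */
final class HeapCheck {

    private HeapCheck() {}

    /** Maximum heap the JVM will attempt to use, in MiB. */
    static long maxHeapMb() {
        // Runtime.maxMemory() returns Long.MAX_VALUE if there is no limit
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }
}
```

Printing HeapCheck.maxHeapMb() with System.out.println in setup() would typically end up in the task attempt's stdout log.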

For the second suggestion: I'm a Java novice, so I'm not sure this actually
does what you intended.

I moved the three Patterns out of map() and changed the logic to this:

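import java.io.IOException;
import java.util.regex.Pattern;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<Object, Text, Text, Text> {

    // Compiled once when the class loads, not on every map() call
    private static final Pattern tabPattern = Pattern.compile("\t");
    private static final Pattern eolPattern = Pattern.compile("\n");
    // Leading or trailing whitespace; "([\\s]$)" would only strip
    // a single trailing whitespace character
    private static final Pattern spacePattern = Pattern.compile("^\\s+|\\s+$");

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String line : eolPattern.split(value.toString())) {
            // ...
            String[] values = tabPattern.split(line);
            for (int i = 0; i < values.length; i++) {
                values[i] = spacePattern.matcher(values[i]).replaceAll("");
            }
            parser.setvals(values); // parser is defined elsewhere (not shown)
            // ...
        }
    }
}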
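Ted's other idea, writing your own split() and replaceAll(), could look roughly like the sketch below. It avoids regular expressions entirely for the single-character tab split and the whitespace trim. This is a hypothetical illustration: FieldSplitter, split() and trimWhitespace() are made-up names, and the split deliberately mimics String.split()'s habit of dropping trailing empty fields so it can stand in for tabPattern.split(line).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical regex-free replacements for tabPattern.split(line) and
 * spacePattern.matcher(s).replaceAll("") in MyMapper.
 */
final class FieldSplitter {

    private FieldSplitter() {}

    /** Splits on one delimiter char, dropping trailing empty fields like String.split(). */
    static String[] split(String s, char delim) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == delim) {
                parts.add(s.substring(start, i));
                start = i + 1;
            }
        }
        if (parts.isEmpty()) {
            // No delimiter found: String.split() returns the input unchanged
            return new String[] { s };
        }
        parts.add(s.substring(start));
        // Drop trailing empty strings, as String.split(regex) does
        int size = parts.size();
        while (size > 0 && parts.get(size - 1).isEmpty()) {
            size--;
        }
        return parts.subList(0, size).toArray(new String[0]);
    }

    /** Strips leading and trailing characters that the regex \s would match. */
    static String trimWhitespace(String s) {
        int start = 0;
        int end = s.length();
        while (start < end && isRegexSpace(s.charAt(start))) {
            start++;
        }
        while (end > start && isRegexSpace(s.charAt(end - 1))) {
            end--;
        }
        return s.substring(start, end);
    }

    // Same character set as java.util.regex \s: [ \t\n\x0B\f\r]
    private static boolean isRegexSpace(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == 0x0B || c == '\f' || c == '\r';
    }
}
```

Inside map(), FieldSplitter.split(line, '\t') and FieldSplitter.trimWhitespace(values[i]) would then take the place of the two regex calls; the newline split could be handled the same way.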

Alan

On 07/17/2010 07:28 AM, Ted Yu wrote:
> Have you tried increasing memory beyond 1GB for your map task?
>
> I think you have noticed that both OOMEs came from Pattern.compile().
>
> Please take a look at
> http://www.docjar.com/html/api/java/lang/String.java.html
>
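> I would suggest pre-compiling the three patterns when setting up your mapper,
> i.e. write your own split() and replaceAll() that reuse them instead of calling
> String.split()/String.replaceAll(), which can compile the regex on every call.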
>
> I recently did something similar. You can see the performance improvement
> from this kind of customization in
> https://issues.apache.org/jira/browse/MAPREDUCE-1946
>
> Cheers
>
> On Fri, Jul 16, 2010 at 6:06 AM, Some Body <somebody@squareplanet.de> wrote:
>
>    
>> I guess attachments are stripped.
>>
>> Here's the memory graph:   http://tinyurl.com/37g3hmu
>> Here's the VM Summary:   http://tinyurl.com/36wqzjq
>>
>> Alan
>>
>>      
>    

