hadoop-pig-dev mailing list archives

From "Alan Gates (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-766) java.lang.OutOfMemoryError: Java heap space
Date Wed, 15 Apr 2009 00:12:14 GMT

    [ https://issues.apache.org/jira/browse/PIG-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698996#action_12698996 ]

Alan Gates commented on PIG-766:
--------------------------------

It isn't overall data size that matters.  It is the size of the data for a given key.  So if
you have a 2G data set but it has only one key (that is, every row has that key), then you'll
hit this problem (assuming you can't fit 2G in memory on your data nodes).  Pig does try to
spill to avoid this, but has a hard time knowing when and how much to spill, and thus often
runs out of memory.
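
One rough way to check for this kind of skew is to total the bytes per key over a sample of
the input.  The sketch below is plain Python, not Pig, and rests on assumptions: the data is
tab-delimited text, the key is the first field, and the function name and sample rows are
made up for illustration.

```python
from collections import Counter


def bytes_per_key(lines, key_index=0, delimiter="\t"):
    """Return a Counter mapping each key to the total UTF-8 bytes of its rows."""
    totals = Counter()
    for line in lines:
        row = line.rstrip("\n")
        fields = row.split(delimiter)
        if key_index >= len(fields):
            continue  # skip rows that lack the key column
        totals[fields[key_index]] += len(row.encode("utf-8"))
    return totals


# Hypothetical sample: key "a" carries almost all of the data.
skew_sample = ["a\t" + "x" * 100, "a\t" + "y" * 100, "b\tz"]
key_totals = bytes_per_key(skew_sample)
heaviest_key, heaviest_bytes = key_totals.most_common(1)[0]
print(heaviest_key, heaviest_bytes)
```

If the heaviest key's total approaches the task heap (-Xmx1024m in this report), that key is
a likely culprit no matter how small the overall data set is.  The in-memory form of the
data is typically larger than the raw bytes, so treat the number as a lower bound.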

But I think you're right that this isn't in the join.  From the stack it looks like it's trying
to write data out of the map task.  Do you have very large rows in this data?
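
A quick way to answer that is to scan the input for its largest rows.  This is a minimal
Python sketch, assuming newline-delimited text input; the function name and sample rows are
hypothetical.  It only measures raw input lines.  A large row can also be built by the
script itself (the trace shows a bag inside a tuple being written), which this does not
detect.

```python
import heapq


def largest_rows(lines, top_n=3):
    """Return the top_n (byte_length, line_number) pairs, largest first."""
    sizes = (
        (len(line.rstrip("\n").encode("utf-8")), number)
        for number, line in enumerate(lines, start=1)
    )
    # nlargest keeps only top_n entries in memory, so this streams over big files.
    return heapq.nlargest(top_n, sizes)


# Hypothetical sample: the second row is far larger than the others.
row_sample = ["k1\tshort", "k2\t" + "v" * 500, "k3\tmedium value"]
print(largest_rows(row_sample, top_n=2))
```

For a real file, pass an open file object opened in text mode as `lines`.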

> java.lang.OutOfMemoryError: Java heap space
> -------------------------------------------
>
>                 Key: PIG-766
>                 URL: https://issues.apache.org/jira/browse/PIG-766
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.2.0
>         Environment: Hadoop-0.18.3 (cloudera RPMs).
> mapred.child.java.opts=-Xmx1024m
>            Reporter: Vadim Zaliva
>
> My pig script always fails with the following error:
> java.lang.OutOfMemoryError: Java heap space
>        at java.util.Arrays.copyOf(Arrays.java:2786)
>        at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
>        at java.io.DataOutputStream.write(DataOutputStream.java:90)
>        at java.io.FilterOutputStream.write(FilterOutputStream.java:80)
>        at org.apache.pig.data.DataReaderWriter.writeDatum(DataReaderWriter.java:213)
>        at org.apache.pig.data.DefaultTuple.write(DefaultTuple.java:291)
>        at org.apache.pig.data.DefaultAbstractBag.write(DefaultAbstractBag.java:233)
>        at org.apache.pig.data.DataReaderWriter.writeDatum(DataReaderWriter.java:162)
>        at org.apache.pig.data.DefaultTuple.write(DefaultTuple.java:291)
>        at org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:83)
>        at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
>        at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
>        at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:156)
>        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.spillSingleRecord(MapTask.java:857)
>        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:467)
>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:101)
>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:219)
>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:208)
>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:86)
>        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
>        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2198)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

