spark-user mailing list archives

From Jie Deng <deng113...@gmail.com>
Subject Re: Re: OOM, help
Date Tue, 17 Dec 2013 12:05:41 GMT
Ehh... it's hard to say why 9 GB is not enough, but your file is 7 GB, and
the Java object for each string in that file needs even more memory than
the raw bytes.
I think you could try keeping the intermediate data on HDFS instead of
putting everything in memory.
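Also, two notes on your settings (just a sketch of what I would try, based
on the Spark 0.8 docs, so double-check against your version):
SPARK_DAEMON_JAVA_OPTS expects JVM flags such as -Xmx9000m, not a bare
size like 9000m, and it only configures the standalone master/worker
daemons, not the JVM that runs your tasks. Your console also shows
LocalScheduler, so the shell seems to be running everything in its own
JVM, where SPARK_WORKER_MEMORY=1024m would not apply anyway.

And since the job fails at "collect at <console>:17", the driver is
trying to pull the whole result into its heap. Something like this writes
the counts back to HDFS instead (the paths are placeholders, adjust them
to your cluster):

    val file = sc.textFile("hdfs://namenode:8020/input/data.txt")
    val counts = file.flatMap(line => line.split(" "))
                     .map(word => (word, 1))
                     .reduceByKey(_ + _)
    // save to HDFS instead of collect()-ing ~7 GB of pairs
    // into the driver's heap
    counts.saveAsTextFile("hdfs://namenode:8020/output/wordcounts")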



2013/12/17 leosandylh@gmail.com <leosandylh@gmail.com>

>  Hi,
> I have set my config with:
> export SPARK_WORKER_MEMORY=1024m
> export SPARK_DAEMON_JAVA_OPTS=9000m
> Why is the memory still not enough?
>
> Thanks
>
> ------------------------------
>  leosandylh@gmail.com
>
>  *From:* Jie Deng <deng113jie@gmail.com>
> *Date:* 2013-12-17 19:44
> *To:* user <user@spark.incubator.apache.org>
> *Subject:* Re: OOM, help
>  Hi Leo,
>
> I think java.lang.OutOfMemoryError: Java heap space is a plain JVM
> memory problem, not something specific to Spark.
> Just try giving the JVM more memory with -Xmx when it starts.
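> For example (a rough sketch; on Spark 0.8 I believe the spark-shell
> reads SPARK_MEM for its heap size, but please check the docs for your
> version):
>
> export SPARK_MEM=8g
> ./spark-shell
>
> Or start the JVM directly with a flag like -Xmx8g.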
>
>
> 2013/12/17 leosandylh@gmail.com <leosandylh@gmail.com>
>
>>   Hello everyone,
>> I have a problem when I run the wordcount example. I read data from
>> HDFS; it's almost 7 GB.
>> I couldn't find any details in the web UI or in sparkhome/work. This is
>> the console output:
>> .....
>> 13/12/16 19:48:02 INFO LocalTaskSetManager: Size of task 52 is 1834 bytes
>> 13/12/16 19:48:02 INFO LocalScheduler: Running 52
>> 13/12/16 19:48:02 INFO BlockFetcherIterator$BasicBlockFetcherIterator:
>> Getting 52 non-zero-bytes blocks out of 52 blocks
>> 13/12/16 19:48:02 INFO BlockFetcherIterator$BasicBlockFetcherIterator:
>> Started 0 remote gets in  7 ms
>> 13/12/16 19:48:09 INFO LocalTaskSetManager: Loss was due to
>> java.lang.OutOfMemoryError
>> java.lang.OutOfMemoryError: Java heap space
>>         at java.util.Arrays.copyOf(Arrays.java:2271)
>>         at
>> java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
>>         at
>> java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
>>         at
>> java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
>>         at
>> java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1857)
>>         at
>> java.io.ObjectOutputStream$BlockDataOutputStream.setBlockDataMode(ObjectOutputStream.java:1766)
>>         at
>> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1185)
>>         at
>> java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:346)
>>         at
>> org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:27)
>>         at
>> org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:47)
>>         at
>> org.apache.spark.scheduler.local.LocalScheduler.runTask(LocalScheduler.scala:204)
>>         at
>> org.apache.spark.scheduler.local.LocalActor$$anonfun$launchTask$1$$anon$1.run(LocalScheduler.scala:68)
>>         at
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>         at
>> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>         at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>         at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>         at java.lang.Thread.run(Thread.java:722)
>> 13/12/16 19:48:09 INFO LocalScheduler: Remove TaskSet 0.0 from pool
>> 13/12/16 19:48:09 INFO DAGScheduler: Failed to run collect at <console>:17
>> org.apache.spark.SparkException: Job failed: Task 0.0:0 failed more than
>> 4 times; aborting job java.lang.OutOfMemoryError: Java heap space
>>         at
>> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:760)
>>         at
>> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:758)
>>         at
>> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
>>         at
>> scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>         at
>> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:758)
>>         at
>> org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:379)
>>         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:441)
>>         at
>> org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:149)
>>
>> This is my spark-env.sh:
>>
>> export SPARK_HOME=/home/lh1/spark_hadoopapp/spark-0.8.0-hadoop2.0.0-cdh4.2.1
>> export JAVA_HOME=/home/lh1/app/jdk1.7.0
>> export SCALA_HOME=/home/lh1/sparkapp/scala-2.9.3
>> export SPARK_WORKER_CORES=2
>> export SPARK_WORKER_MEMORY=1024m
>> export SPARK_WORKER_INSTANCES=2
>> export SPARK_DAEMON_JAVA_OPTS=9000m
>>
>> I just started to use Spark, so can you give me some suggestions?
>>
>> Thanks .
>>
>> Leo
>> ------------------------------
>>   leosandylh@gmail.com
>>
>
>
