flink-user mailing list archives

From Stephan Ewen <se...@apache.org>
Subject Re: EOFException when running Flink job
Date Mon, 20 Apr 2015 12:42:14 GMT
Hi!

The "unmanaged solution set" keeps the data outside the Flink managed
memory, and is expected to throw OOM exceptions at some point. The managed
variant clearly has a bug.
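
In case it is useful for the others on this thread, here is a minimal, untested
sketch of the "unmanaged" workaround. It assumes a Flink version whose Scala
DataSet API has an iterateDelta overload with a solutionSetUnManaged flag (the
Java API equivalent is DeltaIteration#setSolutionSetUnManaged); the data, key
field, and step function below are placeholders, not the job from this thread.

import org.apache.flink.api.scala._

object UnmanagedSolutionSetSketch {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // Placeholder solution set and workset: (vertexId, distance) pairs.
    val initialSolutionSet: DataSet[(Long, Double)] =
      env.fromElements((1L, 0.0), (2L, 10.0), (3L, 20.0))
    val initialWorkset: DataSet[(Long, Double)] = initialSolutionSet

    // The last argument is the solutionSetUnManaged flag: 'true' keeps the
    // solution set in a plain heap hash table instead of the managed-memory
    // CompactingHashTable, at the price of possible OutOfMemoryErrors.
    val result = initialSolutionSet.iterateDelta(initialWorkset, 10, Array(0), true) {
      (solution, workset) =>
        // Trivial step function so the sketch compiles; a real job would join
        // the workset against the solution set and emit only updated entries.
        (workset, workset)
    }

    result.print()
  }
}

The alternative, as discussed further down in the thread, is to leave the
solution set managed and adjust taskmanager.heap.mb / taskmanager.memory.fraction
in flink-conf.yaml instead.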

Can you open a JIRA ticket for it, briefly describe where it happens, and
also paste the stack trace?

Stephan
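
P.S. Some rough context on the memory settings discussed below: the managed
memory available to the CompactingHashTable is approximately
taskmanager.heap.mb * taskmanager.memory.fraction (the fraction is applied after
network buffers are subtracted, so the real numbers are a bit lower). The three
configurations Stefan tried therefore all end up with a similar budget:

  6144 MB * 0.5 ~= 3072 MB
  5544 MB * 0.6 ~= 3326 MB
  5144 MB * 0.7 ~= 3601 MB

which might explain why varying the fraction alone did not change the outcome.
Treat these as estimates only.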

On Sun, Apr 19, 2015 at 2:17 PM, Mihail Vieru <vieru@informatik.hu-berlin.de> wrote:

>  Hi,
>
> I also get the EOFException on 0.9-SNAPSHOT when using a modified
> SingleSourceShortestPaths on big graphs.
>
> After I set the SolutionSet to "unmanaged", the job finishes. But this only
> works up to a certain input size; beyond that, I again get an OutOfMemoryError.
>
> Best,
> Mihail
>
>
> On 19.04.2015 13:52, Stefan Bunk wrote:
>
> Hi,
>
>  I tested three configurations:
>
> taskmanager.heap.mb: 6144, taskmanager.memory.fraction: 0.5
> taskmanager.heap.mb: 5544, taskmanager.memory.fraction: 0.6
> taskmanager.heap.mb: 5144, taskmanager.memory.fraction: 0.7
>
>  The error occurs in all three configurations.
> In the last configuration, I can even find another exception in the logs
> of one of the taskmanagers:
>
>  19.Apr. 13:39:29 INFO  Task                 - IterationHead(WorksetIteration (Resolved-Redirects)) (10/10) switched to FAILED : java.lang.IndexOutOfBoundsException: Index: 161, Size: 161
>         at java.util.ArrayList.rangeCheck(ArrayList.java:635)
>         at java.util.ArrayList.get(ArrayList.java:411)
>         at org.apache.flink.runtime.operators.hash.InMemoryPartition$WriteView.resetTo(InMemoryPartition.java:352)
>         at org.apache.flink.runtime.operators.hash.InMemoryPartition$WriteView.access$100(InMemoryPartition.java:301)
>         at org.apache.flink.runtime.operators.hash.InMemoryPartition.appendRecord(InMemoryPartition.java:226)
>         at org.apache.flink.runtime.operators.hash.CompactingHashTable.insertOrReplaceRecord(CompactingHashTable.java:536)
>         at org.apache.flink.runtime.operators.hash.CompactingHashTable.buildTableWithUniqueKey(CompactingHashTable.java:347)
>         at org.apache.flink.runtime.iterative.task.IterationHeadPactTask.readInitialSolutionSet(IterationHeadPactTask.java:209)
>         at org.apache.flink.runtime.iterative.task.IterationHeadPactTask.run(IterationHeadPactTask.java:270)
>         at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
>         at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:217)
>         at java.lang.Thread.run(Thread.java:745)
>
>  Greetings
> Stefan
>
> On 17 April 2015 at 16:05, Stephan Ewen <sewen@apache.org> wrote:
>
>> Hi!
>>
>>  After a quick look over the code, this seems like a bug. One corner case of
>> the overflow-handling code does not check for the "running out of memory"
>> condition.
>>
>>  I would like to wait and see whether Robert Waury has some ideas about this;
>> he is the one most familiar with the code.
>>
>>  I would guess, though, that you should be able to work around it either by
>> setting the solution set to "unmanaged" or by slightly changing the memory
>> configuration. It seems to be a rare corner case that is only hit when the
>> memory runs out in a very specific situation, so you may be able to avoid it
>> by using slightly more or less memory.
>>
>>  Greetings,
>> Stephan
>>
>>
>> On Fri, Apr 17, 2015 at 3:59 PM, Stefan Bunk <stefan.bunk@googlemail.com> wrote:
>>
>>> Hi Squirrels,
>>>
>>>  I have some trouble with a delta-iteration transitive closure program [1].
>>> When I run the program, I get the following error:
>>>
>>>  java.io.EOFException
>>>  at org.apache.flink.runtime.operators.hash.InMemoryPartition$WriteView.nextSegment(InMemoryPartition.java:333)
>>>  at org.apache.flink.runtime.memorymanager.AbstractPagedOutputView.advance(AbstractPagedOutputView.java:140)
>>>  at org.apache.flink.runtime.memorymanager.AbstractPagedOutputView.writeByte(AbstractPagedOutputView.java:223)
>>>  at org.apache.flink.runtime.memorymanager.AbstractPagedOutputView.writeLong(AbstractPagedOutputView.java:291)
>>>  at org.apache.flink.runtime.memorymanager.AbstractPagedOutputView.writeDouble(AbstractPagedOutputView.java:307)
>>>  at org.apache.flink.api.common.typeutils.base.DoubleSerializer.serialize(DoubleSerializer.java:62)
>>>  at org.apache.flink.api.common.typeutils.base.DoubleSerializer.serialize(DoubleSerializer.java:26)
>>>  at org.apache.flink.api.scala.typeutils.CaseClassSerializer.serialize(CaseClassSerializer.scala:89)
>>>  at org.apache.flink.api.scala.typeutils.CaseClassSerializer.serialize(CaseClassSerializer.scala:29)
>>>  at org.apache.flink.runtime.operators.hash.InMemoryPartition.appendRecord(InMemoryPartition.java:219)
>>>  at org.apache.flink.runtime.operators.hash.CompactingHashTable.insertOrReplaceRecord(CompactingHashTable.java:536)
>>>  at org.apache.flink.runtime.operators.hash.CompactingHashTable.buildTableWithUniqueKey(CompactingHashTable.java:347)
>>>  at org.apache.flink.runtime.iterative.task.IterationHeadPactTask.readInitialSolutionSet(IterationHeadPactTask.java:209)
>>>  at org.apache.flink.runtime.iterative.task.IterationHeadPactTask.run(IterationHeadPactTask.java:270)
>>>  at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
>>>  at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:217)
>>>  at java.lang.Thread.run(Thread.java:745)
>>>
>>>  Both input files were generated and written to HDFS by Flink jobs. I have
>>> already re-run the program that generates them several times; the error
>>> persists.
>>> You can find the logs at [2] and [3].
>>>
>>>  I am using the 0.9.0-milestone-1 release.
>>>
>>>  Best,
>>> Stefan
>>>
>>>
>>>  [1] Flink job: https://gist.github.com/knub/0bfec859a563009c1d57
>>> [2] Job manager logs: https://gist.github.com/knub/01e3a4b0edb8cde66ff4
>>> [3] One task manager's logs:
>>> https://gist.github.com/knub/8f2f953da95c8d7adefc
>>>
>>
>>
>
>
