Subject: Re: RuntimeException Gelly API: Memory ran out. Compaction failed.
From: Robert Waury
To: user@flink.apache.org
Date: Wed, 18 Mar 2015 15:40:31 +0100

Hi,

I managed to reproduce the behavior and as far as I can tell it seems to be a problem with the memory allocation.

I have filed a bug report in JIRA to get the attention of somebody who knows the runtime better than I do.

https://issues.apache.org/jira/browse/FLINK-1734


Cheers,
Robert

On Tue, Mar 17, 2015 at 3:52 PM, Mihail Vieru <vieru@informatik.hu-berlin.de> wrote:
Hi Robert,

thank you for your reply.

I'm starting the job from the Scala IDE. So only one JobManager and one TaskManager in the same JVM.
I've doubled the memory in the eclipse.ini settings but I still get the Exception.

-vmargs
-Xmx2048m
-Xms100m
-XX:MaxPermSize=512m
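
Settings in eclipse.ini configure the JVM that runs the IDE itself; a program launched from the IDE normally runs in its own JVM, whose heap comes from the launch (run) configuration. One quick way to see what the job's JVM actually received is to print the maximum heap from inside it. A minimal sketch, not part of the attached job; the class name `HeapCheck` is made up for illustration:

```java
public class HeapCheck {
    /** Maximum heap the current JVM will attempt to use, in MiB. */
    public static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    /** Heap currently reserved by the JVM, in MiB. */
    public static long totalHeapMiB() {
        return Runtime.getRuntime().totalMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Run this with the same launch configuration as the Flink job.
        System.out.println("max heap (MiB):   " + maxHeapMiB());
        System.out.println("total heap (MiB): " + totalHeapMiB());
    }
}
```

If the reported maximum does not reflect the doubled value, the setting is probably not reaching the JVM in which the JobManager and TaskManager run.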

Best,
Mihail


On 17.03.2015 10:11, Robert Waury wrote:
Hi,

can you tell me how much memory your job has and how many workers you are running?

From the trace it seems the internal hash table allocated only 7 MB for the graph data and therefore runs out of memory pretty quickly.

Skewed data could also be an issue, but with a minimum of 5 pages and a maximum of 8 per partition the data seems to be distributed fairly evenly across the partitions.
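
These figures can be checked against the diagnostic string in the exception. A minimal sketch, assuming memory segments (pages) of 32 KiB each; the page size is not stated anywhere in this thread, and the class and method names are made up for illustration:

```java
public class HashTableMemoryCheck {
    // Assumption: 32 KiB memory segments; the thread does not state the page size.
    public static final long PAGE_SIZE = 32L * 1024L;

    /** Number of whole pages that fit into the given number of bytes. */
    public static long pages(long bytes) {
        return bytes / PAGE_SIZE;
    }

    /** Converts bytes to MiB. */
    public static double mebibytes(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long overallMemory = 20316160L;   // "Overall memory" from the exception
        long partitionMemory = 7208960L;  // "Partition memory" from the exception
        int numPartitions = 32;           // "numPartitions" from the exception

        System.out.println("overall pages:       " + pages(overallMemory));
        System.out.println("partition pages:     " + pages(partitionMemory));
        System.out.println("partition MiB:       " + mebibytes(partitionMemory));
        System.out.println("pages per partition: "
                + (double) pages(partitionMemory) / numPartitions);
    }
}
```

Under that assumption both values are whole multiples of the page size: 620 pages overall, of which 220 (about 6.9 MiB) hold partition data. That is roughly 6.9 pages per partition on average, which fits the reported minimum of 5 and maximum of 8.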

Cheers,
Robert

On Tue, Mar 17, 2015 at 1:25 AM, Mihail Vieru <vieru@informatik.hu-berlin.de> wrote:
And the correct SSSPUnweighted attached.


On 17.03.2015 01:23, Mihail Vieru wrote:
Hi,

I'm getting the following RuntimeException for an adaptation of the SingleSourceShortestPaths example using the Gelly API (see attachment). It's been adapted for unweighted graphs having vertices with Long values.

As an input graph I'm using the social network graph (~200MB unpacked) from here: https://snap.stanford.edu/data/higgs-twitter.html

For the small SSSPDataUnweighted graph (also attached) it terminates and computes the distances correctly.
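
For reference, on an unweighted graph the single-source shortest-path distance is the hop count, so results on a small input can be cross-checked with a plain breadth-first search. The sketch below is not the attached SSSPUnweighted code and does not use the Gelly API; the class name and the toy graph in `main` are made up for illustration:

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

public class UnweightedSsspReference {
    /** Breadth-first search: hop count from source to every reachable vertex. */
    public static Map<Long, Long> distances(Map<Long, List<Long>> adjacency, long source) {
        Map<Long, Long> dist = new HashMap<>();
        Queue<Long> queue = new ArrayDeque<>();
        dist.put(source, 0L);
        queue.add(source);
        while (!queue.isEmpty()) {
            long u = queue.remove();
            long next = dist.get(u) + 1;
            // Vertices without outgoing edges have no adjacency entry.
            for (long v : adjacency.getOrDefault(u, Collections.<Long>emptyList())) {
                if (!dist.containsKey(v)) {  // first visit is the shortest in BFS
                    dist.put(v, next);
                    queue.add(v);
                }
            }
        }
        return dist;
    }

    public static void main(String[] args) {
        // Hypothetical toy graph with directed edges 1->2, 1->3, 2->4, 3->4.
        Map<Long, List<Long>> graph = new HashMap<>();
        graph.put(1L, Arrays.asList(2L, 3L));
        graph.put(2L, Arrays.asList(4L));
        graph.put(3L, Arrays.asList(4L));
        System.out.println(distances(graph, 1L));
    }
}
```

Vertices that never appear in the returned map are unreachable from the source.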


03/16/2015 17:18:23    IterationHead(WorksetIteration (Vertex-centric iteration (org.apache.flink.graph.library.SingleSourceShortestPathsUnweighted$VertexDistanceUpdater@dca6fe4 | org.apache.flink.graph.library.SingleSourceShortestPathsUnweighted$MinDistanceMessenger@6577e8ce)))(2/4) switched to FAILED
java.lang.RuntimeException: Memory ran out. Compaction failed. numPartitions: 32 minPartition: 5 maxPartition: 8 number of overflow segments: 176 bucketSize: 217 Overall memory: 20316160 Partition memory: 7208960 Message: Index: 8, Size: 7
    at org.apache.flink.runtime.operators.hash.CompactingHashTable.insert(CompactingHashTable.java:390)
    at org.apache.flink.runtime.operators.hash.CompactingHashTable.buildTable(CompactingHashTable.java:337)
    at org.apache.flink.runtime.iterative.task.IterationHeadPactTask.readInitialSolutionSet(IterationHeadPactTask.java:216)
    at org.apache.flink.runtime.iterative.task.IterationHeadPactTask.run(IterationHeadPactTask.java:278)
    at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
    at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:205)
    at java.lang.Thread.run(Thread.java:745)

Best,
Mihail



