From: Robert Metzger
Date: Sun, 22 Jun 2014 17:18:13 +0200
Subject: Re: KMeans job gets stuck and never completes
To: "dev@flink.incubator.apache.org"

Okay, let us know if you find the solution.
If you want, we can also do a short Google Hangout session with screen sharing. Maybe I'll see something.

On Sun, Jun 22, 2014 at 5:09 PM, José Luis López Pino wrote:

> Yes, I pulled and compiled the latest master from GitHub.
>
> Thank you for the test, Robert. I'll try to double-check the
> configuration of both nodes; there must be something wrong. I've tried to
> execute the job with p = 1, 2 and 4.
>
>
> Regards // Saludos // Mit Freundlichen Grüßen // Bien cordialement,
> Pino
>
>
> On 22 June 2014 16:43, Robert Metzger wrote:
>
> > I think Pino wrote that he is using the latest master.
> >
> > I just finished running KMeans on a cluster, with the following
> > configuration:
> > - 2 nodes, 18 GB heapspace each
> > - DOP=32
> > - 29 MB input data, 10 centers, 15 iterations max.
> >
> > I also reduced the heapspace to 1GB and both worked like a charm.
> > I've added a TODO to my list to test also with more data.
> >
> > On Sun, Jun 22, 2014 at 3:55 PM, Stephan Ewen wrote:
> >
> > > There was a patch for deadlocks on broadcast variables a few days ago.
> > >
> > > Can you try the current master branch (0.6-SNAPSHOT) and see if that
> > > solves your problem?