From: Flavio Pompermaier
Date: Mon, 30 May 2016 11:33:43 +0200
Subject: Re: Weird Kryo exception (Unable to find class: java.ttil.HashSet)
To: user
Reply-To: user@flink.apache.org

I tried to reproduce the error on a subset of the data. Reducing the
available memory and greatly increasing GC pressure (by creating a lot of
useless objects in one of the first UDFs) caused this error:

Caused by: java.io.IOException: Thread 'SortMerger spilling thread' terminated due to an exception: / by zero
    at org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ThreadBase.run(UnilateralSortMerger.java:800)
Caused by: java.lang.ArithmeticException: / by zero
    at org.apache.flink.runtime.operators.sort.UnilateralSortMerger$SpillingThread.getSegmentsForReaders(UnilateralSortMerger.java:1651)
    at org.apache.flink.runtime.operators.sort.UnilateralSortMerger$SpillingThread.mergeChannelList(UnilateralSortMerger.java:1565)
    at org.apache.flink.runtime.operators.sort.UnilateralSortMerger$SpillingThread.go(UnilateralSortMerger.java:1417)
    at org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ThreadBase.run(UnilateralSortMerger.java:796)

I hope this helps to narrow down the debugging area :)

Best,
Flavio

On Fri, May 27, 2016 at 8:21 PM, Stephan Ewen <sewen@apache.org> wrote:

> Hi!
>
> That is a pretty thing indeed :-) Will try to look into this in a few
> days...
>
> Stephan
>
>
> On Fri, May 27, 2016 at 12:10 PM, Flavio Pompermaier <pompermaier@okkam.it> wrote:
>
>> Running the job with the log level set to DEBUG made the job run
>> successfully... Is this meaningful? Maybe slowing down the threads a
>> little bit could help serialization?
>>
>>
>> On Thu, May 26, 2016 at 12:34 PM, Flavio Pompermaier <pompermaier@okkam.it> wrote:
>>
>>> Still not able to reproduce the error locally, only remotely :)
>>> Any suggestions about how to try to reproduce it locally on a subset of
>>> the data?
>>> This time I had:
>>>
>>> com.esotericsoftware.kryo.KryoException: Unable to find class: ^Z^A
>>>     at com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
>>>     at com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
>>>     at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:641)
>>>     at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:752)
>>>     at org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.deserialize(KryoSerializer.java:228)
>>>     at org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:431)
>>>     at org.apache.flink.runtime.plugable.NonReusingDeserializationDelegate.read(NonReusingDeserializationDelegate.java:55)
>>>     at org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.getNextRecord(SpillingAdaptiveSpanningRecordDeserializer.java:124)
>>>     at org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:65)
>>>     at org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:34)
>>>     at org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73)
>>>     at org.apache.flink.runtime.operators.MapDriver.run(MapDriver.java:96)
>>>     at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:480)
>>>     at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:345)
>>>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
>>>     at java.lang.Thread.run(Thread.java:745)
>>>
>>> Best,
>>> Flavio
>>>
>>>
>>> On Tue, May 24, 2016 at 5:47 PM, Flavio Pompermaier <pompermaier@okkam.it> wrote:
>>>
>>>> Do you have any suggestions about how to reproduce the error on a
>>>> subset of the data?
>>>> I'm trying to change the following settings, but I can't find a
>>>> configuration that causes the error :(
>>>>
>>>> private static ExecutionEnvironment getLocalExecutionEnv() {
>>>>     org.apache.flink.configuration.Configuration c =
>>>>         new org.apache.flink.configuration.Configuration();
>>>>     c.setString(ConfigConstants.TASK_MANAGER_TMP_DIR_KEY, "/tmp");
>>>>     c.setString(ConfigConstants.BLOB_STORAGE_DIRECTORY_KEY, "/tmp");
>>>>     c.setFloat(ConfigConstants.TASK_MANAGER_MEMORY_FRACTION_KEY, 0.9f);
>>>>     c.setLong(ConfigConstants.LOCAL_NUMBER_TASK_MANAGER, 4);
>>>>     c.setLong(ConfigConstants.TASK_MANAGER_NUM_TASK_SLOTS, 4);
>>>>     c.setString(ConfigConstants.AKKA_ASK_TIMEOUT, "10000 s");
>>>>     c.setLong(ConfigConstants.TASK_MANAGER_NETWORK_NUM_BUFFERS_KEY, 2048 * 12);
>>>>     ExecutionEnvironment env = ExecutionEnvironment.createLocalEnvironment(c);
>>>>     env.setParallelism(16);
>>>>     env.registerTypeWithKryoSerializer(DateTime.class, JodaDateTimeSerializer.class);
>>>>     return env;
>>>> }
>>>>
>>>> Best,
>>>> Flavio
>>>>
>>>>
>>>> On Tue, May 24, 2016 at 11:13 AM, Till Rohrmann <trohrmann@apache.org> wrote:
>>>>
>>>>> The error looks really strange. Flavio, could you compile a test
>>>>> program with example data and configuration to reproduce the problem?
>>>>> Given that, we could try to debug the problem.
>>>>>
>>>>> Cheers,
>>>>> Till
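A minimal sketch, not the code used in the thread, of the GC-pressure trick described in the top message: allocate many short-lived objects per record, then pass the record through unchanged. The class name `GcPressure`, the method `passThroughWithGarbage`, and the allocation counts are all hypothetical; in a Flink job the method body would presumably sit inside the `map` method of one of the first UDFs.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: generates garbage to provoke frequent GC cycles. */
public class GcPressure {

    /**
     * Allocates `allocations` throwaway byte arrays of `bytesEach` bytes,
     * then returns the record unchanged. The arrays become unreachable as
     * soon as the method returns, so they are eligible for collection.
     */
    public static <T> T passThroughWithGarbage(T record, int allocations, int bytesEach) {
        List<byte[]> junk = new ArrayList<>(allocations);
        for (int i = 0; i < allocations; i++) {
            junk.add(new byte[bytesEach]);
        }
        // Read the list back so the allocations have an observable use.
        if (junk.size() != allocations) {
            throw new IllegalStateException("unexpected junk size");
        }
        return record;
    }

    public static void main(String[] args) {
        String out = null;
        // Roughly 100 KB of garbage per record (assumed values).
        for (int i = 0; i < 1_000; i++) {
            out = passThroughWithGarbage("record-" + i, 100, 1024);
        }
        System.out.println(out); // last record, returned unchanged
    }
}
```

Raising `allocations` or running with a smaller heap (for example via the JVM's `-Xmx` option) should make the collector run more often; whether that is enough to trigger the failures reported in this thread is not something the sketch can guarantee.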