Subject: Re: JDBCInputFormat GC overhead limit exceeded error
From: Stephan Ewen
To: user@flink.apache.org
Date: Tue, 19 Jan 2016 21:26:51 +0100
Hi!

This kind of error (GC overhead limit exceeded) usually means that the system has reached a state where very many objects are still live and each collection frees little memory. As a consequence, the JVM is basically busy with nothing but garbage collection.

Your job probably has about 500-600 MB of free memory; at that memory size, the rest is reserved for JVM overhead and Flink's worker memory. Now, since your job does not actually keep any objects or rows around, this should be plenty. I can only suspect that the Oracle JDBC driver is very memory hungry, thus pushing the system to the limit (I found this suspicion supported, for example, by the resources listed below).

What you can do: for this kind of job, you can simply tell Flink not to reserve as much memory, by using the option "taskmanager.memory.size=1". If the JDBC driver has no leak, but is simply super hungry, this should solve it.

Greetings,
Stephan

I also found these resources concerning Oracle JDBC memory:
- http://stackoverflow.com/questions/2876895/oracle-t4cpreparedstatement-memory-leaks (bottom answer)
- https://community.oracle.com/thread/2220078?tstart=0

On Tue, Jan 19, 2016 at 5:44 PM, Maximilian Bode <maximilian.bode@tngtech.com> wrote:

> Hi Robert,
>
> I am using 0.10.1.
>
> On 19.01.2016 at 17:42, Robert Metzger <rmetzger@apache.org> wrote:
>
> Hi Max,
>
> which version of Flink are you using?
>
> On Tue, Jan 19, 2016 at 5:35 PM, Maximilian Bode <maximilian.bode@tngtech.com> wrote:
>
>> Hi everyone,
>>
>> I am facing a problem using the JDBCInputFormat, which occurred in a
>> larger Flink job. As a minimal example, I can reproduce it when just
>> writing data into a CSV file after having read it from a database, i.e.
>>
>> DataSet<Tuple1<String>> existingData = env.createInput(
>>     JDBCInputFormat.buildJDBCInputFormat()
>>         .setDrivername("oracle.jdbc.driver.OracleDriver")
>>         .setUsername(…)
>>         .setPassword(…)
>>         .setDBUrl(…)
>>         .setQuery("select DATA from TABLENAME")
>>         .finish(),
>>     new TupleTypeInfo<>(Tuple1.class, BasicTypeInfo.STRING_TYPE_INFO));
>> existingData.writeAsCsv(…);
>>
>> where DATA is a column containing strings of length ~25 characters and
>> TABLENAME contains 20 million rows.
>>
>> After starting the job on a YARN cluster (using -tm 3072 and leaving the
>> other memory settings at default values), Flink happily goes along at first
>> but then fails after something like three million records have been sent by
>> the JDBCInputFormat. The exception reads "The slot in which the task was
>> executed has been released. Probably loss of TaskManager …". The local
>> taskmanager.log in the affected container reads
>> "java.lang.OutOfMemoryError: GC overhead limit exceeded
>>     at java.util.Collections$UnmodifiableCollection.iterator(Collections.java:1063)
>>     at org.jboss.netty.channel.socket.nio.NioClientBoss.processConnectTimeout(NioClientBoss.java:119)
>>     at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:83)
>>     at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
>>     at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>     at java.lang.Thread.run(Thread.java:744)"
>>
>> Any ideas what is going wrong here?
>>
>> Cheers,
>> Max
>>
>> —
>> Maximilian Bode * Junior Consultant * maximilian.bode@tngtech.com
>> TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
>> Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
>> Sitz: Unterföhring * Amtsgericht München * HRB 135082
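A rough back-of-the-envelope check of the memory budget Stephan describes. The cutoff and fraction values below are assumptions (the Flink 0.10-era defaults of yarn.heap-cutoff-ratio = 0.25 and taskmanager.memory.fraction = 0.7, as far as I recall); both are configurable, so treat the numbers as illustrative, not authoritative:

```java
// Illustrative sketch of how a "-tm 3072" TaskManager budget might break
// down. The 0.25 YARN heap cutoff and 0.7 managed-memory fraction are
// assumed defaults, not values taken from this thread.
public class MemoryBudget {
    public static void main(String[] args) {
        double containerMb = 3072;
        double heapMb = containerMb * (1 - 0.25); // JVM heap after YARN cutoff
        double managedMb = heapMb * 0.7;          // Flink's managed ("worker") memory
        double freeMb = heapMb - managedMb;       // left for user code and the JDBC driver
        System.out.println(Math.round(freeMb));   // ~691 MB, the ballpark Stephan mentions
    }
}
```

Setting taskmanager.memory.size=1 shrinks the managed-memory share to (almost) nothing, leaving nearly the whole heap to user code and the driver, which is why Stephan's suggestion helps a memory-hungry JDBC driver.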