From: Robert Metzger
Date: Fri, 13 Nov 2015 09:52:45 +0100
Subject: Re: Join Stream with big ref table
To: user@flink.apache.org

Hi Arnaud,

I'm happy that you were able to resolve the issue.
If you are still interested in the first approach, you could try a few things, for example using only one slot per task manager (the slots share the heap of the TM).

Regards,
Robert

On Fri, Nov 13, 2015 at 9:18 AM, LINZ, Arnaud wrote:

> Hello,
>
> I've worked around my problem by not using the HiveServer2 JDBC driver to
> read the ref table. Apparently, despite all the right options being passed
> to the Statement object, it handles RAM poorly: converting the table to
> text format and reading it directly from HDFS works without any problem
> and with plenty of free memory.
>
> Greetings,
> Arnaud
>
> From: LINZ, Arnaud
> Sent: Thursday, 12 November 2015 17:48
> To: 'user@flink.apache.org'
> Subject: Join Stream with big ref table
>
> Hello,
>
> I have to enrich a stream with a big reference table (11,000,000 rows). I
> cannot use "join" because I cannot window the stream, so in the "open()"
> function of each mapper I read the content of the table and put it in a
> HashMap (stored on the heap).
>
> 11M rows is quite big, but it should take less than 100 MB in RAM, so it's
> supposed to be easy. However, I systematically run into a Java out-of-memory
> error, even with huge 64 GB containers (5 slots / container).
>
> Path:                 akka.tcp://flink@172.21.125.28:43653/user/taskmanager
> ID:                   4B4D0A725451E933C39E891AAE80B53B
> Data Port:            41982
> Last Heartbeat:       2015-11-12, 17:46:14
> All Slots:            5
> Free Slots:           5
> CPU Cores:            32
> Physical Memory:      126.0 GB
> Free Memory:          46.0 GB
> Flink Managed Memory: 31.5 GB
>
> I don't clearly understand why this happens or how to fix it. Any clue?
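On the point that the slots share the heap of the TM: since each mapper builds its own HashMap in open(), a task manager with 5 slots ends up holding 5 copies of the table on one heap. A rough sketch of that arithmetic follows. The per-entry sizes are assumptions (roughly what a 64-bit JVM with compressed pointers and short String keys and values would use), not measurements from this job, and HeapEstimate is a hypothetical name, not a Flink class.

```java
/** Hypothetical back-of-the-envelope estimate; all byte sizes are assumptions. */
final class HeapEstimate {
    // Assumed size of one HashMap entry node (header, hash, three references).
    static final long NODE_BYTES = 32;
    // Assumed share of the bucket array per entry (one 4-byte reference).
    static final long BUCKET_BYTES = 4;

    private HeapEstimate() {}

    /** Heap used by one copy of the table, given assumed key and value object sizes. */
    static long perCopyBytes(long rows, long keyBytes, long valueBytes) {
        return rows * (NODE_BYTES + BUCKET_BYTES + keyBytes + valueBytes);
    }

    /** Heap used on one task manager when every slot loads its own copy. */
    static long perTaskManagerBytes(long rows, long keyBytes, long valueBytes, int slots) {
        return slots * perCopyBytes(rows, keyBytes, valueBytes);
    }

    public static void main(String[] args) {
        long rows = 11_000_000L;   // from the original message
        int slots = 5;             // from the original message
        long keyBytes = 56;        // assumption: a short String key
        long valueBytes = 56;      // assumption: a short String value
        System.out.println("one copy:     " + perCopyBytes(rows, keyBytes, valueBytes) + " bytes");
        System.out.println("five copies:  " + perTaskManagerBytes(rows, keyBytes, valueBytes, slots) + " bytes");
    }
}
```

Under these assumed sizes a single copy is already well above 100 MB, and the total grows linearly with the number of slots, which is why one slot per task manager is worth trying.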
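One way to keep a single copy of the table per task manager, whatever the slot count, might be to load the text export once per JVM and let the open() of every mapper fetch the shared map. This is a minimal sketch, assuming that all subtasks of the job inside one task manager are loaded by the same class loader (so they see the same static field) and that rows are "key;value" lines. RefTableCache and its methods are hypothetical names; in practice the Reader would wrap an HDFS input stream rather than a string.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical helper: holds one read-only copy of the reference table per JVM. */
final class RefTableCache {
    // Shared by every caller in this JVM; volatile for safe publication.
    private static volatile Map<String, String> table;

    private RefTableCache() {}

    /** Streams "key<delimiter>value" lines into a map, one row at a time. */
    static Map<String, String> load(Reader source, char delimiter, int expectedRows) {
        // Pre-size the map so it does not rehash repeatedly while loading.
        Map<String, String> map = new HashMap<>((int) (expectedRows / 0.75f) + 1);
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int split = line.indexOf(delimiter);
                if (split < 0) {
                    continue; // skip rows without a delimiter
                }
                map.put(line.substring(0, split), line.substring(split + 1));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Collections.unmodifiableMap(map);
    }

    /** Returns the shared table, loading it on first use only. */
    static Map<String, String> getOrLoad(Supplier<Reader> source, char delimiter, int expectedRows) {
        Map<String, String> local = table;
        if (local == null) {
            synchronized (RefTableCache.class) {
                local = table;
                if (local == null) {
                    local = load(source.get(), delimiter, expectedRows);
                    table = local;
                }
            }
        }
        return local;
    }

    public static void main(String[] args) {
        Supplier<Reader> source = () -> new StringReader("k1;v1\nk2;v2\n");
        Map<String, String> ref = getOrLoad(source, ';', 2);
        System.out.println(ref.get("k1"));
    }
}
```

Each mapper would call getOrLoad from its open() and only read from the returned map afterwards; the map is wrapped as unmodifiable so that concurrent readers cannot alter it.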