Subject: Re: Code related to spilling data to disk
From: Chiwan Park
Date: Wed, 22 Jun 2016 21:05:17 +0900
To: user@flink.apache.org

Hi,

I’m not sure why the JVM heap is used instead of managed memory, but it seems that using the JVM heap makes development easier. Maybe Stephan can give you an exact answer. I think managed memory still has benefits in terms of GC time and memory utilization.

The Flink community has a plan [1] to move data structures for streaming operators to managed memory.

[1]: https://docs.google.com/document/d/1ExmtVpeVVT3TIhO1JoBpC5JKXm-778DAD7eqw5GANwE/edit#

Regards,
Chiwan Park

> On Jun 22, 2016, at 8:39 PM, Tae-Geon Um wrote:
>
> Thank you for your answer to my question, Chiwan :)
> Can I ask another question?
>
>
>> On Jun 22, 2016, at 7:22 PM, Chiwan Park wrote:
>>
>> Hi Tae-Geon,
>>
>> AFAIK, spilling *data* to disk happens only when managed memory is used. Currently, the streaming API (DataStream) doesn’t use managed memory yet. `MutableHashTable` is one representative use of managed memory with disk spilling. Note that some special structures such as `CompactingHashTable` don’t spill data to disk even though they use managed memory to achieve high performance.
>
> As far as I understand, spilling data is only performed in batch mode.
> Do you know why streaming mode does not use managed memory?
> Is this because the performance gain is negligible?
>
>>
>> About spilling *states*, I think that it depends on how the state backends are implemented. For example, `FsStateBackend` saves states to the file system but `MemoryStateBackend` doesn’t. `RocksDBStateBackend` uses memory first and can also spill states to disk.
>
> I’ve found a nice document on the state backend [1]. I will take a look at this doc to learn the details.
> Thanks!
>
> Taegeon
>
> [1]: https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/state_backends.html#state-backends
>
>>
>> Regards,
>> Chiwan Park
>>
>>> On Jun 22, 2016, at 3:27 PM, Tae-Geon Um wrote:
>>>
>>> I have another question.
>>> Is the spilling only executed in batch mode?
>>> What happens in streaming mode?
>>>
>>>> On Jun 22, 2016, at 1:48 PM, Tae-Geon Um wrote:
>>>>
>>>> Hi, all
>>>>
>>>> As far as I know, Flink spills data (states?) to disk if the data exceeds a memory threshold or there is memory pressure.
>>>> I’d like to know the details of how Flink spills data to disk.
>>>>
>>>> Could you please let me know which code I should investigate?
>>>>
>>>> Thanks,
>>>> Taegeon
>>>
>>
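
For readers new to the idea discussed in this thread, here is a rough sketch of what "spilling to disk under a memory budget" means in general. It is not Flink code and does not reflect how `MutableHashTable` or any state backend is implemented; the class `SpillableBuffer`, its `budget` parameter (counted in records, not bytes), and the use of `pickle` and temporary files are all assumptions made purely for illustration.

```python
import os
import pickle
import tempfile


class SpillableBuffer:
    """Toy buffer (hypothetical, not a Flink class).

    Records are collected in memory; once the in-memory list reaches
    `budget` records, the whole list is written to a temporary file
    and the memory is released.
    """

    def __init__(self, budget):
        self.budget = budget      # max records held in memory before a spill
        self.in_memory = []       # records not yet spilled
        self.spill_paths = []     # one temporary file per spill, in order

    def add(self, record):
        self.in_memory.append(record)
        if len(self.in_memory) >= self.budget:
            self._spill()

    def _spill(self):
        # Write the current in-memory records to a new temp file.
        fd, path = tempfile.mkstemp(suffix=".spill")
        with os.fdopen(fd, "wb") as f:
            pickle.dump(self.in_memory, f)
        self.spill_paths.append(path)
        self.in_memory = []

    def __iter__(self):
        # Read spilled records back in insertion order, then the rest.
        for path in self.spill_paths:
            with open(path, "rb") as f:
                yield from pickle.load(f)
        yield from self.in_memory

    def close(self):
        # Remove all spill files.
        for path in self.spill_paths:
            os.remove(path)
        self.spill_paths = []


buf = SpillableBuffer(budget=3)
for i in range(10):
    buf.add(i)

records = list(buf)          # spilled records first, then in-memory ones
spill_count = len(buf.spill_paths)
buf.close()
```

A real engine would track memory in bytes or fixed-size segments and would typically spill only part of the structure (for example, selected hash partitions) rather than everything at once; the sketch only shows the basic trade of memory for disk I/O.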