From: Malte Schwarzer
Reply-To: user@flink.incubator.apache.org
Date: Mon, 10 Nov 2014 17:59:51 +0100
Subject: Re: How to make Flink to write less temporary files?
What is the estimated amount of disk space for such a job? Or how can I calculate it?

Malte

From: Stephan Ewen <sewen@apache.org>
Reply-To: <user@flink.incubator.apache.org>
Date: Monday, 10 November 2014 11:22
To: <user@flink.incubator.apache.org>
Subject: Re: How to make Flink to write less temporary files?

Hi!

With 10 nodes and 25 GB on each node, you have 250 GB of space to spill temporary files. You also seem to have roughly the same size in JVM heap, out of which Flink can use roughly 2/3.

When you process 1 TB, 250 GB of JVM heap and 250 GB of temp file space may not be enough; together they are less than the initial data size.

I think you simply need more disk space for a job like that...

Stephan

On Mon, Nov 10, 2014 at 10:54 AM, Malte Schwarzer <ms@mieo.de> wrote:

> My blobStore files are small, but each *.channel file is around 170 MB. Before
> I start my Flink job I have 25 GB of free space available in my tmp-dir, and my
> taskmanager heap size is currently at 24 GB. I'm using a cluster with 10 nodes.
>
> Is this enough space to process a 1 TB file?
>
> From: Stephan Ewen <sewen@apache.org>
> Reply-To: <user@flink.incubator.apache.org>
> Date: Monday, 10 November 2014 10:35
> To: <user@flink.incubator.apache.org>
> Subject: Re: How to make Flink to write less temporary files?
>
> I would assume that the blobStore files are rather small (they are only jar
> files so far).
>
> I would look for *.channel files, which are spilled intermediate results. They
> can get pretty large for large jobs.
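The capacity arithmetic quoted in this thread can be written out as a small sketch. It only tallies what the cluster has available (temp-dir space plus the share of heap Flink can use) and compares that with the input size; it does not predict how much a particular job will actually spill, which depends on the job. The function and variable names are made up for illustration, the 2/3 heap share is the rough figure from Stephan's reply, and 1 TB is taken as 1000 GB.

```python
from fractions import Fraction


def cluster_capacity_gb(nodes, tmp_gb_per_node, heap_gb_per_node,
                        usable_heap_share=Fraction(2, 3)):
    """Return (total temp-dir space, heap usable by Flink) in GB.

    usable_heap_share is an assumption taken from the thread
    ("roughly 2/3"), not a value read from any configuration.
    """
    spill_gb = nodes * tmp_gb_per_node
    usable_heap_gb = nodes * heap_gb_per_node * usable_heap_share
    return spill_gb, usable_heap_gb


# Figures from the thread: 10 nodes, 25 GB free tmp-dir space and
# 24 GB taskmanager heap per node, 1 TB of input (assumed 1000 GB).
spill_gb, usable_heap_gb = cluster_capacity_gb(10, 25, 24)
input_gb = 1000

total_gb = spill_gb + usable_heap_gb
print(f"temp space: {spill_gb} GB, usable heap: {float(usable_heap_gb):.0f} GB")
print(f"total: {float(total_gb):.0f} GB for {input_gb} GB of input")
print("enough to hold the input once:", total_gb >= input_gb)
```

With these numbers the combined capacity is well below the input size, which matches the conclusion in the reply that more disk space is needed.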