From user-return-26432-archive-asf-public=cust-asf.ponee.io@flink.apache.org Mon Mar 11 12:42:15 2019 Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by mx-eu-01.ponee.io (Postfix) with SMTP id 37636180657 for ; Mon, 11 Mar 2019 13:42:15 +0100 (CET) Received: (qmail 9354 invoked by uid 500); 11 Mar 2019 12:42:13 -0000 Mailing-List: contact user-help@flink.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list user@flink.apache.org Received: (qmail 9344 invoked by uid 99); 11 Mar 2019 12:42:13 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd4-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 11 Mar 2019 12:42:13 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd4-us-west.apache.org (ASF Mail Server at spamd4-us-west.apache.org) with ESMTP id 761BBC2394 for ; Mon, 11 Mar 2019 12:42:13 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd4-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 1.778 X-Spam-Level: * X-Spam-Status: No, score=1.778 tagged_above=-999 required=6.31 tests=[DKIMWL_WL_MED=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, HTML_MESSAGE=2, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, SPF_PASS=-0.001] autolearn=disabled Authentication-Results: spamd4-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd4-us-west.apache.org [10.40.0.11]) (amavisd-new, port 10024) with ESMTP id 9PB_75Z2F0Sx for ; Mon, 11 Mar 2019 12:42:12 +0000 (UTC) Received: from mail-pg1-f196.google.com (mail-pg1-f196.google.com [209.85.215.196]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTPS id EB85961141 for ; Mon, 11 Mar 2019 12:42:11 +0000 (UTC) Received: by mail-pg1-f196.google.com with SMTP id b2so3940172pgl.9 for ; Mon, 11 Mar 2019 05:42:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:mime-version:subject:message-id:date:to; bh=Funplro6M4gPf0xvSYDXk79O1/RofpOXb/gVHZvOaUQ=; b=BGzEqG1/4xDYutVXh5h05E7KstE2c8pqf2KgodTOiYTX//e22O0gIqsKLCYKJstr9i lGx+N4FijHFLcJOkETFgfUkwFGsMDr43ScYj7rtZRHHD/Es0pXBF7WazAnlfMX9ULImK f1716ISwiLgs2EOwkxvlK04swkCJ01ooERw3qp967C60kREc2zjTPXBRXG7gOeR9/L85 0r85zpRKlnKlGgMHleH872JatLc/TiEPyWco1mh4mqnND+h3Yj+6r/rL8MxegqK4gLde 711gIvnkFqmJouK20dfUS/HSX64B89acXUiQPr8wDx+Bkrt/44jJ5OdJhAMiGrr6Bpsb PT1A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:mime-version:subject:message-id:date:to; bh=Funplro6M4gPf0xvSYDXk79O1/RofpOXb/gVHZvOaUQ=; b=PMOZklqiXFr8XOKBPzUzljXOYeuUv/u2cW3CchoxoBDQQRFVluUT8xYdhTt7jXgXwD VvY93IceIXc2T4P0LNwkMQiyaBCTPYOEus69NUnBcsslyWX13nuusv4yoTw5mS2Op6e2 r0sFP0GM7ATk6+JM/EY4qItIVErNIhU77M0RFypH3GVkYre3bM6rq9XwRNvNYRY3XL8D 497unFMpDz3u5GpPNqDeMuRbeIXGxJabuskXQ10t+AUYLMnsxgZcJS3jLUuKd3Wbzj8h 1C68qsXOfBUDZcBiRbKTV7a/aEJbhnC0vEXGPhPdNfwgpiLnuqqPnsEugPMnxsTOVM0Q 1Qrg== X-Gm-Message-State: APjAAAUXJLEExvLzCXEXUd14LPRhgezYWGINIYHrO7yspAG+EGq+cC82 BBIs7/i1FJtWVzNXuLqDOGEAIxkq X-Google-Smtp-Source: APXvYqwM96kFk99ng9fqMrkw29SvX58pJ7bNAGvobZawsmkeseb0sekgDXYoaeyffRILZ357ZkeWmg== X-Received: by 2002:a62:ed0c:: with SMTP id u12mr33949989pfh.88.1552308130720; Mon, 11 Mar 2019 05:42:10 -0700 (PDT) Received: from [10.2.213.90] ([61.120.150.73]) by smtp.gmail.com with ESMTPSA id u67sm8741784pfu.51.2019.03.11.05.42.08 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 11 Mar 2019 05:42:10 -0700 (PDT) From: qi luo Content-Type: multipart/alternative; boundary="Apple-Mail=_15801DDB-F7B8-4FE6-BC46-FE611DAF0C5B" Mime-Version: 1.0 (Mac OS X Mail 12.2 \(3445.102.3\)) Subject: Set partition number of Flink DataSet Message-Id: <7A8F13E5-F089-4DC8-AA55-459C6A0EC20F@gmail.com> Date: Mon, 11 Mar 2019 20:42:06 +0800 To: user X-Mailer: Apple Mail (2.3445.102.3) --Apple-Mail=_15801DDB-F7B8-4FE6-BC46-FE611DAF0C5B Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=utf-8 Hi, We=E2=80=99re trying to distribute batch input data to (N) HDFS files = partitioning by hash using DataSet API. What I=E2=80=99m doing is like: env.createInput(=E2=80=A6) .partitionByHash(0) .setParallelism(N) .output(=E2=80=A6) This works well for small number of files. But when we need to = distribute to large number of files (say 100K), the parallelism becomes = too large and we could not afford that many TMs. In spark we can write something like =E2=80=98rdd.partitionBy(N)=E2=80=99 = and control the parallelism separately (using dynamic allocation). Is = there anything similar in Flink or other way we can achieve similar = result? Thank you! Qi= --Apple-Mail=_15801DDB-F7B8-4FE6-BC46-FE611DAF0C5B Content-Transfer-Encoding: quoted-printable Content-Type: text/html; charset=utf-8 Hi,

We=E2=80= =99re trying to distribute batch input data to (N) HDFS files = partitioning by hash using DataSet API. What I=E2=80=99m doing is = like:

env.createInput(=E2=80=A6)
      = .partitionByHash(0)
      .setParallelism(N)
      = .output(=E2=80=A6)

This works well for small number of files. But when we need = to distribute to large number of files (say = 100K), the parallelism becomes too large and we could not afford = that many TMs.

In spark we can write something like =E2=80=98rdd.partitionBy(N= )=E2=80=99 and control the parallelism separately (using dynamic = allocation). Is there anything similar in Flink or other way we can = achieve similar result? Thank you!

Qi
= --Apple-Mail=_15801DDB-F7B8-4FE6-BC46-FE611DAF0C5B--