From: Jingsong Li
Date: Fri, 6 Mar 2020 19:27:31 +0800
Subject: Re: The parallelism of sink is always 1 in sqlUpdate
To: faaron zheng
Cc: user
Mailing-List: user@flink.apache.org

Which sink do you use? The sink parallelism depends on the sink
implementation, like [1].

[1]
https://github.com/apache/flink/blob/2b13a4155fd4284f6092decba867e71eea058043/flink-table/flink-table-api-java-bridge/src/main/java/org/apache/flink/table/sinks/CsvTableSink.java#L147

Best,
Jingsong Lee

On Fri, Mar 6, 2020 at 6:37 PM faaron zheng <faaronzheng@gmail.com> wrote:

> Thanks for your attention. The parallelism of the sink's input is 500,
> and there is no order by or limit.
>
> Jingsong Li <jingsonglee0@gmail.com> wrote on Fri, Mar 6, 2020 at 6:15 PM:
>
>> Hi faaron,
>>
>> For the sink parallelism:
>> - What is the parallelism of the sink's input? The sink parallelism
>> should be the same.
>> - Does your SQL have order by or limit?
>> Flink batch SQL does not support range partitioning yet, so it runs
>> order by with a parallelism of 1.
>>
>> For the memory of the taskmanager:
>> There is a managed memory option to configure [1].
>>
>> [1]
>> https://ci.apache.org/projects/flink/flink-docs-master/ops/memory/mem_setup.html#managed-memory
>>
>> Best,
>> Jingsong Lee
>>
>> On Fri, Mar 6, 2020 at 5:38 PM faaron zheng <faaronzheng@gmail.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> I am trying to use Flink SQL to run a Hive task. I use tEnv.sqlUpdate
>>> to execute my SQL, which looks like "insert overwrite ... select ...".
>>> But I find the parallelism of the sink is always 1, which is
>>> intolerable for large data. Why does this happen? Also, is there any
>>> guide for deciding the memory of the taskmanager when I have two huge
>>> tables to hash join, for example, when each table has several TB of
>>> data?
>>>
>>> Thanks,
>>> Faaron
>>>
>>
>>
>> --
>> Best, Jingsong Lee
>>
>

--
Best, Jingsong Lee
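
For readers following the managed memory pointer in this thread, here is a minimal sketch of what such a setting might look like in flink-conf.yaml. It assumes the memory model introduced in Flink 1.10. The option names come from the Flink configuration documentation, not from this thread, and the values are placeholders rather than sizing advice for a multi-TB hash join.

```yaml
# Sketch only: the values below are illustrative assumptions, not recommendations.

# Total memory of each TaskManager process (JVM heap, managed memory, overhead).
taskmanager.memory.process.size: 8g

# Share of total Flink memory reserved as managed memory, which batch
# operators such as hash join and sort use. 0.4 is the documented default.
taskmanager.memory.managed.fraction: 0.4

# Alternative: set an absolute managed memory size instead of a fraction.
# If both are set, the size takes precedence over the fraction.
# taskmanager.memory.managed.size: 3g
```

When the join input is much larger than the available managed memory, Flink's batch hash join can spill to disk, so these settings mainly trade memory for speed rather than decide whether the job can run at all.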