From: Alexander Behm
Date: Fri, 19 Jan 2018 08:48:54 -0800
Subject: Re: Cancellation logic in HdfsScanners
To: dev@impala.apache.org
Cc: Tim Armstrong

Very nice! Thank you.

On Fri, Jan 19, 2018 at 6:16 AM, Quanlong Huang wrote:

> Hi Tim,
>
> I believe it's a bug and I found a way to reproduce it. I have filed a
> JIRA: https://issues.apache.org/jira/browse/IMPALA-6423
>
> At 2018-01-17 08:53:44, "Quanlong Huang" wrote:
> >Thanks, Tim! Let me try to reproduce this scenario on the existing
> >scanners. I'll file a JIRA when I find it.
> >
> >At 2018-01-17 08:39:46, "Tim Armstrong" wrote:
> >>I think there is still probably a bug in the existing scanners where
> >>they can ignore cancellation under specific conditions.
> >>
> >>> For non-MT scanners, why don't they just check
> >>> RuntimeState::is_cancelled()? Are there any reasons that they should
> >>> go ahead until HdfsScanNode::done()?
> >>
> >>I think the non-MT scanners should check both
> >>RuntimeState::is_cancelled() and HdfsScanNode::done(), since they
> >>signal different termination conditions.
> >>
> >>On Tue, Jan 16, 2018 at 4:09 PM, Quanlong Huang wrote:
> >>
> >>> I'm developing the HDFS ORC scanner (IMPALA-5717) and encountered
> >>> this scenario in test_failpoints.py. The existing scanners can pass
> >>> this test, so this might be my own problem; I haven't filed a JIRA
> >>> yet.
> >>>
> >>> I just want to confirm that, with MT_DOP=0, the other scanners won't
> >>> get into this scenario. For non-MT scanners, why don't they just
> >>> check RuntimeState::is_cancelled()? Are there any reasons that they
> >>> should go ahead until HdfsScanNode::done()?
> >>>
> >>> At 2018-01-17 07:00:51, "Tim Armstrong" wrote:
> >>>
> >>> Looks to me like you found a bug. I think the scanners should be
> >>> checking both cancellation conditions, i.e.
> >>> RuntimeState::is_cancelled_ for MT and non-MT scanners and
> >>> HdfsScanNode::done_ for non-MT scanners.
> >>>
> >>> On Tue, Jan 16, 2018 at 2:48 PM, Quanlong Huang
> >>> <huang_quanlong@126.com> wrote:
> >>>
> >>>> Hi Tim,
> >>>>
> >>>> Thanks for your reply! I have a further question. Given MT_DOP=0,
> >>>> why don't we use RuntimeState::is_cancelled() to detect
> >>>> cancellation in the HDFS scanners, for example in the loop of
> >>>> ProcessSplit()? There might be a scenario where the
> >>>> FragmentInstance has been cancelled, but the scanner doesn't know
> >>>> about it yet, so it goes ahead and passes up all the row batches.
> >>>> If the FragmentInstance consists of just an HdfsScanNode, the
> >>>> DataStreamSender will try to send these row batches to the
> >>>> upstream FragmentInstance, which has already been cancelled. That
> >>>> will fail, but the sender will keep retrying for 2 minutes (by
> >>>> default). The memory held by the DataStreamSender cannot be
> >>>> released during this 2-minute window, which might make other
> >>>> queries running in parallel raise MemLimitExceeded errors.
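> >>>>
> >>>> What I have in mind is roughly the following (just a sketch:
> >>>> HdfsOrcScanner is my work-in-progress scanner from IMPALA-5717,
> >>>> and ProcessStripe() is a made-up placeholder for the per-stripe
> >>>> work):
> >>>>
> >>>>     Status HdfsOrcScanner::ProcessSplit() {
> >>>>       while (!eos_ && !scan_node_->ReachedLimit()) {
> >>>>         // Notice query-wide cancellation directly instead of
> >>>>         // waiting for HdfsScanNode::done_ to be set;
> >>>>         // RETURN_IF_CANCELLED checks RuntimeState::is_cancelled().
> >>>>         RETURN_IF_CANCELLED(state_);
> >>>>         // Placeholder: decode one ORC stripe and pass up the
> >>>>         // materialized row batches.
> >>>>         RETURN_IF_ERROR(ProcessStripe());
> >>>>       }
> >>>>       return Status::OK();
> >>>>     }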
> >>>>
> >>>> For example, the plan of the query "select 1 from alltypessmall a
> >>>> join alltypessmall b on a.id != b.id" is:
> >>>>
> >>>> +-------------------------------------------------------------------------------------+
> >>>> | Max Per-Host Resource Reservation: Memory=0B
> >>>> | Per-Host Resource Estimates: Memory=2.06GB
> >>>> | WARNING: The following tables are missing relevant table and/or column statistics.
> >>>> | functional_orc.alltypessmall
> >>>> |
> >>>> | F02:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
> >>>> | Per-Host Resources: mem-estimate=0B mem-reservation=0B
> >>>> |   PLAN-ROOT SINK
> >>>> |   |  mem-estimate=0B mem-reservation=0B
> >>>> |   |
> >>>> |   04:EXCHANGE [UNPARTITIONED]
> >>>> |      mem-estimate=0B mem-reservation=0B
> >>>> |      tuple-ids=0,1 row-size=8B cardinality=unavailable
> >>>> |
> >>>> | F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
> >>>> | Per-Host Resources: mem-estimate=2.03GB mem-reservation=0B
> >>>> |   DATASTREAM SINK [FRAGMENT=F02, EXCHANGE=04, UNPARTITIONED]
> >>>> |   |  mem-estimate=0B mem-reservation=0B
> >>>> |   02:NESTED LOOP JOIN [INNER JOIN, BROADCAST]
> >>>> |   |  predicates: a.id != b.id
> >>>> |   |  mem-estimate=2.00GB mem-reservation=0B
> >>>> |   |  tuple-ids=0,1 row-size=8B cardinality=unavailable
> >>>> |   |
> >>>> |   |--03:EXCHANGE [BROADCAST]
> >>>> |   |     mem-estimate=0B mem-reservation=0B
> >>>> |   |     tuple-ids=1 row-size=4B cardinality=unavailable
> >>>> |   |
> >>>> |   00:SCAN HDFS [functional_orc.alltypessmall a, RANDOM]
> >>>> |      partitions=4/4 files=4 size=4.82KB
> >>>> |      stored statistics:
> >>>> |        table: rows=unavailable size=unavailable
> >>>> |        partitions: 0/4 rows=unavailable
> >>>> |        columns: unavailable
> >>>> |      extrapolated-rows=disabled
> >>>> |      mem-estimate=32.00MB mem-reservation=0B
> >>>> |      tuple-ids=0 row-size=4B cardinality=unavailable
> >>>> |
> >>>> | F01:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
> >>>> | Per-Host Resources: mem-estimate=32.00MB mem-reservation=0B
> >>>> |   DATASTREAM SINK [FRAGMENT=F00, EXCHANGE=03, BROADCAST]
> >>>> |   |  mem-estimate=0B mem-reservation=0B
> >>>> |   01:SCAN HDFS [functional_orc.alltypessmall b, RANDOM]
> >>>> |      partitions=4/4 files=4 size=4.82KB
> >>>> |      stored statistics:
> >>>> |        table: rows=unavailable size=unavailable
> >>>> |        partitions: 0/4 rows=unavailable
> >>>> |        columns: unavailable
> >>>> |      extrapolated-rows=disabled
> >>>> |      mem-estimate=32.00MB mem-reservation=0B
> >>>> |      tuple-ids=1 row-size=4B cardinality=unavailable
> >>>> +-------------------------------------------------------------------------------------+
> >>>>
> >>>> When errors happen in F00, a cancellation RPC will be sent to F01.
> >>>> However, the HDFS scanner in F01 does not notice it in time and
> >>>> passes up all the row batches. Then the DataStreamSender in F01
> >>>> will try to send these row batches to F00, retrying for 2 minutes.
> >>>> In this time window it may hold significant memory resources,
> >>>> which can prevent other queries from allocating memory and make
> >>>> them fail. This could be avoided if the HDFS scanner used
> >>>> RuntimeState::is_cancelled() to detect the cancellation in time.
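> >>>>
> >>>> (If I read the backend code correctly, the 2-minute window comes
> >>>> from the impalad flag datastream_sender_timeout_ms, which defaults
> >>>> to 120000 ms:
> >>>>
> >>>>     impalad --datastream_sender_timeout_ms=120000 ...
> >>>>
> >>>> Lowering it would only shrink the window, not remove the root
> >>>> cause.)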
> >>>>
> >>>> Am I right?
> >>>>
> >>>> Thanks,
> >>>> Quanlong
> >>>>
> >>>> At 2018-01-17 01:05:57, "Tim Armstrong" wrote:
> >>>> >ScannerContext::cancelled() == true means that the scan has
> >>>> >completed, either because it has returned enough rows, because
> >>>> >the query was cancelled, or because it hit an error.
> >>>> >
> >>>> >RuntimeState::is_cancelled() == true means that the query was
> >>>> >cancelled.
> >>>> >
> >>>> >So there are cases where ScannerContext::cancelled() == true and
> >>>> >RuntimeState::is_cancelled() is false, e.g. where there's a limit
> >>>> >on the scan.
> >>>> >
> >>>> >I think the name of ScannerContext::cancelled() is misleading; it
> >>>> >should probably be called "done()" to match HdfsScanNode::done().
> >>>> >More generally, the cancellation logic could probably be cleaned
> >>>> >up and simplified further.
> >>>> >
> >>>> >On Mon, Jan 15, 2018 at 6:20 PM, Quanlong Huang
> >>>> ><huang_quanlong@126.com> wrote:
> >>>> >
> >>>> >> Hi all,
> >>>> >>
> >>>> >> I'm confused about the cancellation logic in the HDFS scanners.
> >>>> >> There are two functions to detect cancellation:
> >>>> >> ScannerContext::cancelled() and RuntimeState::is_cancelled().
> >>>> >> When MT_DOP is not set (i.e. MT_DOP=0),
> >>>> >> ScannerContext::cancelled() returns HdfsScanNode::done().
> >>>> >> However, the field done_ in HdfsScanNode seems to be set
> >>>> >> according to the status returned from the scanners. I've
> >>>> >> observed points where RuntimeState::is_cancelled() is true but
> >>>> >> ScannerContext::cancelled() is false.
> >>>> >>
> >>>> >> My question is: why don't the scanners use
> >>>> >> RuntimeState::is_cancelled() to detect cancellation, which is
> >>>> >> more timely than ScannerContext::cancelled()? There must be
> >>>> >> some detailed reasons that I've missed. Would you be so kind as
> >>>> >> to answer my question?
> >>>> >>
> >>>> >> Thanks,
> >>>> >> Quanlong
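> >>>> >
> >>>> >To make the difference concrete, for MT_DOP=0 the two checks boil
> >>>> >down to roughly this (a sketch from memory, not the exact code;
> >>>> >the MT_DOP > 0 path is omitted):
> >>>> >
> >>>> >    // "The scan is finished": done_ is also set when the scan
> >>>> >    // hit its limit or an error, so this is broader than query
> >>>> >    // cancellation.
> >>>> >    bool ScannerContext::cancelled() const {
> >>>> >      return static_cast<HdfsScanNode*>(scan_node_)->done();
> >>>> >    }
> >>>> >
> >>>> >    // "The whole query was cancelled."
> >>>> >    bool RuntimeState::is_cancelled() const { return is_cancelled_; }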