From: Alexander Behm
Date: Fri, 19 Jan 2018 08:48:54 -0800
Subject: Re: Cancellation logic in HdfsScanners
To: dev@impala.apache.org
Cc: Tim Armstrong

Very nice! Thank you.

On Fri, Jan 19, 2018 at 6:16 AM, Quanlong Huang wrote:

> Hi Tim,
>
> I believe it's a bug and I found a way to reproduce it. I have filed a
> JIRA: https://issues.apache.org/jira/browse/IMPALA-6423
>
> At 2018-01-17 08:53:44, "Quanlong Huang" wrote:
> >Thanks, Tim! Let me try to reproduce this scenario on the existing
> >scanners. I'll file a JIRA when I find it.
> >
> >At 2018-01-17 08:39:46, "Tim Armstrong" wrote:
> >>I think there is still probably a bug in the existing scanners where
> >>they can ignore cancellation under specific conditions.
> >>
> >>> For non-MT scanners, why don't they just check
> >>> RuntimeState::is_cancelled()? Are there any reasons that they should
> >>> go ahead until HdfsScanNode::done()?
> >>
> >>I think the non-MT scanners should check both
> >>RuntimeState::is_cancelled() and HdfsScanNode::done(), since they
> >>signal different termination conditions.
> >>
> >>On Tue, Jan 16, 2018 at 4:09 PM, Quanlong Huang wrote:
> >>
> >>> I'm developing the HDFS ORC scanner (IMPALA-5717) and encountered
> >>> this scenario in test_failpoints.py. The existing scanners can pass
> >>> this test, so this might be my own problem; I haven't filed a JIRA
> >>> yet.
> >>>
> >>> I just want to confirm that, with MT_DOP=0, the other scanners won't
> >>> get into this scenario. For non-MT scanners, why don't they just
> >>> check RuntimeState::is_cancelled()? Are there any reasons that they
> >>> should go ahead until HdfsScanNode::done()?
> >>>
> >>> At 2018-01-17 07:00:51, "Tim Armstrong" wrote:
> >>>
> >>> Looks to me like you found a bug. I think the scanners should be
> >>> checking both cancellation conditions, i.e.
> >>> RuntimeState::is_cancelled_ for MT and non-MT scanners and
> >>> HdfsScanNode::done_ for non-MT scanners.
> >>>
> >>> On Tue, Jan 16, 2018 at 2:48 PM, Quanlong Huang
> >>> <huang_quanlong@126.com> wrote:
> >>>
> >>>> Hi Tim,
> >>>>
> >>>> Thanks for your reply! I have a further question. Given MT_DOP=0,
> >>>> why don't we use RuntimeState::is_cancelled() to detect
> >>>> cancellation in the HDFS scanners, for example in the loop of
> >>>> ProcessSplit()? There might be a scenario where the
> >>>> FragmentInstance has been cancelled, but the scanner doesn't know
> >>>> about it yet, so it goes ahead and passes up all the row batches.
> >>>> If the FragmentInstance consists of just an HdfsScanNode, the
> >>>> DataStreamSender will try to send these row batches to the
> >>>> upstream FragmentInstance, which has already been cancelled. That
> >>>> will fail, but the sender will keep retrying for 2 minutes (by
> >>>> default). The memory held by the DataStreamSender cannot be
> >>>> released during this 2-minute window, which might make other
> >>>> queries running in parallel raise MemLimitExceeded errors.
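> >>>>
> >>>> What I have in mind is roughly the following (just a sketch:
> >>>> HdfsOrcScanner is my work-in-progress scanner from IMPALA-5717,
> >>>> and ProcessStripe() is a made-up placeholder for the per-stripe
> >>>> work):
> >>>>
> >>>>     Status HdfsOrcScanner::ProcessSplit() {
> >>>>       while (!eos_ && !scan_node_->ReachedLimit()) {
> >>>>         // Notice query-wide cancellation directly instead of
> >>>>         // waiting for HdfsScanNode::done_ to be set;
> >>>>         // RETURN_IF_CANCELLED checks RuntimeState::is_cancelled().
> >>>>         RETURN_IF_CANCELLED(state_);
> >>>>         // Placeholder: decode one ORC stripe and pass up the
> >>>>         // materialized row batches.
> >>>>         RETURN_IF_ERROR(ProcessStripe());
> >>>>       }
> >>>>       return Status::OK();
> >>>>     }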
> >>>>
> >>>> For example, the plan of the query "select 1 from alltypessmall a
> >>>> join alltypessmall b on a.id != b.id" is:
> >>>>
> >>>> +-------------------------------------------------------------------------------------+
> >>>> | Max Per-Host Resource Reservation: Memory=0B
> >>>> | Per-Host Resource Estimates: Memory=2.06GB
> >>>> | WARNING: The following tables are missing relevant table and/or column statistics.
> >>>> | functional_orc.alltypessmall
> >>>> |
> >>>> | F02:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
> >>>> | Per-Host Resources: mem-estimate=0B mem-reservation=0B
> >>>> |   PLAN-ROOT SINK
> >>>> |   |  mem-estimate=0B mem-reservation=0B
> >>>> |   |
> >>>> |   04:EXCHANGE [UNPARTITIONED]
> >>>> |      mem-estimate=0B mem-reservation=0B
> >>>> |      tuple-ids=0,1 row-size=8B cardinality=unavailable
> >>>> |
> >>>> | F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
> >>>> | Per-Host Resources: mem-estimate=2.03GB mem-reservation=0B
> >>>> |   DATASTREAM SINK [FRAGMENT=F02, EXCHANGE=04, UNPARTITIONED]
> >>>> |   |  mem-estimate=0B mem-reservation=0B
> >>>> |   02:NESTED LOOP JOIN [INNER JOIN, BROADCAST]
> >>>> |   |  predicates: a.id != b.id
> >>>> |   |  mem-estimate=2.00GB mem-reservation=0B
> >>>> |   |  tuple-ids=0,1 row-size=8B cardinality=unavailable
> >>>> |   |
> >>>> |   |--03:EXCHANGE [BROADCAST]
> >>>> |   |     mem-estimate=0B mem-reservation=0B
> >>>> |   |     tuple-ids=1 row-size=4B cardinality=unavailable
> >>>> |   |
> >>>> |   00:SCAN HDFS [functional_orc.alltypessmall a, RANDOM]
> >>>> |      partitions=4/4 files=4 size=4.82KB
> >>>> |      stored statistics:
> >>>> |        table: rows=unavailable size=unavailable
> >>>> |        partitions: 0/4 rows=unavailable
> >>>> |        columns: unavailable
> >>>> |      extrapolated-rows=disabled
> >>>> |      mem-estimate=32.00MB mem-reservation=0B
> >>>> |      tuple-ids=0 row-size=4B cardinality=unavailable
> >>>> |
> >>>> | F01:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
> >>>> | Per-Host Resources: mem-estimate=32.00MB mem-reservation=0B
> >>>> |   DATASTREAM SINK [FRAGMENT=F00, EXCHANGE=03, BROADCAST]
> >>>> |   |  mem-estimate=0B mem-reservation=0B
> >>>> |   01:SCAN HDFS [functional_orc.alltypessmall b, RANDOM]
> >>>> |      partitions=4/4 files=4 size=4.82KB
> >>>> |      stored statistics:
> >>>> |        table: rows=unavailable size=unavailable
> >>>> |        partitions: 0/4 rows=unavailable
> >>>> |        columns: unavailable
> >>>> |      extrapolated-rows=disabled
> >>>> |      mem-estimate=32.00MB mem-reservation=0B
> >>>> |      tuple-ids=1 row-size=4B cardinality=unavailable
> >>>> +-------------------------------------------------------------------------------------+
> >>>>
> >>>> When errors happen in F00, a cancellation RPC will be sent to F01.
> >>>> However, the HDFS scanner in F01 does not notice it in time and
> >>>> passes up all the row batches. Then the DataStreamSender in F01
> >>>> will try to send these row batches to F00, retrying for 2 minutes.
> >>>> In this time window it may hold significant memory resources,
> >>>> which can prevent other queries from allocating memory and make
> >>>> them fail. This could be avoided if the HDFS scanner used
> >>>> RuntimeState::is_cancelled() to detect the cancellation in time.
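> >>>>
> >>>> (If I read the backend code correctly, the 2-minute window comes
> >>>> from the impalad flag datastream_sender_timeout_ms, which defaults
> >>>> to 120000 ms:
> >>>>
> >>>>     impalad --datastream_sender_timeout_ms=120000 ...
> >>>>
> >>>> Lowering it would only shrink the window, not remove the root
> >>>> cause.)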
> >>>>
> >>>> Am I right?
> >>>>
> >>>> Thanks,
> >>>> Quanlong
> >>>>
> >>>> At 2018-01-17 01:05:57, "Tim Armstrong" wrote:
> >>>> >ScannerContext::cancelled() == true means that the scan has
> >>>> >completed, either because it has returned enough rows, because
> >>>> >the query was cancelled, or because it hit an error.
> >>>> >
> >>>> >RuntimeState::is_cancelled() == true means that the query was
> >>>> >cancelled.
> >>>> >
> >>>> >So there are cases where ScannerContext::cancelled() == true and
> >>>> >RuntimeState::is_cancelled() is false, e.g. where there's a limit
> >>>> >on the scan.
> >>>> >
> >>>> >I think the name of ScannerContext::cancelled() is misleading; it
> >>>> >should probably be called "done()" to match HdfsScanNode::done().
> >>>> >More generally, the cancellation logic could probably be cleaned
> >>>> >up and simplified further.
> >>>> >
> >>>> >On Mon, Jan 15, 2018 at 6:20 PM, Quanlong Huang
> >>>> ><huang_quanlong@126.com> wrote:
> >>>> >
> >>>> >> Hi all,
> >>>> >>
> >>>> >> I'm confused about the cancellation logic in the HDFS scanners.
> >>>> >> There are two functions to detect cancellation:
> >>>> >> ScannerContext::cancelled() and RuntimeState::is_cancelled().
> >>>> >> When MT_DOP is not set (i.e. MT_DOP=0),
> >>>> >> ScannerContext::cancelled() returns HdfsScanNode::done().
> >>>> >> However, the field done_ in HdfsScanNode seems to be set
> >>>> >> according to the status returned from the scanners. I've
> >>>> >> observed points where RuntimeState::is_cancelled() is true but
> >>>> >> ScannerContext::cancelled() is false.
> >>>> >>
> >>>> >> My question is: why don't the scanners use
> >>>> >> RuntimeState::is_cancelled() to detect cancellation, which is
> >>>> >> more timely than ScannerContext::cancelled()? There must be
> >>>> >> some detailed reasons that I've missed. Would you be so kind as
> >>>> >> to answer my question?
> >>>> >>
> >>>> >> Thanks,
> >>>> >> Quanlong
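> >>>> >
> >>>> >To make the difference concrete, for MT_DOP=0 the two checks boil
> >>>> >down to roughly this (a sketch from memory, not the exact code;
> >>>> >the MT_DOP > 0 path is omitted):
> >>>> >
> >>>> >    // "The scan is finished": done_ is also set when the scan
> >>>> >    // hit its limit or an error, so this is broader than query
> >>>> >    // cancellation.
> >>>> >    bool ScannerContext::cancelled() const {
> >>>> >      return static_cast<HdfsScanNode*>(scan_node_)->done();
> >>>> >    }
> >>>> >
> >>>> >    // "The whole query was cancelled."
> >>>> >    bool RuntimeState::is_cancelled() const { return is_cancelled_; }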