From: Romain Manni-Bucau <rmannibucau@gmail.com>
Date: Sun, 18 Feb 2018 19:24:49 +0100
Message-ID: <CACLE=7OkQuDETHvzg+k0AEFsaA11u9htzbYbG8G5PxMoEMavrg@mail.gmail.com>
Subject: Re: @TearDown guarantees
To: dev@beam.apache.org
Content-Type: multipart/alternative; boundary="f403043842b491e1c8056580b276"

--f403043842b491e1c8056580b276
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

2018-02-18 19:19 GMT+01:00 Eugene Kirpichov <kirpichov@google.com>:

> FinishBundle has a stronger guarantee: if the pipeline succeeded, then it
> has been called for every succeeded bundle, and succeeded bundles together
> cover the entire input PCollection. Of course, it may not have been called
> for failed bundles.
> To anticipate a possible objection "why not also keep retrying Teardown
> until it succeeds" - because if Teardown wasn't called on a DoFn instance,
> it's because the instance no longer exists and there's nothing to call it
> on.
>
> Please take a look at implementations of WriteFiles and BigQueryIO.read()
> and write() to see how cleanup of heavyweight resources (large number of
> temp files, temporary BigQuery datasets) can be achieved reliably to the
> extent possible.
>

Do you mean passing state across the fns and having one fn responsible for the
cleanup? Kind of making the teardown a processElement? This is a nice
workaround but it is not always possible, as mentioned. Ismael even has a
nice case where this just fails and teardown would work - it was with AWS, not
a BigQuery bug, but the same design.
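
As a rough illustration of the pattern in question, the sketch below uses plain Java with no Beam dependency. It assumes the processing step can emit the id of each temporary resource it creates as ordinary output, and that a later step deletes whatever ids it receives. The class, the method names and the in-memory `liveResources` set are hypothetical stand-ins, not Beam, WriteFiles or BigQueryIO API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only: plain Java, not the Beam API.
final class CleanupAsStepSketch {
    // Stand-in for an external system that holds temporary resources.
    static final Set<String> liveResources = new HashSet<>();

    // "Processing" step: creates a temporary resource and returns its id,
    // so the id travels downstream as ordinary data.
    static String process(String element) {
        String id = "tmp-" + element;
        liveResources.add(id);
        return id;
    }

    // "Cleanup" step: deletes every resource whose id it receives.
    // Removing an absent id is a no-op, so a retry is harmless.
    static void cleanup(List<String> ids) {
        for (String id : ids) {
            liveResources.remove(id);
        }
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>();
        for (String element : Arrays.asList("a", "b", "c")) {
            ids.add(process(element));
        }
        cleanup(ids);
        cleanup(ids); // simulated retry of the cleanup step
        System.out.println("leaked resources: " + liveResources.size());
    }
}
```

Because the ids travel as data, the cleanup can be retried like any other step. It only helps when the resource can be named by a value that survives between steps, which a live connection held by one instance presumably cannot.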


>
> On Sun, Feb 18, 2018 at 9:56 AM Romain Manni-Bucau <rmannibucau@gmail.com>
> wrote:
>
>> 2018-02-18 18:36 GMT+01:00 Eugene Kirpichov <kirpichov@google.com>:
>>
>>> "Machine state" is overly low-level because many of the possible reasons
>>> can happen on a perfectly fine machine.
>>> If you'd like to rephrase it to "it will be called except in various
>>> situations where it's logically impossible or impractical to guarantee that
>>> it's called", that's fine. Or you can list some of the examples above.
>>>
>>
>> Sounds ok to me
>>
>>
>>>
>>> The main point for the user is, you *will* see non-preventable
>>> situations where it couldn't be called - it's not just intergalactic
>>> crashes - so if the logic is very important (e.g. cleaning up a large
>>> amount of temporary files, shutting down a large number of VMs you started
>>> etc), you have to express it using one of the other methods that have
>>> stricter guarantees (which obviously come at a cost, e.g. no
>>> pass-by-reference).
>>>
>>
>> FinishBundle has the exact same guarantee sadly, so I am not sure which
>> other method you are speaking about. Concretely, if you make it really
>> unreliable - this is what "best effort" sounds like to me - then users
>> cannot use it to clean anything, but if you make it "can happen but it is
>> unexpected and means something happened" then it is fine to have a manual -
>> or auto if fancy - recovery procedure. This is where it makes all the
>> difference and impacts the developers and ops (all users, basically).
>>
>>
>>>
>>> On Sun, Feb 18, 2018 at 9:16 AM Romain Manni-Bucau <
>>> rmannibucau@gmail.com> wrote:
>>>
>>>> Agree Eugene except that "best effort" means that. It is also often
>>>> used to say "at will" and this is what triggered this thread.
>>>>
>>>> I'm fine using "except if the machine state prevents it" but "best
>>>> effort" is too open and can be very badly and wrongly perceived by users
>>>> (like I did).
>>>>
>>>>
>>>> Romain Manni-Bucau
>>>> @rmannibucau <https://twitter.com/rmannibucau> |  Blog
>>>> <https://rmannibucau.metawerx.net/> | Old Blog
>>>> <http://rmannibucau.wordpress.com> | Github
>>>> <https://github.com/rmannibucau> | LinkedIn
>>>> <https://www.linkedin.com/in/rmannibucau> | Book
>>>> <https://www.packtpub.com/application-development/java-ee-8-high-performance>
>>>>
>>>> 2018-02-18 18:13 GMT+01:00 Eugene Kirpichov <kirpichov@google.com>:
>>>>
>>>>> It will not be called if it's impossible to call it: in the example
>>>>> situation you have (intergalactic crash), and in a number of more common
>>>>> cases: eg in case the worker container has crashed (eg user code in a
>>>>> different thread called a C library over JNI and it segfaulted), JVM bug,
>>>>> crash due to user code OOM, in case the worker has lost network
>>>>> connectivity (then it may be called but it won't be able to do anything
>>>>> useful), in case this is running on a preemptible VM and it was preempted
>>>>> by the underlying cluster manager without notice or if the worker was too
>>>>> busy with other stuff (eg calling other Teardown functions) until the
>>>>> preemption timeout elapsed, in case the underlying hardware simply failed
>>>>> (which happens quite often at scale), and in many other conditions.
>>>>>
>>>>> "Best effort" is the commonly used term to describe such behavior.
>>>>> Please feel free to file bugs for cases where you observed a runner not
>>>>> call Teardown in a situation where it was possible to call it but the
>>>>> runner made insufficient effort.
>>>>>
>>>>> On Sun, Feb 18, 2018, 9:02 AM Romain Manni-Bucau <
>>>>> rmannibucau@gmail.com> wrote:
>>>>>
>>>>>> 2018-02-18 18:00 GMT+01:00 Eugene Kirpichov <kirpichov@google.com>:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sun, Feb 18, 2018, 2:06 AM Romain Manni-Bucau <
>>>>>>> rmannibucau@gmail.com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 18 Feb 2018 00:23, "Kenneth Knowles" <klk@google.com> wrote:
>>>>>>>>
>>>>>>>> On Sat, Feb 17, 2018 at 3:09 PM, Romain Manni-Bucau <
>>>>>>>> rmannibucau@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> If you give an example of a high-level need (e.g. "I'm trying to
>>>>>>>>> write an IO for system $x and it requires the following initialization and
>>>>>>>>> the following cleanup logic and the following processing in between") I'll
>>>>>>>>> be better able to help you.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Take a simple example of a transform requiring a connection. Using
>>>>>>>>> bundles is a perf killer since bundle size is not controlled. Using
>>>>>>>>> teardown doesn't allow you to release the connection since it is a
>>>>>>>>> best-effort thing. Not releasing the connection makes you pay a lot -
>>>>>>>>> AWS ;) - or prevents you from launching other processing - concurrent
>>>>>>>>> limit.
>>>>>>>>>
>>>>>>>>
>>>>>>>> For this example @Teardown is an exact fit. If things die so badly
>>>>>>>> that @Teardown is not called then nothing else can be called to close the
>>>>>>>> connection either. What AWS service are you thinking of that stays open for
>>>>>>>> a long time when everything at the other end has died?
>>>>>>>>
>>>>>>>>
>>>>>>>> You assume connections are kind of stateless, but some (proprietary)
>>>>>>>> protocols require closing exchanges which are more than just "I'm
>>>>>>>> leaving".
>>>>>>>>
>>>>>>>> For AWS I was thinking about starting some services - machines - on
>>>>>>>> the fly at pipeline startup and closing them at the end. If teardown is
>>>>>>>> not called you leak machines and money. You can say it can be done
>>>>>>>> another way... as can the full pipeline ;).
>>>>>>>>
>>>>>>>> I don't want to be picky, but if Beam can't handle the lifecycle of its
>>>>>>>> components, it can't be used at scale for generic pipelines, only when
>>>>>>>> bound to some particular IO.
>>>>>>>>
>>>>>>>> What prevents enforcing teardown - ignoring the interstellar crash
>>>>>>>> case, which can't be handled by any human system? Nothing, technically.
>>>>>>>> Why do you push to not handle it? Is it due to some legacy code in
>>>>>>>> Dataflow or something else?
>>>>>>>>
>>>>>>> Teardown *is* already documented and implemented this way
>>>>>>> (best-effort). So I'm not sure what kind of change you're asking for.
>>>>>>>
>>>>>>
>>>>>> Remove "best effort" from the javadoc. If it is not called then it is a
>>>>>> bug and we are done :).
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Also, what does it mean for the users? The direct runner does it, so
>>>>>>>> if a user uses the RI in tests, will he get a different behavior in
>>>>>>>> prod? Also don't forget the user doesn't know what the IOs he composes
>>>>>>>> use, so this has such an impact on the whole product that it must be
>>>>>>>> handled IMHO.
>>>>>>>>
>>>>>>>> I understand the portability culture is new in the big data world, but
>>>>>>>> that is not a reason to ignore what people did for years and do it
>>>>>>>> wrong before doing it right ;).
>>>>>>>>
>>>>>>>> My proposal is to list what can prevent guaranteeing - in normal IT
>>>>>>>> conditions - the execution of teardown. Then we see if we can handle
>>>>>>>> it, and only if there is a technical reason we can't do we make it
>>>>>>>> experimental/unsupported in the API. I know Spark and Flink can; any
>>>>>>>> unknown blocker for other runners?
>>>>>>>>
>>>>>>>> Technical note: even a kill should go through Java shutdown hooks,
>>>>>>>> otherwise your environment (the software enclosing Beam) is fully
>>>>>>>> unhandled and your overall system is uncontrolled. The only case where
>>>>>>>> this is not true is when the software is always owned by a vendor and
>>>>>>>> never installed in a customer environment. In this case it belongs to
>>>>>>>> the vendor to handle the Beam API and not to Beam to adjust its API for
>>>>>>>> a vendor - otherwise all features unsupported by one runner should be
>>>>>>>> made optional, right?
>>>>>>>>
>>>>>>>> Not all state is about the network, even in distributed systems, so it
>>>>>>>> is key to have an explicit and defined lifecycle.
>>>>>>>>
>>>>>>>>
>>>>>>>> Kenn
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>
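
For readers comparing the two guarantees quoted at the top of this message, the toy model below retries a bundle whose worker disappears. It is plain Java, not a Beam runner, and every name in it is hypothetical. It assumes, for simplicity, one fresh instance per bundle attempt and a graceful teardown after each succeeded bundle, which real runners need not do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: hypothetical names, not the Beam runner API.
final class LifecycleSketch {
    int setups;           // instances created (setup calls)
    int teardowns;        // instances retired gracefully (teardown calls)
    int finishedBundles;  // finishBundle calls, one per succeeded bundle
    final List<Integer> committed = new ArrayList<>();

    // Processes every bundle until it succeeds. crashOnFirstAttempt[i] says
    // whether the worker running the first attempt at bundle i is lost.
    void run(List<List<Integer>> bundles, boolean[] crashOnFirstAttempt) {
        for (int i = 0; i < bundles.size(); i++) {
            boolean crash = crashOnFirstAttempt[i];
            while (true) {
                setups++;                 // a fresh instance is set up
                if (crash) {
                    // The instance is gone: there is nothing left to call
                    // finishBundle or teardown on. The bundle is retried.
                    crash = false;
                    continue;
                }
                committed.addAll(bundles.get(i));
                finishedBundles++;        // finishBundle on the succeeded bundle
                teardowns++;              // graceful retirement of the instance
                break;
            }
        }
    }

    public static void main(String[] args) {
        LifecycleSketch runner = new LifecycleSketch();
        runner.run(
            Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3), Arrays.asList(4, 5)),
            new boolean[] {false, true, false});
        System.out.println("setups=" + runner.setups
            + " teardowns=" + runner.teardowns
            + " finishedBundles=" + runner.finishedBundles
            + " committed=" + runner.committed);
    }
}
```

With one lost worker, setup runs four times but teardown only three, while finishBundle still runs once for each of the three succeeded bundles and those bundles together cover all five inputs.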


--f403043842b491e1c8056580b276--