From: Gyula Fóra
Date: Wed, 12 Jul 2017 21:21:54 +0000
Subject: Re: sanity check in production
To: burgesschen, user@flink.apache.org
In-Reply-To: <1499883755759-14229.post@n4.nabble.com>

Hi!

Assuming you have some spare compute resources on your cluster (which you should have in a production setting anyway, to be safe), I think 2) would be the best option, ideally started from a savepoint of the production job so that you verify your state logic as well.

You could also run the test job at a smaller parallelism, verify that it actually works, and then perhaps run some live data through it as well before killing off the test job and updating the production job.
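As a rough illustration of that workflow, here is a minimal Python sketch that only assembles the Flink 1.x command-line calls in order rather than running them. The job ID, savepoint directory, savepoint path, jar name and test parallelism are all hypothetical placeholders; in practice the savepoint path is whatever `flink savepoint` reports when it finishes.

```python
from typing import List


def sanity_check_plan(prod_job_id: str, savepoint_dir: str, savepoint_path: str,
                      new_jar: str, test_parallelism: int) -> List[List[str]]:
    """Return, in order, the Flink CLI calls for testing a new job version.

    All arguments are hypothetical placeholders. savepoint_path must be the
    path reported by the `flink savepoint` call in step 1.
    """
    return [
        # 1. Trigger a savepoint of the running production job; the job keeps running.
        ["flink", "savepoint", prod_job_id, savepoint_dir],
        # 2. Start the new version as a separate, detached test job restored from
        #    that savepoint, at a smaller parallelism than production.
        ["flink", "run", "-d", "-s", savepoint_path,
         "-p", str(test_parallelism), new_jar],
    ]


# Hypothetical values for illustration only.
plan = sanity_check_plan(
    prod_job_id="a1b2c3d4",
    savepoint_dir="hdfs:///flink/savepoints",
    savepoint_path="hdfs:///flink/savepoints/savepoint-a1b2c3-0001",
    new_jar="my-job-v2.jar",
    test_parallelism=2,
)
for cmd in plan:
    print(" ".join(cmd))
```

Each command list could be handed to `subprocess.run(cmd, check=True)`. Once the test job has been checked, it would be cancelled with `flink cancel <jobID>`, and the production job upgraded with `flink cancel -s <targetDirectory> <jobID>` followed by `flink run -s <savepointPath> <jar>`.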

Even though this might have a fairly high temporary cost, I think it is ultimately worth testing on live data before upgrading the production job.

Cheers,
Gyula

On Wed, 12 Jul 2017 at 20:49, burgesschen <tchen325@bloomberg.net> wrote:
Hello everyone,

Our team has run into an issue: testing a new deployment of a Flink job is
difficult, as explained below.


Goal:
When we deploy a new version of a Flink job in production, we want to
be able to have the job process some test messages and verify the output to
make sure that the job is running correctly (a sanity check).

Problem:
The test messages interfere with the watermark of the Flink job,
potentially causing it to drop real messages.
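To make the problem concrete, here is a minimal Python simulation (not Flink code) of a bounded-out-of-orderness watermark shared by all records. It assumes, hypothetically, that a test message carries an event timestamp far ahead of the live data, that the watermark is updated after every record (Flink typically emits it periodically), and that late records are simply dropped, roughly as a window operator without allowed lateness would do. The `Event` type and its `is_test` flag are illustrative names only.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Event:
    timestamp: int          # event time in milliseconds (hypothetical field)
    payload: str
    is_test: bool = False   # hypothetical flag marking sanity-check messages


def run_with_shared_watermark(events: List[Event],
                              max_out_of_orderness: int) -> Tuple[List[Event], List[Event]]:
    """Return (accepted, dropped) when one watermark is shared by all records.

    The watermark trails the highest timestamp seen so far by
    max_out_of_orderness; a record whose timestamp is at or below the current
    watermark is treated as late and dropped.
    """
    max_seen: Optional[int] = None
    accepted: List[Event] = []
    dropped: List[Event] = []
    for event in events:
        if max_seen is not None and event.timestamp <= max_seen - max_out_of_orderness:
            dropped.append(event)
        else:
            accepted.append(event)
        # Every record, test or real, can advance the shared watermark.
        max_seen = event.timestamp if max_seen is None else max(max_seen, event.timestamp)
    return accepted, dropped


events = [
    Event(1_000, "real-1"),
    Event(2_000, "real-2"),
    Event(3_600_000, "test-1", is_test=True),  # test message stamped an hour ahead
    Event(3_000, "real-3"),
]
accepted, dropped = run_with_shared_watermark(events, max_out_of_orderness=5_000)
print([e.payload for e in dropped])  # ['real-3']
```

The single test record pushes the watermark to 3,595,000 ms, so the real record at 3,000 ms that follows it is classified as late.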

Possible solutions:
1. Have a separate watermark for the test messages
   (this does not appear to be supported by the current framework)

2. Run a separate Flink job (same code) in production as a sanity check
before the actual deployment
   (high operational costs)

3. Cancel the running production job with a savepoint, run a new job from
the savepoint, do the sanity check (which messes up the watermark of the new job),
kill the new job, and do the actual deployment from the same savepoint.
   (high operational costs)
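The idea behind option 1 can be illustrated with a similar standalone Python simulation. This is not a Flink API, and the message itself says Flink does not appear to support it; the sketch only shows the intended behaviour. It assumes, hypothetically, that each record carries an `is_test` flag, that a test message is stamped far ahead of the live data, and that late records are dropped; test and real records each advance their own watermark.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Event:
    timestamp: int          # event time in milliseconds (hypothetical field)
    payload: str
    is_test: bool = False   # hypothetical flag marking sanity-check messages


def run_with_separate_watermarks(events: List[Event],
                                 max_out_of_orderness: int) -> Tuple[List[Event], List[Event]]:
    """Return (accepted, dropped) when test and real records track separate watermarks."""
    max_seen: Dict[bool, int] = {}  # highest timestamp seen so far, keyed by is_test
    accepted: List[Event] = []
    dropped: List[Event] = []
    for event in events:
        key = event.is_test
        if key in max_seen and event.timestamp <= max_seen[key] - max_out_of_orderness:
            dropped.append(event)
        else:
            accepted.append(event)
        # A record only advances the watermark of its own class.
        max_seen[key] = max(max_seen.get(key, event.timestamp), event.timestamp)
    return accepted, dropped


events = [
    Event(1_000, "real-1"),
    Event(2_000, "real-2"),
    Event(3_600_000, "test-1", is_test=True),  # test message stamped an hour ahead
    Event(3_000, "real-3"),
]
accepted, dropped = run_with_separate_watermarks(events, max_out_of_orderness=5_000)
print([e.payload for e in dropped])  # []
```

Because the far-future test record only moves the test watermark, the real record at 3,000 ms is still accepted.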

Any idea is appreciated, thanks!



--
View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/sanity-check-in-production-tp14229.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.