From: Gaurav Gupta
Date: Thu, 1 Oct 2015 11:13:43 -0700
Subject: Re: dynamic application properties proposal
To: dev@apex.incubator.apache.org

Pramod,

The new special property change tuple will be sent to all the operators, and
every operator will have to check whether the property change applies to it.
Although such requests may be rare, is there a way to optimize this?

Thanks
- Gaurav

> On Sep 28, 2015, at 3:44 PM, Pramod Immaneni wrote:
>
> At the platform level that cannot be guaranteed, as your operator controls
> and manages reading of the data.
However it is not difficult to = envision > writing an operator that would pick up a new dataset when property is > changed. >=20 > On Mon, Sep 28, 2015 at 3:33 PM, Ashwin Chandra Putta < > ashwinchandrap@gmail.com> wrote: >=20 >> Great, looking forward to these changes. Does it also provide a = guarantee >> on which properties are used for which input data sets? >>=20 >> Few use case examples: >> - set property between reads of different batches of files. Say, = applying >> batch name property before processing the next batch of files. >> - load new configuration file for csv parser before processing next = set of >> data. >> - apply new regex before parsing next stream of tuples. >> etc. >>=20 >> One approach to allow this is to emit subsequent tuples only starting = next >> window after the window in which property change is made. That way, = the >> boundaries between data sets is fixed and property change is done in >> between. The user will now have a guarantee on which property value = is used >> on any given tuple. >>=20 >> Thoughts? >>=20 >> Regards, >> Ashwin. >>=20 >> On Mon, Sep 28, 2015 at 10:17 AM, Pramod Immaneni = >> wrote: >>=20 >>> Apex support modification of operator properties at runtime but the >> current >>> implemenations has the following shortcomings. >>>=20 >>> 1. Property is not set across all partitions on the same window as >>> individual partitions can be on different windows when property = change is >>> initiated from client resulting in inconsistency of data for those >> windows. >>> I am being generous using the word inconsistent. >>> 2. Sometimes properties need to be set on more than one logical = operators >>> at the same time to achieve the change the user is seeking. Today = they >> will >>> be two separate changes happening on two different windows again >> resulting >>> in inconsistent data for some windows. These would need to happen as = a >>> single transaction. >>> 3. 
If there is an operator failure before a committed checkpoint = after an >>> operator property is dynamically changed the operator will restart = with >> the >>> old property and the change will not be re-applied. >>>=20 >>> Tim and myself did some brainstorming and we have a proposal to = overcome >>> these shortcomings. The main problem in all the above cases is that = the >>> property changes are happening out-of-band of data flow and hence >>> independent of windowing. The proposal is to bring the property = change >>> request into the in-band dataflow so that they are handled = consistently >>> with windowing and handled distributively. >>>=20 >>> The idea is to inject a special property change tuple containing the >>> property changes and the identification information of the = operator's >> they >>> affect into the dataflow at the input operator. The tuple will be >> injected >>> at window boundary after end window and before begin window and as = this >>> tuple flows through the DAG the intended operators properties will = be >>> modifed. They will all be modified consistently at the same window. = The >>> tuple can contain more than one property changes for more than one >> logical >>> operators and the change will be applied consistently to the = different >>> logical operators at the same window. In case of failure the replay = of >>> tuples will ensure that the property change gets reapplied at the = correct >>> window. >>>=20 >>> Please give your feedback and input on what you think about this >> proposal. >>>=20 >>> Thanks >>>=20 >>=20 >>=20 >>=20 >> -- >>=20 >> Regards, >> Ashwin. >>=20 --Apple-Mail=_E7F7E908-4F37-42E6-AB75-E22E27D5055A--
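The in-band mechanism in Pramod's proposal can be illustrated with a small, single-JVM simulation. This is a minimal sketch under stated assumptions, not Apex code: the DAG is reduced to a linear chain, partitions and checkpointing are not modeled, and every name here (`InBandPropertyChangeSketch`, `PropertyChange`, `inputStream`, `run`, the `mode` property) is hypothetical. It shows the input operator injecting a change after `endWindow(1)` and before `beginWindow(2)`, and two logical operators picking up their respective changes at that same boundary.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, single-JVM sketch of in-band property changes (not the Apex API).
 * A control tuple travels with the data between windows, and every operator it
 * passes applies the changes addressed to its logical name before the next window.
 */
public class InBandPropertyChangeSketch {

  static final String MODE = "mode"; // hypothetical property used for illustration

  interface Event {}
  record BeginWindow(long windowId) implements Event {}
  record Data(String value) implements Event {}
  record EndWindow(long windowId) implements Event {}
  /** Changes keyed by logical operator name, then by property name. */
  record PropertyChange(Map<String, Map<String, String>> changes) implements Event {}

  static class Operator {
    final String name;
    final Map<String, String> properties = new HashMap<>();
    /** One "windowId:tuple:propertyValue" entry per processed data tuple. */
    final List<String> log = new ArrayList<>();
    private long currentWindow = -1;

    Operator(String name, Map<String, String> initial) {
      this.name = name;
      properties.putAll(initial);
    }

    /** Handles one event and returns it so it can be forwarded downstream. */
    Event process(Event e) {
      if (e instanceof BeginWindow b) {
        currentWindow = b.windowId();
      } else if (e instanceof Data d) {
        log.add(currentWindow + ":" + d.value() + ":" + properties.get(MODE));
      } else if (e instanceof PropertyChange pc) {
        // Apply only the changes addressed to this logical operator; the tuple
        // itself is forwarded unchanged so downstream operators see it too.
        Map<String, String> mine = pc.changes().get(name);
        if (mine != null) {
          properties.putAll(mine);
        }
      }
      return e;
    }
  }

  /** Input operator: one data tuple per window, queued changes injected between windows. */
  static List<Event> inputStream(int windows,
      Map<Long, Map<String, Map<String, String>>> requestsByWindow) {
    List<Event> out = new ArrayList<>();
    for (long w = 0; w < windows; w++) {
      out.add(new BeginWindow(w));
      out.add(new Data("t" + w));
      out.add(new EndWindow(w));
      // A change queued for window w goes after endWindow(w) and before
      // beginWindow(w + 1), so every operator applies it from window w + 1.
      Map<String, Map<String, String>> request = requestsByWindow.get(w);
      if (request != null) {
        out.add(new PropertyChange(request));
      }
    }
    return out;
  }

  /** Pushes each event through a linear chain of operators, in order. */
  static void run(List<Event> stream, List<Operator> chain) {
    for (Event e : stream) {
      Event current = e;
      for (Operator op : chain) {
        current = op.process(current);
      }
    }
  }

  public static void main(String[] args) {
    Operator parser = new Operator("parser", Map.of(MODE, "csv"));
    Operator writer = new Operator("writer", Map.of(MODE, "v1"));
    // One request changes two logical operators together, as a single unit.
    Map<Long, Map<String, Map<String, String>>> requests = Map.of(
        1L, Map.of("parser", Map.of(MODE, "json"), "writer", Map.of(MODE, "v2")));
    run(inputStream(3, requests), List.of(parser, writer));
    System.out.println(parser.log); // [0:t0:csv, 1:t1:csv, 2:t2:json]
    System.out.println(writer.log); // [0:t0:v1, 1:t1:v1, 2:t2:v2]
  }
}
```

Both operators switch at window 2, so each data tuple can be tied to exactly one property value, which is the guarantee Ashwin asks about. In this sketch every operator does one map lookup per control tuple to see whether any change is addressed to it, which is the per-operator check Gaurav raises. Replaying the same event stream into freshly constructed operators reproduces the same logs, loosely mirroring the proposal's failure-recovery argument.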