Date: Tue, 5 Apr 2016 11:17:32 +0530
Subject: Re: Bandwidth control for Input operators in Apex
From: Priyanka Gugale
To: dev@apex.incubator.apache.org
Reply-To: dev@apex.incubator.apache.org
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=001a113edceaa57503052fb661da

--001a113edceaa57503052fb661da
Content-Type: text/plain; charset=UTF-8

Okay so I will open the pull request
soon.

-Priyanka

On Wed, Mar 23, 2016 at 1:35 PM, Yogi Devendra wrote:

> This looks OK. Let us build it incrementally.
>
> ~ Yogi
>
> On 23 March 2016 at 13:24, Sandeep Deshmukh wrote:
>
> > I would suggest that we go ahead with the design as suggested by Priyanka,
> > where we have bandwidth set up for each operator separately. We can later
> > extend this for bandwidth to be shared with different input operators or
> > for the DAG as a whole.
> >
> > Regards,
> > Sandeep
> >
> > On Wed, Mar 23, 2016 at 11:51 AM, Priyanka Gugale <priyanka@datatorrent.com> wrote:
> >
> > > Right now it's not for output operators, but one can very well use the
> > > bandwidth manager to keep track of bandwidth usage and limit output speed.
> > > The bigger challenge there is that you won't be able to process window data
> > > sent by the upstream operator in the same window. For that you need to do
> > > more than just bandwidth control.
> > > So I would say the bandwidth control feature can be used as it is for
> > > output operators as well; only we need to do more than just bandwidth
> > > limitation for output operators.
> > >
> > > -Priyanka
> > >
> > > On Wed, Mar 23, 2016 at 11:47 AM, Priyanka Gugale <priyanka@datatorrent.com> wrote:
> > >
> > > > That's a good question, Chaitanya. Right now the bandwidth control is at
> > > > the input operator level and not the application level. So if you have
> > > > two input operators, you need to set bandwidth on both separately by
> > > > this design.
> > > > Maybe it would be good to have bandwidth control at the application level
> > > > rather than the operator level. Let me think if I can modify the design
> > > > to do that. If you have any ideas for the same, please share them.
> > > >
> > > > -Priyanka
> > > >
> > > > On Wed, Mar 23, 2016 at 11:47 AM, Yogi Devendra <devendra.vyavahare@gmail.com> wrote:
> > > >
> > > >> Priyanka,
> > > >>
> > > >> From the design description it is not clear how it will be used to
> > > >> control output bandwidth (points #2, 3, 4 mentioned by Sandeep).
> > > >>
> > > >> ~ Yogi
> > > >>
> > > >> On 23 March 2016 at 11:39, Chaitanya Chebolu <chaitanya@datatorrent.com> wrote:
> > > >>
> > > >> > This is a very useful feature.
> > > >> > I would like to know how you are distributing the bandwidth in the
> > > >> > situation below:
> > > >> > - Two input operators, say i1 and i2, are deployed on the same node and
> > > >> > both operators have bandwidthManager as a plugin.
> > > >> >
> > > >> > On Fri, Mar 18, 2016 at 5:43 PM, Priyanka Gugale <priyanka@datatorrent.com> wrote:
> > > >> >
> > > >> > > Hi,
> > > >> > >
> > > >> > > Thanks for the inputs Sandeep, I will take care of those points.
> > > >> > >
> > > >> > > Here is the high-level design we are considering. We would have the
> > > >> > > following components:
> > > >> > >
> > > >> > > *1.* *BandwidthManager*
> > > >> > > This keeps track of the current bandwidth usage of the system and
> > > >> > > decides whether the requested data bandwidth can be used right away or
> > > >> > > not. To do this it uses the leaky bucket algorithm, where it emits data
> > > >> > > as long as it has not overused bandwidth (i.e. bandwidth consumption
> > > >> > > is >=0) and then waits to accumulate bandwidth for a while (till
> > > >> > > bandwidth goes from -ve value to +ve).
> > > >> > >
> > > >> > > *2. BandwidthLimitingInputOperator*
> > > >> > > Any input operator which wants to implement bandwidth restriction
> > > >> > > should implement BandwidthLimitingInputOperator.
> > > >> > > The operator has an abstract method to initialize an instance of
> > > >> > > BandwidthManager and a method to emit tuples under the bandwidth
> > > >> > > restriction, i.e. as per available bandwidth.
> > > >> > >
> > > >> > > *3. BandwidthPartitioner*
> > > >> > > BandwidthPartitioner is introduced for static partitioning. If static
> > > >> > > partitioning is used, by default the StatelessPartitioner class is
> > > >> > > initialized. With bandwidth restriction we want to divide bandwidth
> > > >> > > equally amongst the available partitions. BandwidthPartitioner should
> > > >> > > take care of it. It extends StatelessPartitioner; it just sets the
> > > >> > > right bandwidth on all partitions after StatelessPartitioner
> > > >> > > creates/deletes partitions. In case of dynamic partitioning, the
> > > >> > > operator implementing definePartitions should take care of bandwidth
> > > >> > > distribution.
> > > >> > >
> > > >> > > This design takes care of basic bandwidth restriction and also takes
> > > >> > > care of partitions by equally distributing available bandwidth among
> > > >> > > all partitions. It is also open enough to allow further modifications
> > > >> > > to take care of complex situations.
> > > >> > >
> > > >> > > Let me know your opinion on what else we can do to design it better.
> > > >> > >
> > > >> > > -Priyanka
> > > >> > >
> > > >> > > On Thu, Mar 3, 2016 at 10:11 AM, Sandeep Deshmukh <sandeep@datatorrent.com> wrote:
> > > >> > >
> > > >> > > > The main purpose is not to handle back pressure but to limit
> > > >> > > > bandwidth usage by applications. This is useful in ingestion use
> > > >> > > > cases. Typically the user needs to ingest, say, up to 1GB per sec
> > > >> > > > and not more.
> > > >> > > > The tuple size may vary based on message-based tuples (few KBs) or
> > > >> > > > block tuples for files (few MBs). The bandwidth manager will take the
> > > >> > > > max bandwidth that can be utilized by the application and will take
> > > >> > > > care of sharing that across partitions etc.
> > > >> > > >
> > > >> > > > Priyanka: You could also consider the following in your design:
> > > >> > > >
> > > >> > > >    1. Limiting input rate (across partitions)
> > > >> > > >    2. Limiting output rate (across partitions)
> > > >> > > >    3. Specifying total bandwidth that the application can utilize,
> > > >> > > >    including input and output? Not sure if this is required. Need
> > > >> > > >    comments from others here.
> > > >> > > >    4. Include a default implementation that will handle 1 and 2, and
> > > >> > > >    anyone interested in having their own bandwidth manager should be
> > > >> > > >    able to extend the default one.
> > > >> > > >    5. Can you also look at including/extending tuples per sec, as
> > > >> > > >    pointed out by Tim/Chinmay?
> > > >> > > >
> > > >> > > > Regards,
> > > >> > > > Sandeep
> > > >> > > >
> > > >> > > > On Thu, Mar 3, 2016 at 12:23 AM, Timothy Farkas <tim@datatorrent.com> wrote:
> > > >> > > >
> > > >> > > > > Not sure if this is helpful, but there is already a utility in
> > > >> > > > > Malhar for converting tuples per second to tuples per window. This
> > > >> > > > > allows the user to define a property in tuples per second; the
> > > >> > > > > operator can then convert that to tuples per window so it emits the
> > > >> > > > > correct number of tuples per window.
> > > >> > > > >
> > > >> > > > > https://github.com/apache/incubator-apex-malhar/blob/master/library/src/main/java/com/datatorrent/lib/util/time/WindowUtils.java
> > > >> > > > >
> > > >> > > > > On Wed, Mar 2, 2016 at 10:41 AM, Chinmay Kolhatkar <chinmay@datatorrent.com> wrote:
> > > >> > > > >
> > > >> > > > > > Hi Priyanka,
> > > >> > > > > >
> > > >> > > > > > Indeed this is a useful feature.
> > > >> > > > > >
> > > >> > > > > > I believe the number of bytes consumed per sec can as well
> > > >> > > > > > translate to the number of tuples consumed per sec.
> > > >> > > > > >
> > > >> > > > > > If the above is correct, won't the back pressure that is handled
> > > >> > > > > > by bufferserver help in your use case?
> > > >> > > > > >
> > > >> > > > > > Thanks,
> > > >> > > > > > Chinmay.
> > > >> > > > > > On 2 Mar 2016 4:49 p.m., "Priyanka Gugale" <priyanka@datatorrent.com> wrote:
> > > >> > > > > >
> > > >> > > > > > > Many times we need to put bandwidth restrictions or put some
> > > >> > > > > > > limit on an input operator for the number of bytes to be
> > > >> > > > > > > consumed per second. As I understand, in Apex there is no
> > > >> > > > > > > direct support for this feature.
> > > >> > > > > > >
> > > >> > > > > > > I am planning to write a bandwidth manager which will help in
> > > >> > > > > > > limiting bandwidth at the input operator. Let me know if there
> > > >> > > > > > > are any better alternative ways. I will soon publish the design
> > > >> > > > > > > for the Bandwidth Manager I am planning to write.
> > > >> > > > > > >
> > > >> > > > > > > -Priyanka

--001a113edceaa57503052fb661da--
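
For readers of the archive, the bandwidth accounting described for BandwidthManager in this thread (emit while the balance is >= 0, then wait until accumulated bandwidth brings it back up from a negative value) can be sketched roughly as below. This is only an illustration under stated assumptions: the class name BandwidthBucketSketch, its methods, and the one-second cap on accumulated credit are hypothetical and are not taken from the Apex or Malhar code base or from the pull request mentioned above.

```java
/**
 * Hypothetical sketch of the accounting described in the thread.
 * Not the actual Apex/Malhar BandwidthManager.
 */
class BandwidthBucketSketch {
    private final long bytesPerSecond; // configured bandwidth limit (assumed unit: bytes/sec)
    private long balance;              // available bytes; may go negative after a large tuple

    BandwidthBucketSketch(long bytesPerSecond) {
        if (bytesPerSecond <= 0) {
            throw new IllegalArgumentException("bytesPerSecond must be positive");
        }
        this.bytesPerSecond = bytesPerSecond;
        this.balance = bytesPerSecond; // assumption: start with one second's worth of credit
    }

    /**
     * Credits bandwidth for the elapsed time. The balance is capped at one
     * second's worth (an assumption) so idle periods cannot build up large bursts.
     */
    void accumulate(long elapsedMillis) {
        long credit = bytesPerSecond * elapsedMillis / 1000;
        balance = Math.min(bytesPerSecond, balance + credit);
    }

    /**
     * Allows the tuple if the balance is not negative, even if the tuple is
     * larger than the balance; the overdraft must be paid back by later
     * accumulate() calls before anything else is emitted.
     */
    boolean tryConsume(long tupleBytes) {
        if (balance < 0) {
            return false;
        }
        balance -= tupleBytes;
        return true;
    }

    long getBalance() {
        return balance;
    }

    public static void main(String[] args) {
        BandwidthBucketSketch bucket = new BandwidthBucketSketch(1000);
        System.out.println("emit 1500 bytes: " + bucket.tryConsume(1500));
        System.out.println("emit 10 bytes while overdrawn: " + bucket.tryConsume(10));
        bucket.accumulate(500); // half a second of credit pays back the overdraft
        System.out.println("emit 10 bytes after waiting: " + bucket.tryConsume(10));
    }
}
```

An input operator would call accumulate() with the time elapsed since its last check and hold back a tuple whenever tryConsume() returns false. Allowing the balance to go negative lets a single tuple larger than the per-second limit (for example a multi-MB block tuple) still be emitted, at the cost of a proportionally longer wait afterwards.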