apex-dev mailing list archives

From Priyanka Gugale <priya...@datatorrent.com>
Subject Re: Bandwidth control for Input operators in Apex
Date Tue, 05 Apr 2016 05:47:32 GMT
Okay so I will open the pull request soon.

-Priyanka

On Wed, Mar 23, 2016 at 1:35 PM, Yogi Devendra <devendra.vyavahare@gmail.com> wrote:

> This looks OK. Let us build it incrementally.
>
> ~ Yogi
>
> On 23 March 2016 at 13:24, Sandeep Deshmukh <sandeep@datatorrent.com>
> wrote:
>
> > I would suggest that we go ahead with the design suggested by Priyanka,
> > where bandwidth is set up for each operator separately. We can later
> > extend this so that bandwidth is shared among different input operators
> > or across the DAG as a whole.
> >
> > Regards,
> > Sandeep
> >
> > On Wed, Mar 23, 2016 at 11:51 AM, Priyanka Gugale <priyanka@datatorrent.com>
> > wrote:
> >
> > > Right now it is not designed for output operators, but one can very well
> > > use the bandwidth manager to keep track of bandwidth usage and limit
> > > output speed. The bigger challenge there is that you won't be able to
> > > process the window data sent by the upstream operator within the same
> > > window. For that you need to do more than just bandwidth control.
> > > So I would say the bandwidth control feature can be used as it is for
> > > output operators as well; we just need to do more than bandwidth
> > > limitation for them.
> > >
> > > -Priyanka
> > >
> > > On Wed, Mar 23, 2016 at 11:47 AM, Priyanka Gugale <priyanka@datatorrent.com>
> > > wrote:
> > >
> > > > That's a good question, Chaitanya. Right now the bandwidth control is at
> > > > the input operator level and not the application level. So if you have
> > > > two input operators, you need to set bandwidth on both separately under
> > > > this design.
> > > > Maybe it would be good to have bandwidth control at the application
> > > > level rather than the operator level. Let me think about whether I can
> > > > modify the design to do that. If you have any ideas for the same,
> > > > please share them.
> > > >
> > > > -Priyanka
> > > >
> > > > On Wed, Mar 23, 2016 at 11:47 AM, Yogi Devendra <
> > > > devendra.vyavahare@gmail.com> wrote:
> > > >
> > > >> Priyanka,
> > > >>
> > > > >> From the design description it is not clear how it will be used to
> > > > >> control output bandwidth (points #2, 3 and 4 mentioned by Sandeep).
> > > >>
> > > >> ~ Yogi
> > > >>
> > > > >> On 23 March 2016 at 11:39, Chaitanya Chebolu <chaitanya@datatorrent.com>
> > > > >> wrote:
> > > >>
> > > > >> > This is a very useful feature.
> > > > >> > I would like to know how you are distributing the bandwidth in the
> > > > >> > situation below:
> > > > >> > - Two input operators, say i1 and i2, are deployed on the same node
> > > > >> > and both operators have bandwidthManager as a plugin.
> > > >> >
> > > > >> > On Fri, Mar 18, 2016 at 5:43 PM, Priyanka Gugale <priyanka@datatorrent.com>
> > > > >> > wrote:
> > > >> >
> > > >> > > Hi,
> > > >> > >
> > > > >> > > Thanks for the inputs, Sandeep; I will take care of those points.
> > > >> > >
> > > > >> > > Here is the high-level design we are considering. We would have the
> > > > >> > > following components:
> > > > >> > > *1.* *BandwidthManager*
> > > > >> > > This keeps track of the current bandwidth usage of the system and
> > > > >> > > decides whether the requested data bandwidth can be used right away
> > > > >> > > or not. To do this it uses the Leaky bucket
> > > > >> > > <https://en.wikipedia.org/wiki/Leaky_bucket> algorithm, where it
> > > > >> > > emits data as long as it has not overused bandwidth (i.e. the
> > > > >> > > remaining bandwidth is >= 0) and then waits for a while to
> > > > >> > > accumulate bandwidth (until the remaining bandwidth goes from a
> > > > >> > > negative value to a positive one).
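
A minimal sketch of the accounting described in component 1, assuming a byte balance that may go negative and is refilled from elapsed time. The class name, method names, the initial credit and the one-second cap are all assumptions made for illustration; this is not the BandwidthManager code from the proposal.

```java
/** Hypothetical sketch only; not the BandwidthManager from the proposal. */
class BandwidthBucketSketch {
    private final long bytesPerSecond; // configured limit
    private long balance;              // remaining bytes; may go negative

    BandwidthBucketSketch(long bytesPerSecond) {
        this.bytesPerSecond = bytesPerSecond;
        this.balance = bytesPerSecond; // assumption: start with one second of credit
    }

    /** True while bandwidth has not been overused (balance >= 0). */
    boolean canConsume() {
        return balance >= 0;
    }

    /** Charges a tuple against the balance; a large tuple may overdraw it. */
    void consume(long tupleBytes) {
        balance -= tupleBytes;
    }

    /** Credits elapsed time; capped at one second of credit (assumption). */
    void accumulate(long elapsedMillis) {
        long credit = bytesPerSecond * elapsedMillis / 1000;
        balance = Math.min(bytesPerSecond, balance + credit);
    }

    long balance() {
        return balance;
    }

    public static void main(String[] args) {
        BandwidthBucketSketch bucket = new BandwidthBucketSketch(1000);
        bucket.consume(2500);                    // one oversized tuple overdraws the bucket
        System.out.println(bucket.canConsume()); // emitting must pause here
        bucket.accumulate(2000);                 // two seconds of credit
        System.out.println(bucket.canConsume()); // balance is positive again
    }
}
```

Letting the balance go negative means a tuple larger than the current credit is still emitted once, and the wait that follows pays back the overdraft.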
> > > >> > >
> > > > >> > > *2. BandwidthLimitingInputOperator*
> > > > >> > > Any input operator that wants to implement bandwidth restriction
> > > > >> > > should implement BandwidthLimitingInputOperator. The operator has an
> > > > >> > > abstract method to initialize an instance of BandwidthManager and a
> > > > >> > > method that emits tuples under the bandwidth restriction, i.e. only
> > > > >> > > as the available bandwidth permits.
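
A rough sketch of the shape described in component 2, assuming an abstract factory method for the bandwidth budget and an emit loop that stops once the budget is overdrawn. It does not use the Apex InputOperator API; the class, the nested Budget interface and every method name are hypothetical stand-ins.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical stand-in for the proposed operator; not the Apex API. */
abstract class ThrottledSourceSketch {
    /** Minimal budget contract assumed for this sketch. */
    interface Budget {
        boolean canConsume();
        void consume(long bytes);
    }

    private final Queue<byte[]> pending = new ArrayDeque<>();
    private final List<byte[]> emitted = new ArrayList<>();
    private Budget budget;

    /** Subclasses decide how the budget is created (the abstract init method). */
    protected abstract Budget createBudget();

    void setup() {
        budget = createBudget();
    }

    /** Queues a tuple that was read from the source but not yet emitted. */
    void offer(byte[] tuple) {
        pending.add(tuple);
    }

    /** Emits queued tuples while the budget allows; the rest wait for a later call. */
    int emitAvailable() {
        int count = 0;
        while (!pending.isEmpty() && budget.canConsume()) {
            byte[] tuple = pending.poll();
            budget.consume(tuple.length);
            emitted.add(tuple); // stands in for emitting on an output port
            count++;
        }
        return count;
    }

    List<byte[]> emitted() {
        return emitted;
    }
}
```

Tuples that cannot be emitted stay queued, so nothing is dropped; they simply go out in a later call once bandwidth has accumulated.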
> > > >> > >
> > > > >> > > *3. BandwidthPartitioner*
> > > > >> > > BandwidthPartitioner is introduced for static partitioning. If
> > > > >> > > static partitioning is used, the StatelessPartitioner class is
> > > > >> > > initialized by default. With bandwidth restriction we want to divide
> > > > >> > > bandwidth equally among the available partitions, and
> > > > >> > > BandwidthPartitioner should take care of that. It extends
> > > > >> > > StatelessPartitioner and just sets the right bandwidth on all
> > > > >> > > partitions after StatelessPartitioner creates/deletes partitions.
> > > > >> > > In the case of dynamic partitioning, the operator implementing
> > > > >> > > definePartitions should take care of bandwidth distribution.
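
The equal split in component 3 comes down to simple arithmetic. The sketch below assumes integer division with the remainder dropped; the helper is hypothetical and does not extend or call StatelessPartitioner.

```java
import java.util.Arrays;

/** Hypothetical helper; shows only the equal-share arithmetic. */
class BandwidthSplitSketch {
    /**
     * Divides the total limit equally among partitions.
     * Assumption: remainder bytes are dropped, so the sum never exceeds the total.
     */
    static long[] split(long totalBytesPerSecond, int partitionCount) {
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        long[] shares = new long[partitionCount];
        Arrays.fill(shares, totalBytesPerSecond / partitionCount);
        return shares;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split(1000, 4)));
    }
}
```

Rounding down keeps the combined rate at or below the configured limit when the total is not evenly divisible.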
> > > >> > >
> > > > >> > > This design takes care of basic bandwidth restriction and also
> > > > >> > > handles partitions by distributing the available bandwidth equally
> > > > >> > > among all partitions. It is also open enough to allow further
> > > > >> > > modifications for more complex situations.
> > > > >> > >
> > > > >> > > Let me know your opinion on what else we can do to design it better.
> > > >> > >
> > > >> > > -Priyanka
> > > >> > >
> > > > >> > > On Thu, Mar 3, 2016 at 10:11 AM, Sandeep Deshmukh <sandeep@datatorrent.com>
> > > > >> > > wrote:
> > > >> > >
> > > > >> > > > The main purpose is not to handle back pressure but to limit
> > > > >> > > > bandwidth usage by applications. This is useful in ingestion use
> > > > >> > > > cases. Typically a user needs to ingest, say, up to 1 GB per sec
> > > > >> > > > and not more. The tuple size may vary from message-based tuples
> > > > >> > > > (a few KBs) to block tuples for files (a few MBs). The bandwidth
> > > > >> > > > manager will take the max bandwidth that can be utilized by the
> > > > >> > > > application and will take care of sharing it across partitions
> > > > >> > > > etc.
> > > >> > > >
> > > > >> > > > Priyanka: You could also consider the following in your design:
> > > > >> > > >
> > > > >> > > >    1. Limiting input rate (across partitions)
> > > > >> > > >    2. Limiting output rate (across partitions)
> > > > >> > > >    3. Specifying the total bandwidth that the application can
> > > > >> > > >    utilize, including input and output? Not sure if this is
> > > > >> > > >    required. Need comments from others here.
> > > > >> > > >    4. Include a default implementation that will handle 1 and 2;
> > > > >> > > >    anyone interested in having their own bandwidth manager should
> > > > >> > > >    be able to extend the default one.
> > > > >> > > >    5. Can you also look at including/extending tuples per sec, as
> > > > >> > > >    pointed out by Tim/Chinmay?
> > > >> > > >
> > > >> > > > Regards,
> > > >> > > > Sandeep
> > > >> > > >
> > > > >> > > > On Thu, Mar 3, 2016 at 12:23 AM, Timothy Farkas <tim@datatorrent.com>
> > > > >> > > > wrote:
> > > >> > > >
> > > > >> > > > > Not sure if this is helpful, but there is already a utility in
> > > > >> > > > > Malhar for converting tuples per second to tuples per window.
> > > > >> > > > > This allows the user to define a property in tuples per second;
> > > > >> > > > > the operator can then convert that to tuples per window so it
> > > > >> > > > > emits the correct number of tuples per window.
> > > > >> > > > >
> > > > >> > > > > https://github.com/apache/incubator-apex-malhar/blob/master/library/src/main/java/com/datatorrent/lib/util/time/WindowUtils.java
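
The conversion mentioned above is a proportional scaling by the window length. The sketch below shows only that arithmetic; the method name and the 500 ms window in the example are assumptions, and the linked WindowUtils source is not reproduced or called here.

```java
/** Hypothetical helper; illustrates the arithmetic, not the WindowUtils API. */
class TuplesPerWindowSketch {
    /** Scales a per-second rate to one window; fractional tuples are rounded down. */
    static long tuplesPerWindow(long tuplesPerSecond, long windowMillis) {
        return tuplesPerSecond * windowMillis / 1000;
    }

    public static void main(String[] args) {
        // Assumed example: 1000 tuples per second with a 500 ms window.
        System.out.println(tuplesPerWindow(1000, 500));
    }
}
```

The same scaling applies to a byte limit: a bytes-per-second cap multiplied by the window length in seconds gives a bytes-per-window budget.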
> > > >> > > > >
> > > > >> > > > > On Wed, Mar 2, 2016 at 10:41 AM, Chinmay Kolhatkar <chinmay@datatorrent.com>
> > > > >> > > > > wrote:
> > > >> > > > >
> > > >> > > > > > Hi Priyanka,
> > > >> > > > > >
> > > >> > > > > > Indeed this is a useful feature.
> > > >> > > > > >
> > > > >> > > > > > I believe the number of bytes consumed per sec can as well
> > > > >> > > > > > translate to the number of tuples consumed per sec.
> > > > >> > > > > >
> > > > >> > > > > > If the above is correct, won't the back pressure that is
> > > > >> > > > > > handled by bufferserver help in your use case?
> > > >> > > > > >
> > > >> > > > > > Thanks,
> > > >> > > > > > Chinmay.
> > > > >> > > > > > On 2 Mar 2016 4:49 p.m., "Priyanka Gugale" <priyanka@datatorrent.com>
> > > > >> > > > > > wrote:
> > > >> > > > > >
> > > > >> > > > > > > Many times we need to put bandwidth restrictions, or some
> > > > >> > > > > > > limit on the number of bytes consumed per second, on an
> > > > >> > > > > > > input operator. As I understand, there is no direct support
> > > > >> > > > > > > for this feature in Apex.
> > > > >> > > > > > >
> > > > >> > > > > > > I am planning to write a bandwidth manager which will help
> > > > >> > > > > > > in limiting bandwidth at the input operator. Let me know if
> > > > >> > > > > > > there are any better alternative ways. I will soon publish
> > > > >> > > > > > > the design for the bandwidth manager I am planning to write.
> > > >> > > > > > >
> > > >> > > > > > > -Priyanka
> > > >> > > > > > >
> > > >> > > > > >
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > > >
> > > >
> > >
> >
>
