heron-dev mailing list archives

From Fu Maosong <maoson...@gmail.com>
Subject Re: Specifying Operator Resource in DSL
Date Fri, 22 Sep 2017 00:34:44 GMT
Bill,

Summingbird/TSAR users are only allowed to specify resources at
source/map/sink granularity. In other words, all sources (or all maps, or
all sinks) have to share the same config.

They can certainly "reverse engineer the component name out of the
derived topology and then pass settings for it", but doing so means using
the Heron API directly and relying on something that is not guaranteed (for
instance, the naming conventions for Summingbird components), which is not
ideal.
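
To make the fragility concrete, here is a minimal sketch in Python. Nothing
in it is Heron or Summingbird API: the two naming schemes, the
`apply_settings` helper, and the RAM numbers are all hypothetical and only
show what happens when settings are keyed on names the planner is free to
change.

```python
# Hypothetical sketch: none of this is Heron or Summingbird API.
# A DSL planner assigns component names itself; users who key resource
# settings on those names depend on a convention that may change.

def derive_names_v1(operators):
    """Hypothetical naming scheme: <kind>-<index>."""
    return [f"{kind}-{i}" for i, kind in enumerate(operators)]

def derive_names_v2(operators):
    """The same planner after a hypothetical release that renames components."""
    return [f"{kind}_{i + 1}" for i, kind in enumerate(operators)]

def apply_settings(component_names, ram_by_name, default_ram_mb=512):
    """Return the RAM each component gets; settings for unknown names are ignored."""
    return {name: ram_by_name.get(name, default_ram_mb) for name in component_names}

operators = ["source", "map", "sink"]

# Settings reverse engineered from the v1 names.
ram_by_name = {"map-1": 2048}

before = apply_settings(derive_names_v1(operators), ram_by_name)
after = apply_settings(derive_names_v2(operators), ram_by_name)
```

With the first scheme the map component gets its 2048 MB; after the rename
the same settings no longer match any component, and everything silently
falls back to the default.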

2017-09-21 16:04 GMT-07:00 Sanjeev Kulkarni <sanjeevrk@gmail.com>:

> Neng,
> https://github.com/twitter/heron/pull/2334
> provides this abstraction.
> The issue, however, is as follows. In the Spout/Bolt world, every component
> is explicitly named by the topology writer and thus all resources can be
> specified on a per-component basis. However, in the DSL world, a) the
> operators themselves don't have names and b) optimizations can squish the
> operators into a single physical operator. One possibility would be to
> optionally add a name to the operator (like map(mapfn, name)), but that
> seems too cumbersome/kludgy.
>
> On Thu, Sep 21, 2017 at 3:57 PM, Neng Lu <freeneng@gmail.com> wrote:
>
> > Just adding some thoughts here: for ordinary Heron topologies, the
> > definition of a Heron job and the resource requests for each component
> > are separated: `TopologyBuilder` for job definition, `Config` for
> > resource requirements.
> >
> > In the DSL case, if we could do something similar that separates DSL job
> > creation from resource requests, it would be really good. With this
> > separation, people have the flexibility of providing different configs
> > for the same job.
> >
> >
> > On Wed, Sep 20, 2017 at 1:48 PM, Sanjeev Kulkarni <sanjeevrk@gmail.com>
> > wrote:
> >
> > > Hi folks,
> > > One of the great features of the lower-level spout/bolt interface in
> > > Heron is the ability to specify resources needed on a per-component
> > > basis. This feature is very helpful for tuning large topologies and is
> > > heavily used inside Twitter.
> > > Currently the DSL does not have this flexibility. I wanted to get
> > > opinions about how we can add this.
> > > There are probably several ways to do it. I'm listing a few approaches
> > > that have come to my mind. Please feel free to add more.
> > > 1) Currently some of our operators are simple (like the flatMap, map,
> > > and filter operators), while others are a little more complicated (like
> > > transform, where users can perform setup/cleanup). We can take the
> > > approach of adding the ability to specify resources only for complex
> > > operators. Thus transform could have two variants: the current one,
> > > which just takes a transform function, and another that takes a
> > > resource parameter as well. The rest of the operators (map, flatMap,
> > > filter, etc.) will remain the same. The advantage of this is that the
> > > interface explosion is minimal and controlled. The con is that if you
> > > need to control the resources of a particular operator, you are forced
> > > to use transform.
> > > 2) Another approach would be to add a variant that takes a Resource
> > > parameter to all operators. The pro is that this gives fine-grained
> > > control over all operators. The con is the interface blow-up.
> > >
> > > Thoughts?
> > >
> >
>
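
Two of the ideas quoted above, an optional operator name and a resource
config kept separate from job creation, can be pictured with the toy sketch
below. It is not the Heron DSL: `Stream`, `Operator`, `resolve_ram`, and the
RAM values are hypothetical, and the sketch does not model the case where
the optimizer fuses several operators into one physical operator.

```python
# Hypothetical sketch: a toy stream builder, not the Heron DSL.
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class Operator:
    kind: str
    fn: Callable
    name: Optional[str] = None  # optional, as in map(mapfn, name)

@dataclass
class Stream:
    operators: List[Operator] = field(default_factory=list)

    def map(self, fn, name=None):
        self.operators.append(Operator("map", fn, name))
        return self

    def filter(self, fn, name=None):
        self.operators.append(Operator("filter", fn, name))
        return self

def resolve_ram(stream, ram_by_name, default_ram_mb=512):
    """Pair each operator with the RAM requested for its name, else the default."""
    return [
        (op.kind, op.name, ram_by_name.get(op.name, default_ram_mb))
        for op in stream.operators
    ]

# The job is defined once; only the operator that needs tuning is named.
stream = Stream().map(lambda x: x * 2, name="double").filter(lambda x: x > 10)

# Different configs can then be supplied for the same job.
small = resolve_ram(stream, {"double": 1024})
large = resolve_ram(stream, {"double": 4096})
```

Unnamed operators simply take the default, so naming stays opt-in and the
operator signatures only grow by one optional argument.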



-- 
With my best Regards
------------------
Fu Maosong
Twitter Inc.
Mobile: +001-415-244-7520
