flink-dev mailing list archives

From Alexander Alexandrov <alexander.s.alexand...@gmail.com>
Subject Re: Broadcast data sent increases with # slots per TM
Date Wed, 08 Jun 2016 19:14:07 GMT
> As far as I know, the reason why the broadcast variables are implemented
> that way is that the senders would have to know which sub-tasks are
> deployed to which TMs.

As the broadcast variables are realized as additionally attached "broadcast
channels", I am assuming that the same behavior will apply for broadcast
joins as well.

Is this the case?

Regards,
Alexander


2016-06-08 17:13 GMT+02:00 Kunft, Andreas <andreas.kunft@tu-berlin.de>:

> Hi Till,
>
> thanks for the fast answer.
> I'll think about a concrete way of implementing this and open a JIRA.
>
> Best
> Andreas
> ________________________________________
> From: Till Rohrmann <trohrmann@apache.org>
> Sent: Wednesday, 8 June 2016 15:53
> To: dev@flink.apache.org
> Subject: Re: Broadcast data sent increases with # slots per TM
>
> Hi Andreas,
>
> your observation is correct. The data is sent to each slot and the
> receiving TM only materializes one copy of the data. The rest of the data
> is discarded.
>
> As far as I know, the reason why the broadcast variables are implemented
> that way is that the senders would have to know which sub-tasks are
> deployed to which TMs. Only then, you can decide for which sub-tasks you
> can send the data together. Since the output emitters are agnostic to the
> actual deployment, the necessary information would have to be forwarded to
> them.
>
> Another problem is that if you pick one of the sub-tasks to receive the
> broadcast set, then you have to make sure that this sub-task has read and
> materialized the broadcast set before the other sub-tasks start working.
> One could perhaps send the broadcast set to one sub-task first and then,
> once the BC set has been sent, send a kind of acknowledgement record to
> all other sub-tasks. That way, the other sub-tasks would block until the
> broadcast set has been completely transmitted. But here one has to make
> sure that the sub-task receiving the BC set has been deployed and is not
> queued up for scheduling.
>
> So there are some challenges to solve in order to optimize the BC sets.
> Currently, there is nobody working on it. If you want to start working on
> it, then I would recommend opening a JIRA issue and starting to write a
> design document for it.
>
> Cheers,
> Till
>
> On Wed, Jun 8, 2016 at 1:45 PM, Kunft, Andreas <andreas.kunft@tu-berlin.de>
> wrote:
>
> > Hi,
> >
> >
> > we are seeing an unexpected increase in the data sent over the network
> > for broadcasts as the number of slots per TaskManager increases.
> >
> >
> > We provided a benchmark [1]. The effect not only increases the amount of
> > data sent over the network but also hurts performance, as seen in the
> > preliminary results below. In these results, cloud-11 has 25 nodes and
> > ibm-power has 8 nodes, with the number of slots per node scaled from 1
> > to 16.
> >
> >
> > +-----------------------+--------------+-------------+
> > | suite                 | name         | median_time |
> > +=======================+==============+=============+
> > | broadcast.cloud-11    | broadcast.01 |        8796 |
> > | broadcast.cloud-11    | broadcast.02 |       14802 |
> > | broadcast.cloud-11    | broadcast.04 |       30173 |
> > | broadcast.cloud-11    | broadcast.08 |       56936 |
> > | broadcast.cloud-11    | broadcast.16 |      117507 |
> > | broadcast.ibm-power-1 | broadcast.01 |        6807 |
> > | broadcast.ibm-power-1 | broadcast.02 |        8443 |
> > | broadcast.ibm-power-1 | broadcast.04 |       11823 |
> > | broadcast.ibm-power-1 | broadcast.08 |       21655 |
> > | broadcast.ibm-power-1 | broadcast.16 |       37426 |
> > +-----------------------+--------------+-------------+
> >
> >
> >
> > After looking into the code base, it seems that the data is
> > de-serialized only once per TM, but the actual data is sent to all slots
> > running the operator with broadcast vars and is simply discarded if
> > it has already been de-serialized.
> >
> >
> > I do not see a reason why the data can't be shared among the slots of a
> > TM and therefore sent just once, but I guess it would require quite some
> > changes to how BC sets are currently handled.
> >
> >
> > Are there any future plans regarding this and/or is there interest in
> > this "feature"?
> >
> >
> > Best
> >
> > Andreas
> >
> >
> > [1] https://github.com/TU-Berlin-DIMA/flink-broadcast
> >
> >
> >
>
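The traffic pattern described in the thread (one copy of the broadcast set sent per slot, only one copy materialized per TM) can be made concrete with a small back-of-the-envelope sketch. This is not Flink code. It assumes one TM per node, one consuming sub-task per slot, and a single sender; the function names are made up for illustration. The median times are copied from the benchmark table above, whose unit is not stated in the thread.

```python
def broadcast_copies_sent(num_tms: int, slots_per_tm: int) -> dict:
    """Count copies of the broadcast set that cross the network.

    Assumption: every slot runs one consuming sub-task and a single
    sender ships the whole broadcast set to each receiver.
    """
    per_slot = num_tms * slots_per_tm  # behaviour described in the thread
    per_tm = num_tms                   # if slots of a TM shared one copy
    return {
        "per_slot": per_slot,
        "per_tm": per_tm,
        "discarded": per_slot - per_tm,  # copies received but thrown away
    }


# Median times from the benchmark table in the thread (unit not stated).
MEDIAN_TIME = {
    "cloud-11": {1: 8796, 2: 14802, 4: 30173, 8: 56936, 16: 117507},
    "ibm-power-1": {1: 6807, 2: 8443, 4: 11823, 8: 21655, 16: 37426},
}


def slowdown(suite: str, slots: int) -> float:
    """Median time relative to the 1-slot run of the same suite."""
    return MEDIAN_TIME[suite][slots] / MEDIAN_TIME[suite][1]


if __name__ == "__main__":
    # cloud-11 has 25 nodes according to the thread.
    for slots in (1, 2, 4, 8, 16):
        copies = broadcast_copies_sent(num_tms=25, slots_per_tm=slots)
        print(slots, copies, round(slowdown("cloud-11", slots), 2))
```

Under these assumptions the number of discarded copies grows linearly with the slot count, which matches the direction of the measured slowdown, although the measured times also include everything else the job does.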

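The acknowledgement idea from Till's reply (one sub-task receives the broadcast set, the others block until they are told it is complete) can be sketched with plain threads. This is only an illustration of the coordination pattern, not how Flink implements broadcast variables; the class `TaskManagerBroadcastState` and the function `run_demo` are hypothetical names. The timeout stands in for the concern that the receiving sub-task might still be queued for scheduling, in which case the others would otherwise wait forever.

```python
import threading


class TaskManagerBroadcastState:
    """Hypothetical per-TM holder: one shared copy plus a 'ready' signal."""

    def __init__(self):
        self._ready = threading.Event()
        self.data = None

    def materialize(self, records):
        # Called only by the sub-task chosen to receive the full broadcast set.
        self.data = list(records)
        self._ready.set()  # plays the role of the acknowledgement record

    def await_data(self, timeout=None):
        # Called by every other sub-task; blocks until the set is materialized.
        if not self._ready.wait(timeout):
            raise TimeoutError("broadcast set was not materialized in time")
        return self.data


def run_demo(num_subtasks=4):
    state = TaskManagerBroadcastState()
    results = [None] * num_subtasks

    def follower(index):
        # Followers never receive the data themselves; they reuse the shared copy.
        results[index] = sum(state.await_data(timeout=5))

    threads = [
        threading.Thread(target=follower, args=(i,))
        for i in range(1, num_subtasks)
    ]
    for t in threads:
        t.start()

    # Sub-task 0 is the designated receiver of the broadcast set.
    state.materialize(range(10))
    results[0] = sum(state.data)

    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_demo())
```

In a real deployment the signal would have to travel over the network rather than through shared memory, and the senders would need to know which sub-tasks share a TM, which is exactly the missing information mentioned in the thread.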