beam-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Aviem Zur (JIRA)" <>
Subject [jira] [Commented] (BEAM-1126) Expose UnboundedSource split backlog in number of events
Date Sun, 11 Dec 2016 12:48:58 GMT


Aviem Zur commented on BEAM-1126:

The backlog accessors are a very good indicator for application monitoring. As such, we plan
to expose backlog as aggregators in spark-runner. 
Number of events is more human comprehensible than bytes. Specifically, in Kafka, backlog
(or lag) is reasoned about in {{number of messages}}. See:
If I understand correctly, for {{PubSub}} it is more common to reason about backlog in bytes,
however, the implementation for {{KafkaIO}} seems forced, applying a byte approximation on
a value that is originally in {{number of messages}}:
synchronized long approxBacklogInBytes() {
  // Note that is an an estimate of uncompressed backlog.
  if (latestOffset < 0 || nextOffset < 0) {
    return UnboundedReader.BACKLOG_UNKNOWN;
  return Math.max(0, (long) ((latestOffset - nextOffset) * avgRecordSize));
In conclusion - it seems that the API was written with {{PubSub}} in mind, however, {{Kafka}},
the open source equivalent, relates to backlog in terms of {{number of messages}}. 

> Expose UnboundedSource split backlog in number of events
> --------------------------------------------------------
>                 Key: BEAM-1126
>                 URL:
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-core
>            Reporter: Aviem Zur
>            Assignee: Daniel Halperin
>            Priority: Minor
> Today {{UnboundedSource}} exposes split backlog in bytes via {{getSplitBacklogBytes()}}
> There is value in exposing backlog in number of events as well, since this number can
be more human comprehensible than bytes. something like {{getSplitBacklogEvents()}} or {{getSplitBacklogCount()}}.

This message was sent by Atlassian JIRA

View raw message