beam-commits mailing list archives

From "ASF GitHub Bot (JIRA)" <>
Subject [jira] [Commented] (BEAM-3042) Add tracking of bytes read / time spent when reading side inputs
Date Tue, 12 Dec 2017 00:06:00 GMT


ASF GitHub Bot commented on BEAM-3042:

aaltay closed pull request #4241: [BEAM-3042] Renaming properties from IO target counter name.

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/sdks/python/apache_beam/utils/ b/sdks/python/apache_beam/utils/
index ae974344259..e2e0a1a730b 100644
--- a/sdks/python/apache_beam/utils/
+++ b/sdks/python/apache_beam/utils/
@@ -29,19 +29,18 @@
 from apache_beam.transforms import cy_combiners
 # Information identifying the IO being measured by a counter.
-IOTargetName = namedtuple('IOTargetName', ['side_input_step_name',
-                                           'side_input_index',
-                                           'original_shuffle_step_name'])
+IOTargetName = namedtuple('IOTargetName', ['requesting_step_name',
+                                           'input_index'])
 def side_input_id(step_name, input_index):
   """Create an IOTargetName that identifies the reading of a side input."""
-  return IOTargetName(step_name, input_index, None)
+  return IOTargetName(step_name, input_index)
 def shuffle_id(step_name):
   """Create an IOTargetName that identifies a GBK step."""
-  return IOTargetName(None, None, step_name)
+  return IOTargetName(step_name, None)
 _CounterName = namedtuple('_CounterName', ['name',
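
To make the effect of the rename concrete, here is a minimal, self-contained sketch of the post-change definitions as they appear in the diff above. The step names used at the bottom are hypothetical and chosen only for illustration; the rest of the module (including _CounterName) is omitted because the diff does not show it.

```python
from collections import namedtuple

# Post-change shape of IOTargetName, copied from the "+" lines of the diff.
IOTargetName = namedtuple('IOTargetName', ['requesting_step_name',
                                           'input_index'])


def side_input_id(step_name, input_index):
  """Create an IOTargetName that identifies the reading of a side input."""
  return IOTargetName(step_name, input_index)


def shuffle_id(step_name):
  """Create an IOTargetName that identifies a GBK step."""
  return IOTargetName(step_name, None)


# Hypothetical step names, not taken from any real pipeline.
side = side_input_id('ParDo-7', 1)
shuffle = shuffle_id('GroupByKey-3')

# Both kinds of target now store the step in the same field.
print(side.requesting_step_name, side.input_index)
print(shuffle.requesting_step_name, shuffle.input_index)
```

One consequence visible in this sketch: with the two-field tuple, a shuffle target and a side input target both carry the step name in requesting_step_name, and a shuffle target is recognizable only by input_index being None.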


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:

> Add tracking of bytes read / time spent when reading side inputs
> ----------------------------------------------------------------
>                 Key: BEAM-3042
>                 URL:
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>            Reporter: Pablo Estrada
>            Assignee: Pablo Estrada
> It is difficult for Dataflow users to understand how modifying a pipeline or data set can affect how much inter-transform IO is used in their job. The intent of this feature request is to help users understand how side inputs behave when they are consumed.
> This will allow users to understand how much time and how much data their pipeline uses to read/write to inter-transform IO. Users will also be able to modify their pipelines and understand how their changes affect these IO metrics.
> For further information, please review the internal Google doc go/insights-transform-io-design-doc.

This message was sent by Atlassian JIRA
