flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-10252) Handle oversized metric messages
Date Thu, 18 Oct 2018 13:02:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-10252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16655202#comment-16655202 ]

ASF GitHub Bot commented on FLINK-10252:
----------------------------------------

zentol commented on a change in pull request #6850: [FLINK-10252] Handle oversized metric messages
URL: https://github.com/apache/flink/pull/6850#discussion_r226294457
 
 

 ##########
 File path: flink-runtime/src/main/java/org/apache/flink/runtime/metrics/dump/MetricDumpSerialization.java
 ##########
 @@ -124,50 +124,86 @@ public MetricSerializationResult serialize(
 			Map<Counter, Tuple2<QueryScopeInfo, String>> counters,
 			Map<Gauge<?>, Tuple2<QueryScopeInfo, String>> gauges,
 			Map<Histogram, Tuple2<QueryScopeInfo, String>> histograms,
-			Map<Meter, Tuple2<QueryScopeInfo, String>> meters) {
+			Map<Meter, Tuple2<QueryScopeInfo, String>> meters,
+			long maximumFramesize,
+			MetricQueryService queryService) {
 
 			buffer.clear();
+			boolean unregisterRemainingMetrics = false;
 
 			int numCounters = 0;
 			for (Map.Entry<Counter, Tuple2<QueryScopeInfo, String>> entry : counters.entrySet()) {
+				if (unregisterRemainingMetrics) {
+					queryService.unregister(entry.getKey());
+					continue;
+				}
+
 				try {
 					serializeCounter(buffer, entry.getValue().f0, entry.getValue().f1, entry.getKey());
 					numCounters++;
+					if (buffer.length() > maximumFramesize) {
+						unregisterRemainingMetrics = true;
+					}
 				} catch (Exception e) {
 					LOG.debug("Failed to serialize counter.", e);
+
 				}
 			}
 
 			int numGauges = 0;
 			for (Map.Entry<Gauge<?>, Tuple2<QueryScopeInfo, String>> entry : gauges.entrySet()) {
+				if (unregisterRemainingMetrics) {
+					queryService.unregister(entry.getKey());
+					continue;
+				}
+
 				try {
 					serializeGauge(buffer, entry.getValue().f0, entry.getValue().f1, entry.getKey());
 					numGauges++;
+					if (buffer.length() > maximumFramesize) {
+						unregisterRemainingMetrics = true;
+					}
 				} catch (Exception e) {
 					LOG.debug("Failed to serialize gauge.", e);
 				}
 			}
 
-			int numHistograms = 0;
-			for (Map.Entry<Histogram, Tuple2<QueryScopeInfo, String>> entry : histograms.entrySet()) {
-				try {
-					serializeHistogram(buffer, entry.getValue().f0, entry.getValue().f1, entry.getKey());
-					numHistograms++;
-				} catch (Exception e) {
-					LOG.debug("Failed to serialize histogram.", e);
-				}
-			}
-
 			int numMeters = 0;
 			for (Map.Entry<Meter, Tuple2<QueryScopeInfo, String>> entry : meters.entrySet()) {
+				if (unregisterRemainingMetrics) {
+					queryService.unregister(entry.getKey());
+					continue;
+				}
+
 				try {
 					serializeMeter(buffer, entry.getValue().f0, entry.getValue().f1, entry.getKey());
 					numMeters++;
+					if (buffer.length() > maximumFramesize) {
+						unregisterRemainingMetrics = true;
+					}
 				} catch (Exception e) {
 					LOG.debug("Failed to serialize meter.", e);
 				}
 			}
 
+			int numHistograms = 0;
+			for (Map.Entry<Histogram, Tuple2<QueryScopeInfo, String>> entry : histograms.entrySet()) {
+				if (unregisterRemainingMetrics) {
+					queryService.unregister(entry.getKey());
+					continue;
+				}
+
+				try {
+					serializeHistogram(buffer, entry.getValue().f0, entry.getValue().f1, entry.getKey());
+					numHistograms++;
+					if (buffer.length() > maximumFramesize) {
 
 Review comment:
   > OK, I think if we adopt the strategy of throwing an exception, this is the same as judging the total size directly in the MQS.
   
   That is correct.
   
   > probability of size overflow
   
   Detailed latency metrics are the only known case where this happened, which is why I'm so persistent in suggesting to drop histograms first: it eliminates this case.
   
   > we need to consider returning some of the metrics
   
   When you separate the byte arrays for each type, we can always run the full serialization and let the MQS drop histograms from the serialization result without requiring re-serialization.
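   To illustrate the idea, here is a minimal sketch of the per-type layout suggested above: each metric category is serialized into its own byte array, so the MQS can discard the histogram payload when the dump is too large without re-serializing the other categories. All class and method names here are illustrative, not Flink's actual API.

```java
// Hypothetical sketch: per-category serialization result. Names are
// illustrative only; this is not Flink's actual MetricDumpSerialization code.
public class PerTypeDump {

	/** Serialized bytes for each metric category. */
	static final class Result {
		final byte[] counters;
		final byte[] gauges;
		final byte[] meters;
		final byte[] histograms;

		Result(byte[] counters, byte[] gauges, byte[] meters, byte[] histograms) {
			this.counters = counters;
			this.gauges = gauges;
			this.meters = meters;
			this.histograms = histograms;
		}

		long totalSize() {
			return (long) counters.length + gauges.length + meters.length + histograms.length;
		}
	}

	/**
	 * If the full dump exceeds the frame size, return a copy with the
	 * histogram bytes dropped; the other categories are reused as-is,
	 * so no re-serialization is needed.
	 */
	static Result fitToFrame(Result full, long maxFrameSize) {
		if (full.totalSize() <= maxFrameSize) {
			return full;
		}
		return new Result(full.counters, full.gauges, full.meters, new byte[0]);
	}

	public static void main(String[] args) {
		Result full = new Result(new byte[100], new byte[100], new byte[100], new byte[900]);
		Result trimmed = fitToFrame(full, 500);
		System.out.println(trimmed.histograms.length); // histograms dropped
		System.out.println(trimmed.totalSize());
	}
}
```

   Dropping histograms first matches the comment above: they are the bulkiest category (detailed latency metrics) and the only known cause of oversized dumps.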

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Handle oversized metric messages
> -------------------------------
>
>                 Key: FLINK-10252
>                 URL: https://issues.apache.org/jira/browse/FLINK-10252
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Metrics
>    Affects Versions: 1.5.3, 1.6.0, 1.7.0
>            Reporter: Till Rohrmann
>            Assignee: vinoyang
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 1.5.6, 1.6.3, 1.7.0
>
>
> Since the {{MetricQueryService}} is implemented as an Akka actor, it can only send messages
> of a smaller size than the current {{akka.framesize}}. We should check, similarly to FLINK-10251,
> whether the payload exceeds the maximum framesize and fail fast if it does.
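
The fail-fast check the issue asks for can be sketched as follows. This is a hypothetical simplification, not Flink's actual code: the serialized payload is compared against the configured framesize before sending, and an error is raised instead of letting Akka silently drop the oversized message.

```java
// Hypothetical sketch of a fail-fast framesize check; names are illustrative
// and do not reflect Flink's actual MetricQueryService implementation.
public class FrameSizeCheck {

	/**
	 * Returns the payload unchanged if it fits into the frame, otherwise
	 * throws so the caller can log and drop the dump explicitly rather
	 * than have the transport discard it silently.
	 */
	static byte[] checkFrameSize(byte[] payload, long maxFrameSizeBytes) {
		if (payload.length > maxFrameSizeBytes) {
			throw new IllegalStateException(
				"Metric dump of " + payload.length + " bytes exceeds the maximum framesize of "
					+ maxFrameSizeBytes + " bytes; the dump will not be sent.");
		}
		return payload;
	}

	public static void main(String[] args) {
		checkFrameSize(new byte[1024], 10 * 1024); // fits, passes through
		try {
			checkFrameSize(new byte[20 * 1024], 10 * 1024);
		} catch (IllegalStateException e) {
			System.out.println("rejected oversized payload");
		}
	}
}
```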



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
