hadoop-common-issues mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [hadoop] steveloughran edited a comment on pull request #2323: HADOOP-16830. Add public IOStatistics API.
Date Mon, 28 Sep 2020 15:06:39 GMT

steveloughran edited a comment on pull request #2323:
URL: https://github.com/apache/hadoop/pull/2323#issuecomment-700065247


   > For DurationTrackers in IOStatisticsStore() if we add a tracker in a try block, what
happens to it in case of failure should be looked at to avoid inaccurate values for the trackers.
   
   I was thinking about failure reporting myself
   
   - we may want to count failures
   - any failure with a longer or shorter duration than successful operations will skew
the results. Example: network failures -> long durations; auth failures -> short ones.
   
   At the same time, try-with-resources is nice. What to do?
   
   For each set of duration stats, we add a counter/mean/min/max of failures;
   on a failure, those statistics are updated instead of the success ones.
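A minimal sketch of what "a second set of statistics for failures" could look like. Every name here (`DurationSummary`, `DurationStatistic`, `record`) is made up for illustration and is not taken from the PR; durations are plain milliseconds supplied by the caller.

```java
// Hypothetical sketch: these classes are illustrative, not the IOStatistics API.
final class DurationSummary {
  private long count;
  private long totalMillis;
  private long min = Long.MAX_VALUE;
  private long max = Long.MIN_VALUE;

  /** Fold one measured duration into the count/min/max/mean. */
  synchronized void add(long millis) {
    count++;
    totalMillis += millis;
    min = Math.min(min, millis);
    max = Math.max(max, millis);
  }

  synchronized long count() { return count; }
  // min/max report 0 until the first sample arrives (an arbitrary choice).
  synchronized long min() { return count == 0 ? 0 : min; }
  synchronized long max() { return count == 0 ? 0 : max; }
  synchronized double mean() {
    return count == 0 ? 0.0 : (double) totalMillis / count;
  }
}

/** One named statistic with separate summaries for each outcome. */
final class DurationStatistic {
  final DurationSummary successes = new DurationSummary();
  final DurationSummary failures = new DurationSummary();

  /** A failed call updates only the failure summary, so it cannot skew the success one. */
  void record(long millis, boolean succeeded) {
    (succeeded ? successes : failures).add(millis);
  }
}
```

With this split, a 500 ms network timeout lands in `failures` and the mean of the successful calls stays untouched.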
   
   Issue: how best to record a failure, given that the resources declared in try-with-resources
are not visible in catch or finally? I'd initially thought we could set it in the catch(), but
the tracker would be out of scope there.
   
   1. Pessimistic: assume that all attempts are failures, and make the last operation in every try
clause set the success flag. Ugly.
   2. Move construction out of try-with-resources and use an explicit catch and finally instead.
Differently ugly.
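To make the two options concrete, here is a rough side-by-side sketch. All names (`OutcomeCounts`, `OutcomeTracker`, `TrackingOptions`, `work`) are hypothetical, and timing is left out so that only the success/failure bookkeeping is visible.

```java
import java.io.IOException;

// Hypothetical names throughout; this is not the Hadoop IOStatistics API.
final class OutcomeCounts {
  int successes;
  int failures;
}

final class OutcomeTracker implements AutoCloseable {
  private final OutcomeCounts counts;
  private boolean failure;

  OutcomeTracker(OutcomeCounts counts, boolean assumeFailure) {
    this.counts = counts;
    this.failure = assumeFailure;
  }

  void succeeded() { failure = false; }
  void failed() { failure = true; }

  @Override
  public void close() {
    if (failure) { counts.failures++; } else { counts.successes++; }
  }
}

final class TrackingOptions {
  /** Stand-in for the real operation, e.g. a store listing. */
  static String work(boolean fail) throws IOException {
    if (fail) { throw new IOException("simulated failure"); }
    return "ok";
  }

  /** Option 1: tracker starts as "failed"; the try block must flip it last. */
  static String pessimistic(OutcomeCounts counts, boolean fail) throws IOException {
    try (OutcomeTracker tracker = new OutcomeTracker(counts, true)) {
      String result = work(fail);
      tracker.succeeded();  // easy to forget, which is what makes this ugly
      return result;
    }
  }

  /** Option 2: tracker is declared outside the try, so catch/finally can see it. */
  static String explicit(OutcomeCounts counts, boolean fail) throws IOException {
    OutcomeTracker tracker = new OutcomeTracker(counts, false);
    try {
      return work(fail);
    } catch (IOException | RuntimeException e) {
      tracker.failed();
      throw e;
    } finally {
      tracker.close();
    }
  }
}
```

Both record the same outcomes; the difference is only where the boilerplate sits at each call site.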
   
   Fancy lambda-expression wrapper thing? Doable.
   
   ```
   object = DurationTrackerFactory.track("statistic", () ->
     s3.listObjects());
   ```
     
   Then inside track() we'd put the code of option #2.
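A sketch of how such a wrapper might look, with the explicit catch/finally hidden inside `track()` so callers only write the lambda. `IOCallable`, `DurationTracking`, and the `".failures"` key suffix are assumptions made for this example, not names from the PR.

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical interface: a zero-argument operation that may raise an IOException.
@FunctionalInterface
interface IOCallable<T> {
  T apply() throws IOException;
}

// Hypothetical tracker: counts and total time per statistic key.
final class DurationTracking {
  private final Map<String, AtomicLong> counts = new ConcurrentHashMap<>();
  private final Map<String, AtomicLong> nanos = new ConcurrentHashMap<>();

  /** Run the operation now, recording its duration under a success or failure key. */
  <T> T track(String statistic, IOCallable<T> operation) throws IOException {
    long start = System.nanoTime();
    boolean failed = false;
    try {
      return operation.apply();
    } catch (IOException | RuntimeException e) {
      failed = true;
      throw e;  // the caller still sees the original exception
    } finally {
      // Assumed naming scheme: failures go under "<statistic>.failures".
      String key = failed ? statistic + ".failures" : statistic;
      add(counts, key, 1);
      add(nanos, key, System.nanoTime() - start);
    }
  }

  long count(String key) { return get(counts, key); }
  long totalNanos(String key) { return get(nanos, key); }

  private static void add(Map<String, AtomicLong> map, String key, long delta) {
    map.computeIfAbsent(key, k -> new AtomicLong()).addAndGet(delta);
  }

  private static long get(Map<String, AtomicLong> map, String key) {
    AtomicLong value = map.get(key);
    return value == null ? 0 : value.get();
  }
}
```

A call site would then read something like `tracking.track("listings", () -> store.list())`, where `store` is whatever client is being measured.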
   
   Fancy curried-function-Haskell-elitism option
   
   Duration tracker takes a function and returns a new one
   
   ```
   <A, B> FunctionRaisingIOE<A, B> track(String statistic, FunctionRaisingIOE<A, B> inner);
   ```
   
   you'd get a function back which you could then apply at leisure.
   
   ```
   DurationTrackerFactory.track("statistic", () ->
     s3.listObjects()).apply();
   ```
     
   Maybe worth doing both. I could also look at adding it into the S3A Invoker code, as we'd
want the statistic updated on every iteration of a retried operation.
   
   ```
   invoker.invoke(() -> durationTracker.track("listings", () -> s3a.list()));
   ```
   
   That is where curried functions come out to play, with something like
   
   invoker.invoke(durationTracker.track("listings", () -> s3a.list())::apply)
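A toy sketch of why the function-returning form fits retries: the wrapped operation is handed to the retry loop once, and every attempt it makes is recorded. `IOOperation`, `AttemptTracker`, and `RetryingInvoker` are stand-ins invented for this example; they are not the S3A classes and ignore retry policies and backoff entirely.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical zero-argument operation that may raise an IOException.
@FunctionalInterface
interface IOOperation<T> {
  T apply() throws IOException;
}

// Hypothetical tracker that returns a wrapped operation instead of running it.
final class AttemptTracker {
  final AtomicInteger successes = new AtomicInteger();
  final AtomicInteger failures = new AtomicInteger();

  <T> IOOperation<T> track(IOOperation<T> inner) {
    return () -> {
      try {
        T result = inner.apply();
        successes.incrementAndGet();
        return result;
      } catch (IOException | RuntimeException e) {
        failures.incrementAndGet();
        throw e;
      }
    };
  }
}

// Toy retry loop: retries on IOException up to maxAttempts, with no backoff.
final class RetryingInvoker {
  private final int maxAttempts;

  RetryingInvoker(int maxAttempts) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    this.maxAttempts = maxAttempts;
  }

  <T> T invoke(IOOperation<T> operation) throws IOException {
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return operation.apply();  // each attempt passes through the tracker
      } catch (IOException e) {
        last = e;
      }
    }
    throw last;  // non-null: at least one attempt ran and failed
  }
}
```

An operation that fails twice and then succeeds, run with three attempts, would leave the tracker with two failures and one success.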
   
   At the same time: this gets complex fast. Could we make the design of this a followup?
   



