spark-reviews mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [spark] HeartSaVioR edited a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
Date Thu, 20 Feb 2020 22:00:00 GMT
HeartSaVioR edited a comment on issue #27557: [SPARK-30804][SS] Measure and log elapsed time
for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-589333368
 
 
   >> For streaming workloads, latency is the first class consideration.
   
   >  When the query is not running properly.
   
   OK, I admit most of my experience has been with "low-latency" workloads, but even though Spark
runs in "micro-batch" mode, that doesn't mean latency is unimportant. Latency is the thing in a
streaming workload that "defines" whether the query is running properly or not. Even Spark had to
claim that a micro-batch could run in sub-second time, because one of the major downsides of
Spark Streaming has been latency, and continuous processing had to be introduced.
   
   Higher latency doesn't only mean that output will be delayed. When you turn on "latestFirst"
(with maxFilesPerTrigger, since in this case we assume we can't process all the inputs) to start
reading from the latest files, the latency of a batch defines the boundary of its inputs.
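
   To make the "boundary of inputs" point concrete, here is a toy simulation in plain Python. It
is not Spark's file source implementation; the function name `simulate`, the constant arrival
rate, and the parameter names are assumptions made only for illustration. It models a source that
always picks the newest pending files first, capped per batch, and shows how a longer batch
duration changes which files each batch reads and how many older files are left behind.

```python
def simulate(batch_latency_s, arrival_rate_per_s=10,
             max_files_per_trigger=20, num_batches=5):
    """Toy model of 'newest files first, capped per batch' (hypothetical, not Spark code).

    Files are numbered by arrival order. Returns a list with the
    (oldest, newest) file index read by each batch, plus the number of
    files still unread after the last batch.
    """
    pending = set()      # files that have arrived but were never read
    next_file = 0        # index of the next file to arrive
    picked_ranges = []

    for _ in range(num_batches):
        # Files that arrived while the previous batch was running.
        arrived = int(batch_latency_s * arrival_rate_per_s)
        pending.update(range(next_file, next_file + arrived))
        next_file += arrived

        # Newest first, capped at max_files_per_trigger.
        batch = sorted(pending, reverse=True)[:max_files_per_trigger]
        pending.difference_update(batch)
        picked_ranges.append((min(batch), max(batch)) if batch else None)

    return picked_ranges, len(pending)


if __name__ == "__main__":
    for latency in (1, 5):
        ranges, backlog = simulate(latency)
        print(f"latency={latency}s ranges={ranges} unread={backlog}")
```

   Under these assumptions, a 1-second batch keeps up with arrivals, while a 5-second batch only
ever reads the newest 20 files and the set of skipped older files grows with every batch.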
   
   It's a critical aspect that operators should always observe through their monitoring
(alerts, a time-series DB and dashboards, etc.), so that they can find out what is happening
when the latency fluctuates a lot.
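
   As a rough illustration of the kind of measurement being discussed, the sketch below times an
operation and logs the elapsed time in plain Python. It is not the code from this PR (which is in
Scala); the helper `run_timed`, the logger name, and the warning threshold are hypothetical names
chosen for the example.

```python
import logging
import time

logger = logging.getLogger("compaction_timing")  # hypothetical logger name


def run_timed(name, fn, warn_threshold_ms, clock=time.monotonic):
    """Run fn(), log how long it took, and return (result, elapsed_ms, level).

    The elapsed time is logged at INFO, or at WARNING once it reaches
    warn_threshold_ms, so it can be picked up by log-based alerting.
    """
    start = clock()
    result = fn()
    elapsed_ms = (clock() - start) * 1000.0

    level = logging.WARNING if elapsed_ms >= warn_threshold_ms else logging.INFO
    logger.log(level, "%s took %.1f ms", name, elapsed_ms)
    return result, elapsed_ms, level


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # Stand-in for an expensive operation such as a metadata log compaction.
    run_timed("compact", lambda: sum(range(1_000_000)), warn_threshold_ms=2000)
```

   Passing the clock in as a parameter is only a convenience for testing; a monotonic clock is
used by default so that wall-clock adjustments do not distort the measurement.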
   
   > I think it's debug information which helps developers to find out what's the issue
and not users (INFO is more like to users in my understanding).
   
   I'm not sure who you mean by "users". AFAIK, in many cases (not all cases, for sure),
users = developers = operators.
