spark-reviews mailing list archives

From Sherry302 <...@git.apache.org>
Subject [GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS
Date Sat, 20 Aug 2016 05:14:39 GMT
Github user Sherry302 commented on the issue:

    https://github.com/apache/spark/pull/14659
  
    Hi @srowen. Thank you for the review, and sorry for the test
    failure and the late update. The failures happened because `jobID` was
    `None` or `spark.app.name` was missing from the SparkConf. I have updated the PR
    to set default values for `jobID` and `spark.app.name`. When a real application
    runs on Spark, both `jobID` and `spark.app.name` are always set.
    
    What's the use case for this?
    When users run Spark applications on YARN against HDFS, Spark's
    caller contexts are written into hdfs-audit.log. A Spark caller context consists
    of JobID_stageID_stageAttemptId_taskID_attemptNumber plus the application's name.
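    As a rough illustration of the idea, the context string can be assembled from
    the job/stage/task identifiers and then handed to HDFS. The object and method
    names below, and the exact field ordering and separators, are illustrative
    assumptions, not the actual PR implementation; the real code would pass the
    string to Hadoop's `org.apache.hadoop.ipc.CallerContext` (the API added by
    HDFS-9184) so it shows up in hdfs-audit.log.

```scala
// Hypothetical sketch: build a caller-context string from Spark task
// identifiers. All names and the string layout are assumptions for
// illustration only.
object CallerContextSketch {
  def buildCallerContext(appName: String,
                         jobId: Int,
                         stageId: Int,
                         stageAttemptId: Int,
                         taskId: Long,
                         attemptNumber: Int): String =
    // One flat, underscore-separated string, since HDFS records the
    // caller context as a single opaque field in the audit log.
    s"SPARK_${appName}_JobId_${jobId}_StageID_${stageId}_" +
      s"stageAttemptId_${stageAttemptId}_taskID_${taskId}_" +
      s"attemptNumber_${attemptNumber}"
}
```

    With such a string in hand, the executor would set it once per task
    (e.g. via `CallerContext.setCurrent` in Hadoop 2.8+) before issuing HDFS
    operations, so every NameNode audit entry for that task carries it.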
    
    The caller context helps users diagnose and understand how specific
    applications impact parts of the Hadoop system, and what problems they
    may be creating (e.g. overloading the NameNode). As noted in HDFS-9184, for a
    given HDFS operation it is very helpful to track which upper-level job
    issued it.




