hudi-commits mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [incubator-hudi] prashantwason commented on issue #1289: [HUDI-92] Provide reasonable names for Spark DAG stages in Hudi.
Date Wed, 29 Jan 2020 20:28:34 GMT
prashantwason commented on issue #1289: [HUDI-92] Provide reasonable names for Spark DAG stages in Hudi.
URL: https://github.com/apache/incubator-hudi/pull/1289#issuecomment-579944539
 
 
   A DAG stage name and description can be set using the JavaSparkContext.setJobDescription(...)
method. The value is stored per thread, so the same name/description applies to every stage
launched from that thread until it is updated (another call to setJobDescription) or cleared
(clearJobGroup).
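   To make the "persists until updated or cleared" behaviour concrete, here is a toy model
using a plain java.lang.ThreadLocal. It is an assumption-laden simplification, not Spark's
implementation: the class, runJob(...) and the "HoodieExample" label are hypothetical, and
runJob only records which label a job would have been tagged with.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model (hypothetical, not Spark code) of a thread-scoped job description:
 * once set, it labels every job submitted from the same thread until it is
 * replaced or cleared.
 */
final class ThreadDescriptionModel {
    private static final ThreadLocal<String> DESCRIPTION = new ThreadLocal<>();

    private ThreadDescriptionModel() {}

    static void setJobDescription(String value) { DESCRIPTION.set(value); }

    static void clearJobGroup() { DESCRIPTION.remove(); }

    /** Simulates submitting a job: records the label active on the calling thread. */
    static void runJob(List<String> log, String job) {
        log.add(job + " -> " + DESCRIPTION.get());
    }

    static List<String> demo() {
        List<String> log = new ArrayList<>();
        setJobDescription("HoodieExample: step one");
        runJob(log, "job1");
        runJob(log, "job2");                          // label persists across jobs
        setJobDescription("HoodieExample: step two"); // a new call replaces it
        runJob(log, "job3");
        clearJobGroup();
        runJob(log, "job4");                          // cleared: no label (null)
        return log;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

   The point of the model is that forgetting to update or clear the label silently tags
later, unrelated jobs on the same thread with a stale description.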
   
   In this PR, I use the class name as the stage name and a textual description derived
from the method's logic. HUDI class names are very descriptive, so this works well.
   
   There are two ways this can be done:
   1. Manually (this PR), by adding code that sets the name/description before any DAG stages
are started.
   2. Using Java AOP (aspect-oriented programming) to automatically find code locations matching
some pattern and augment them with a call to setJobDescription.
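   A minimal sketch of approach 1, assuming the "class name plus description" convention
described above. StageLabels and withDescription(...) are hypothetical names, not Hudi or
Spark API. The setter and clearer are passed in as functions so the sketch runs without
Spark; with a real JavaSparkContext jsc you would pass jsc::setJobDescription and
jsc::clearJobGroup.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

/**
 * Hypothetical helper (not part of Hudi or Spark) for the manual approach:
 * label the jobs started by {@code body}, then clear the label.
 */
final class StageLabels {
    private StageLabels() {}

    static <T> T withDescription(Consumer<String> setDescription,
                                 Runnable clearDescription,
                                 Class<?> owner,
                                 String description,
                                 Supplier<T> body) {
        // Class name as the stage "name", followed by a hand-written description.
        setDescription.accept(owner.getSimpleName() + ": " + description);
        try {
            return body.get();
        } finally {
            // Clear the label (even on failure) so later jobs on this thread don't inherit it.
            clearDescription.run();
        }
    }
}
```

   Usage might look like StageLabels.withDescription(jsc::setJobDescription,
jsc::clearJobGroup, getClass(), "Counting input records", () -> rdd.count()). Clearing in
a finally block is a design choice of this sketch; simply calling setJobDescription again
before the next stage, as described above, also works.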
   
   To use the AOP approach, we can create a separate AspectJ file containing the Pointcuts (code
locations to augment) and Advices (code to insert). A separate AspectJ compiler then weaves the
Advices into the class bytecode (at build time, or at class-load time with load-time weaving).
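   AspectJ needs its own compiler and runtime, so the sketch below swaps in a different
technique to illustrate the same "before" advice idea: a JDK dynamic proxy
(java.lang.reflect.Proxy), which intercepts calls at runtime instead of weaving bytecode.
DescribingProxy and RecordLookup are hypothetical names. Note that the advice only sees the
join point (interface and method name), not any hand-written description.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.function.Consumer;

/** Hypothetical interface standing in for a component that launches Spark jobs. */
interface RecordLookup {
    long countMatches(String key);
}

/** Proxy-based stand-in (not AspectJ) for a "before" advice that sets a description. */
final class DescribingProxy {
    private DescribingProxy() {}

    @SuppressWarnings("unchecked")
    static <T> T wrap(Class<T> iface, T target, Consumer<String> setDescription) {
        InvocationHandler handler = (proxy, method, args) -> {
            // Generic advice: the label can only be derived from the join point.
            setDescription.accept(
                method.getDeclaringClass().getSimpleName() + "." + method.getName());
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the target's own exception unchanged
            }
        };
        return (T) Proxy.newProxyInstance(
            iface.getClassLoader(), new Class<?>[] {iface}, handler);
    }
}
```

   Every call through the proxy is labelled automatically, which mirrors the "covers future
code" benefit, but the label is limited to names such as "RecordLookup.countMatches".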
   
   Pros of the AOP approach:
   1. Does not require any change to existing code.
   2. Also covers future code automatically.
   3. Easy to undo (just don't run the AspectJ compiler as part of the build).
   4. Can be extended to more use cases, such as automating metrics.
   
   Cons of the AOP approach:
   1. Requires AspectJ and its compiler to be integrated into the HUDI build chain.
   2. The Advice cannot be dynamic, so it cannot provide meaningful descriptions for the DAG
stages (we can still use the class name as the DAG stage name).
   
   Since the code has a manageable number of places where a DAG is created, I prefer the simpler
manual approach. It also serves as documentation of the code.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
