spark-issues mailing list archives

From "Hyukjin Kwon (JIRA)" <>
Subject [jira] [Commented] (SPARK-21493) Add more metrics to External Shuffle Service
Date Mon, 24 Jul 2017 01:56:00 GMT


Hyukjin Kwon commented on SPARK-21493:

gentle ping [~raajay]

> Add more metrics to External Shuffle Service
> --------------------------------------------
>                 Key: SPARK-21493
>                 URL:
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.2.0
>            Reporter: Raajay Viswanathan
>            Priority: Minor
>   Original Estimate: 336h
>  Remaining Estimate: 336h
> The current set of metrics in the external shuffle service is fairly limited. To debug
> failures of the shuffle service, it would be good to get more information about the state
> of the shuffle service. As a first cut, the following metrics seem important:
> 1. The amount of heap memory used by the External Shuffle Service process
> 2. The amount of direct buffer (off-heap) memory allocated to the External Shuffle Service.
> In the external shuffle service, Netty uses off-heap memory. Monitoring its usage can help
> in allocating appropriate resources and can also be used to raise alarms when the allocated
> memory exceeds a threshold.
> 3. The queue length in Netty event loops. Chunk Fetch Requests or Open Block requests
> can be dropped as a result of Netty queue overflows (resulting in FetchFailure). Having hard
> data on queue size can help in attributing the cause of failures.
> Please let me know of other metrics (from the shuffle service's perspective) that would be
> good to have.
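
For illustration only, here is a minimal sketch of how metrics 1 and 2 above could be read from inside a JVM process using the standard java.lang.management API. The class name ShuffleServiceJvmMetrics, its method names, and the 512 MiB threshold are hypothetical and are not part of Spark. The "direct" buffer pool reported by the JVM may not account for every off-heap allocation Netty makes, depending on how Netty is configured, so the value should be treated as an estimate. Metric 3 (event-loop queue length) depends on Netty internals and is not shown.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.List;

// Hypothetical helper, not a Spark class: reads JVM-level memory figures
// that correspond to metrics 1 and 2 in the issue description.
final class ShuffleServiceJvmMetrics {

    // Metric 1: heap memory currently used by this JVM, in bytes.
    static long heapUsedBytes() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed();
    }

    // Metric 2: estimated bytes held by the JVM's "direct" buffer pool,
    // or -1 if the JVM does not expose such a pool or has no estimate.
    static long directBufferUsedBytes() {
        List<BufferPoolMXBean> pools =
            ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
        for (BufferPoolMXBean pool : pools) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1L;
    }

    // Simple alarm condition: true when usage is strictly above the threshold.
    static boolean exceeds(long usedBytes, long thresholdBytes) {
        return usedBytes > thresholdBytes;
    }

    public static void main(String[] args) {
        long thresholdBytes = 512L * 1024 * 1024; // assumed threshold, 512 MiB
        long heap = heapUsedBytes();
        long direct = directBufferUsedBytes();
        System.out.println("heap used (bytes): " + heap);
        System.out.println("direct buffers used (bytes): " + direct);
        System.out.println("direct above threshold: " + exceeds(direct, thresholdBytes));
    }
}
```

In a real service these values would more likely be registered as gauges with whatever metrics system the process already uses, rather than printed.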
