spark-issues mailing list archives

From "Alex Bozarth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-23237) Add UI / endpoint for threaddumps for executors with active tasks
Date Tue, 30 Jan 2018 21:11:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-23237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16345818#comment-16345818 ]

Alex Bozarth commented on SPARK-23237:
--------------------------------------

I would rather keep it to an API endpoint, but what worries me is having an endpoint
that returns a specific thread dump chosen by some unknown algorithm. Again, I'm willing to
look at a PR to see whether the exact implementation changes my mind.

> Add UI / endpoint for threaddumps for executors with active tasks
> -----------------------------------------------------------------
>
>                 Key: SPARK-23237
>                 URL: https://issues.apache.org/jira/browse/SPARK-23237
>             Project: Spark
>          Issue Type: New Feature
>          Components: Web UI
>    Affects Versions: 2.3.0
>            Reporter: Imran Rashid
>            Priority: Major
>
> Frequently, when there are a handful of straggler tasks, users want to know what is going
> on in the executors running those stragglers.  Currently, that is a bit of a pain to do: you
> have to go to the page for your active stage, find the task, figure out which executor it's
> on, then go to the executors page, and get the thread dump.  Or maybe you just go to the executors
> page, find the executor with an active task, and then click on that, but that doesn't work
> if you've got multiple stages running.
> Users could figure this out by extracting the info from the stage REST endpoint, but it's
> such a common thing to do that we should make it easy.
> I realize that figuring out a good way to do this is a little tricky.  We don't want
> to make it easy to end up pulling thread dumps from 1000 executors back to the driver.  So
> we've got to come up with a reasonable heuristic for choosing which executors to poll.  And
> we've also got to find a suitable place to put this.
> My suggestion is that the stage page always has a link to the thread dumps for the *one*
> executor with the longest-running task.  And there would be a corresponding endpoint in the
> REST API with the same info, maybe at {{/applications/[app-id]/stages/[stage-id]/[stage-attempt-id]/slowestTaskThreadDump}}.
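
For illustration only, the heuristic suggested in the description (poll just the one executor with the longest-running active task) could look roughly like the sketch below. This is not code from Spark or from any PR on this issue. The function name and the record fields (task_id, executor_id, launch_time_ms, status) are hypothetical stand-ins for whatever task data a stage exposes, and elapsed time is assumed to be measured from task launch.

```python
from typing import Iterable, Mapping, Optional


def slowest_running_executor(tasks: Iterable[Mapping], now_ms: int) -> Optional[str]:
    """Return the id of the executor hosting the longest-running active task.

    Returns None when the stage has no running tasks. All field names here
    are hypothetical, chosen only for this sketch.
    """
    running = [t for t in tasks if t.get("status") == "RUNNING"]
    if not running:
        return None
    # A running task has no final duration yet, so compare elapsed time since launch.
    slowest = max(running, key=lambda t: now_ms - t["launch_time_ms"])
    return slowest["executor_id"]


# Hypothetical task records for a single stage attempt.
tasks = [
    {"task_id": 1, "executor_id": "3", "launch_time_ms": 1_000, "status": "RUNNING"},
    {"task_id": 2, "executor_id": "7", "launch_time_ms": 400, "status": "RUNNING"},
    {"task_id": 3, "executor_id": "5", "launch_time_ms": 100, "status": "SUCCESS"},
]
print(slowest_running_executor(tasks, now_ms=5_000))
```

Because only one executor id ever comes back, the driver would request a single thread dump per call no matter how many executors the stage uses, which addresses the concern about pulling dumps from 1000 executors. It also makes the selection rule explicit rather than an unknown algorithm.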



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


