spark-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From squito <>
Subject [GitHub] spark issue #15505: [SPARK-18890][CORE] Move task serialization from the Tas...
Date Thu, 19 Jan 2017 16:10:16 GMT
Github user squito commented on the issue:
    This might be a really dumb question, but please humor me -- can you explain to me the
threads involved, both before and after this change?  I must be missing something, because
it seems to me it will be the same thread both before and after this change.
    `taskScheduler.resourceOffers()` is only called by `CoarseGrainedSchedulerBackend`, inside
its event loop.  `taskScheduler` has a handful of other threads that interact with it, but
nothing that will trigger resource offers.  Eg. some methods end up calling `backend.reviveOffers()`,
but that pushes an event into the `CoarseGrainedSchedulerBackend`'s event queue anyway.
    OTOH, though this PR doesn't change the thread that serialization is happening, after
this change serialization happens *without a lock* on the `taskScheduler`.  This could be
important because when a task finishes, we need to both (1) update the taskset's internal
state & process the result and (2) schedule new tasks on the freed resources.  Those two
can happen in different threads, but both involve locks on `taskScheduler`; by moving serialization
so it no longer requires a lock, this reduces contention on that lock and improves performance.
    Is that accurate, or am I completely missing something?

If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at or file a JIRA ticket
with INFRA.

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message