spark-reviews mailing list archives

From aarondav <...@git.apache.org>
Subject [GitHub] spark pull request: [SPARK-3796] Create external service which can...
Date Sat, 01 Nov 2014 07:05:11 GMT
Github user aarondav commented on the pull request:

    https://github.com/apache/spark/pull/3001#issuecomment-61360218
  
    @rxin Please take a look at my last commit, fd3928b. It contains critical code that
handles a couple of cases, which I describe below.
    
    I have completed performance and correctness testing on a real cluster. Performance-wise,
I saw no regression relative to the in-executor version. I also saw minimal memory usage
in the Worker, which is where I put the server: I ran several medium-sized shuffles using Workers
with 512MB max heap sizes (the default) without noticeable garbage collection or heap growth.
    
    During testing, I noticed that we were dropping map outputs whenever I killed an executor in
the middle of a map or reduce phase. (In local testing, my queries ran so quickly that I always
killed the executor after the job had completed.) Dropping the outputs caused us to recompute
the map tasks unnecessarily. I have added code that drops map outputs only if (1) external
shuffle is disabled or (2) we are responding to a fetch failure specifically.
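
    To make the two conditions above concrete, here is a minimal sketch of the decision in Java.
It is not the code from fd3928b: the names `MapOutputPolicy`, `LossReason`, and
`shouldDropMapOutputs` are hypothetical and only illustrate the rule as stated in this comment.

```java
public class MapOutputPolicy {

    /** Hypothetical reasons for handling a lost executor (not Spark's actual types). */
    public enum LossReason {
        EXECUTOR_KILLED, // the executor process went away, e.g. it was killed mid-phase
        FETCH_FAILURE    // a reducer failed to fetch shuffle data that should exist
    }

    /**
     * Returns true if the map outputs registered for the lost executor should be dropped.
     * Assumption: with external shuffle enabled, the outputs remain servable after the
     * executor dies, so they are dropped only on an explicit fetch failure.
     */
    public static boolean shouldDropMapOutputs(boolean externalShuffleEnabled, LossReason reason) {
        return !externalShuffleEnabled || reason == LossReason.FETCH_FAILURE;
    }

    public static void main(String[] args) {
        // Print the decision for every combination of the two inputs.
        for (boolean enabled : new boolean[] {false, true}) {
            for (LossReason reason : LossReason.values()) {
                System.out.println("externalShuffle=" + enabled + ", reason=" + reason
                        + " -> drop=" + shouldDropMapOutputs(enabled, reason));
            }
        }
    }
}
```

    The only combination that keeps the map outputs is external shuffle enabled with a plain
executor loss, which is the case that previously triggered the unnecessary recomputation.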



