spark-reviews mailing list archives

From cmccabe <>
Subject [GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...
Date Wed, 27 Aug 2014 22:08:39 GMT
Github user cmccabe commented on the pull request:
    Hi all.  I'm uploading a new rev that addresses Sandy's comments.  I also took a stab at implementing
delay scheduling for HDFS-cached data, but the patch got a little bigger than I would like,
since it involved many changes to TaskSetManager.  I think the best thing to do is to get
this change in now and then work on the delay scheduling part in a follow-up JIRA.
    I changed references to "cached" to "inmemory" to avoid confusion.  We already call a
lot of things "cached" because they're in memory in the executors.  I also think that eventually
we may want to extend PartitionLocation to take into account other things like whether the
location is on SSD (faster) or archival storage (slower), so I made PartitionPriority an enum.
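
    A rough sketch of the priority idea, written in Python for brevity rather than the Scala of the actual patch. Everything here is illustrative: the level names other than in-memory, the `host` field, and the `preferred_order` helper are assumptions, not code from the PR.

```python
from dataclasses import dataclass
from enum import IntEnum


class PartitionPriority(IntEnum):
    # Lower value means "prefer this replica first".
    # IN_MEMORY mirrors the HDFS-cached case discussed above; SSD and
    # ARCHIVAL are hypothetical future levels, and DISK is an assumed default.
    IN_MEMORY = 0
    SSD = 1
    DISK = 2
    ARCHIVAL = 3


@dataclass(frozen=True)
class PartitionLocation:
    host: str                    # hypothetical field: where the replica lives
    priority: PartitionPriority  # how desirable that replica is


def preferred_order(locations):
    """Return locations sorted from most to least preferred.

    sorted() is stable, so replicas with equal priority keep their
    original relative order.
    """
    return sorted(locations, key=lambda loc: loc.priority)


if __name__ == "__main__":
    replicas = [
        PartitionLocation("host-a", PartitionPriority.DISK),
        PartitionLocation("host-b", PartitionPriority.IN_MEMORY),
        PartitionLocation("host-c", PartitionPriority.ARCHIVAL),
    ]
    for loc in preferred_order(replicas):
        print(loc.host, loc.priority.name)
```

    Using an ordered enum instead of a boolean "cached" flag means a new storage tier only needs a new enum value; the sorting logic stays the same.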
