www-infrastructure-dev mailing list archives

From mengxr <...@git.apache.org>
Subject [GitHub] incubator-spark pull request: SPARK-1122: allCollect functions for...
Date Tue, 25 Feb 2014 02:42:43 GMT
Github user mengxr commented on the pull request:

    @JoshRosen I didn't implement this ... AllCollect is better than using broadcast
variables directly if it can be implemented without putting heavy load on the driver and
the data is not very small. But the current implementation is no better than broadcasting
the variables directly. A slightly better solution would be shuffle-based: it does not put
load on the driver, but it might create duplicate blocks on the same physical node. The
efficient broadcasting you described sounds interesting, but yes, I can imagine it is
difficult to implement. Thanks for sharing your thoughts!
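
To make the trade-off above concrete, here is a toy model in plain Python. It is not
Spark code and not the implementation in this pull request. The function names, the
`node_of` placement list, and the traffic formula (a naive broadcast in which the driver
sends the full copy to every partition) are all assumptions made for illustration.

```python
from collections import Counter

def all_collect_via_driver(partitions):
    """Hypothetical driver-based variant: gather on the driver, then send to all."""
    # The driver receives every element once.
    gathered = [x for part in partitions for x in part]
    # Naive model (assumption): the driver also sends the full copy to each partition.
    driver_traffic = len(gathered) * (1 + len(partitions))
    return [list(gathered) for _ in partitions], driver_traffic

def all_collect_via_shuffle(partitions, node_of):
    """Hypothetical shuffle-based variant: partitions exchange data directly."""
    result = [[] for _ in partitions]
    for src in partitions:
        for dst in result:
            dst.extend(src)  # each partition receives a copy of every other one
    # The driver moves no data, but a node that hosts several partitions
    # ends up storing one full copy per hosted partition.
    copies_per_node = dict(Counter(node_of))
    return result, 0, copies_per_node

partitions = [[1, 2], [3], [4, 5, 6]]
node_of = ["a", "a", "b"]  # partitions 0 and 1 share physical node "a"

via_driver, driver_load = all_collect_via_driver(partitions)
via_shuffle, shuffle_driver_load, copies = all_collect_via_shuffle(partitions, node_of)
```

In this model both variants give every partition the same full data set. The driver-based
one routes all of it through the driver, while the shuffle-based one leaves the driver
idle but stores two identical copies on node "a".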

