pig-dev mailing list archives

From Praveen R <prav...@sigmoidanalytics.com>
Subject Pig on Spark
Date Tue, 15 Jul 2014 14:36:04 GMT
Hi Everyone,

We at SigmoidAnalytics have been working on Pig on Spark for some time and
would like to hear your thoughts about it.

You can find the repo here: https://github.com/sigmoidanalytics/spork and
the README has been updated to work with Spark 0.9. So far we have
tested it on hadoop-1.0.4 and hadoop-2.2.0.

Below are the major issues we are having:
1. Sending objects from the driver to the executors. To achieve this we have
built a TCP server to broadcast
<https://github.com/sigmoidanalytics/spork/commit/b35b57d94c9b0b4dfdf165b30ba8145f65975f23>
data to the executors.
2. Large shuffle data when performing groupBy.
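
To illustrate issue 2, here is a minimal sketch in plain Python (not Spark or
Pig code) of why a groupBy can produce a large shuffle. Without any
pre-aggregation, every (key, value) record crosses the shuffle. If the grouped
values are only fed to an algebraic aggregate such as a sum, they can be
partially combined inside each partition first. The function names and the
sample data are hypothetical and are not taken from the spork repo.

```python
from collections import defaultdict


def shuffle_without_combine(partitions):
    """Every (key, value) record crosses the shuffle unchanged."""
    return [record for part in partitions for record in part]


def shuffle_with_combine(partitions):
    """Sum values per key inside each partition before shuffling."""
    shuffled = []
    for part in partitions:
        partial = defaultdict(int)
        for key, value in part:
            partial[key] += value
        # At most one record per key leaves each partition.
        shuffled.extend(partial.items())
    return shuffled


def reduce_side_sum(shuffled):
    """Final aggregation on the receiving side of the shuffle."""
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value
    return dict(totals)


# Two hypothetical input partitions of (key, count) records.
partitions = [
    [("a", 1), ("b", 1), ("a", 1), ("a", 1)],
    [("b", 1), ("b", 1), ("a", 1), ("c", 1)],
]

plain = shuffle_without_combine(partitions)
combined = shuffle_with_combine(partitions)

print(len(plain), len(combined))
print(reduce_side_sum(plain) == reduce_side_sum(combined))
```

Both paths give the same totals, but the combined path ships fewer records.
This only helps when the downstream operation is an aggregate; a groupBy whose
result must contain every original value still has to move all of them.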

Please feel free to file issues on the GitHub repo or mail us at
spark@sigmoidanalytics.com.

Thanks,
Praveen R
