spark-user mailing list archives

From Aniket Mokashi <aniket...@gmail.com>
Subject Re: Pig on Spark
Date Thu, 06 Mar 2014 21:46:50 GMT
There is some work in progress to make this run on YARN at
https://github.com/aniket486/pig. (To use it, compile Pig with ant
-Dhadoopversion=23.)

You can look at https://github.com/aniket486/pig/blob/spork/pig-spark to
find out which environment variables you need (sorry, I haven't been able to
clean this up yet; it is still in progress). There are a few known issues with
this, and I will work on fixing them soon.

Known issues:
1. Limit does not work (spork-fix)
2. Foreach requires turning off schema-tuple-backend (should be a pig-jira)
3. Algebraic UDFs don't work (spork-fix in-progress)
4. Group by needs rework (to avoid OOMs)
5. UDF classloader issue (requires SPARK-1053, then you can put
pig-withouthadoop.jar as SPARK_JARS in SparkContext along with UDF jars)

~Aniket




On Thu, Mar 6, 2014 at 1:36 PM, Tom Graves <tgraves_cs@yahoo.com> wrote:

> I had asked a similar question on the dev mailing list a while back (Jan
> 22nd).
>
> See the archives:
> http://mail-archives.apache.org/mod_mbox/spark-dev/201401.mbox/browser ->
> look for spork.
>
> Basically Matei said:
>
> Yup, that was it, though I believe people at Twitter picked it up again
> recently. I'd suggest asking Dmitriy if you know him. I've seen interest in
> this from several other groups, and if there's enough of it, maybe we can
> start another open source repo to track it. The work in that repo you
> pointed to was done over one week, and already had most of Pig's operators
> working. (I helped out with this prototype over Twitter's hack week.) That
> work also calls the Scala API directly, because it was done before we had a
> Java API; it should be easier with the Java one.
>
>
> Tom
>
>
>
>   On Thursday, March 6, 2014 3:11 PM, Sameer Tilak <sstilak@live.com>
> wrote:
>   Hi everyone,
>
> We are using Pig to build our data pipeline. I came across Spork -- Pig
> on Spark at: https://github.com/dvryaboy/pig and am not sure if it is still
> active.
>
> Can someone please let me know the status of Spork or any other effort
> that will let us run Pig on Spark? We can significantly benefit by using
> Spark, but we would like to keep using the existing Pig scripts.
>
>
>


-- 
"...:::Aniket:::... Quetzalco@tl"
