Date: Thu, 6 Mar 2014 13:46:50 -0800
Subject: Re: Pig on Spark
From: Aniket Mokashi
To: user@spark.apache.org, Tom Graves

There is some work to make this run on YARN at https://github.com/aniket486/pig
(so compile Pig with ant -Dhadoopversion=23). You can look at
https://github.com/aniket486/pig/blob/spork/pig-spark to find out what sort of
env variables you need (sorry, I haven't been able to clean this up; it's a
work in progress). There are a few known issues with this; I will work on
fixing them soon.

Known issues:
1. Limit does not work (spork-fix)
2. Foreach requires turning off the schema-tuple backend (should be a Pig JIRA)
3. Algebraic UDFs don't work (spork-fix in progress)
4. Group-by rework (to avoid OOMs)
5. UDF classloader issue (requires SPARK-1053; then you can put
   pig-withouthadoop.jar as SPARK_JARS in SparkContext along with the UDF jars)

~Aniket


On Thu, Mar 6, 2014 at 1:36 PM, Tom Graves wrote:

> I had asked a similar question on the dev mailing list a while back
> (Jan 22nd).
>
> See the archives:
> http://mail-archives.apache.org/mod_mbox/spark-dev/201401.mbox/browser ->
> look for spork.
>
> Basically Matei said:
>
> Yup, that was it, though I believe people at Twitter picked it up again
> recently. I'd suggest asking Dmitriy if you know him. I've seen interest
> in this from several other groups, and if there's enough of it, maybe we
> can start another open source repo to track it. The work in that repo you
> pointed to was done over one week, and already had most of Pig's operators
> working. (I helped out with this prototype over Twitter's hack week.) That
> work also calls the Scala API directly, because it was done before we had
> a Java API; it should be easier with the Java one.
>
> Tom
>
>
> On Thursday, March 6, 2014 3:11 PM, Sameer Tilak wrote:
> Hi everyone,
>
> We are using Pig to build our data pipeline. I came across Spork -- Pig
> on Spark at: https://github.com/dvryaboy/pig and not sure if it is still
> active.
>
> Can someone please let me know the status of Spork or any other effort
> that will let us run Pig on Spark? We can significantly benefit by using
> Spark, but we would like to keep using the existing Pig scripts.


--
"...:::Aniket:::... Quetzalco@tl"
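[Editor's note] For readers trying to follow along, the build-and-ship steps described in the message can be sketched roughly as below. Only `ant -Dhadoopversion=23` and the `SPARK_JARS` variable come from the message itself; the clone path and `my-udfs.jar` are hypothetical placeholders, and the actual environment-variable list lives in the linked pig-spark script.

```shell
# Rough sketch of the steps from the message; jar names and paths are
# placeholders, not verified values (see the pig-spark script in the repo).

# 1. Build Aniket's fork against the Hadoop 23 profile, per the message:
#      git clone https://github.com/aniket486/pig && cd pig
#      ant -Dhadoopversion=23

# 2. Ship the hadoop-free Pig jar plus your UDF jars to the Spark workers
#    via SPARK_JARS (named in known issue 5; requires SPARK-1053):
SPARK_JARS="pig-withouthadoop.jar,my-udfs.jar"   # my-udfs.jar is hypothetical
export SPARK_JARS
echo "SPARK_JARS=$SPARK_JARS"
```

The remaining environment variables (Spark master URL, classpath entries, and so on) are deliberately not guessed here; the message points to the pig-spark launcher script as the authoritative list.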