Date: Thu, 6 Mar 2014 13:46:50 -0800
Subject: Re: Pig on Spark
From: Aniket Mokashi
To: user@spark.apache.org, Tom Graves

There is some work to make this run on YARN at https://github.com/aniket486/pig
(so compile Pig with ant -Dhadoopversion=23). You can look at
https://github.com/aniket486/pig/blob/spork/pig-spark to find out what sort of
env variables you need (sorry, I haven't been able to clean this up; it's a
work in progress). There are a few known issues with this; I will work on
fixing them soon.

Known issues:
1. Limit does not work (spork-fix)
2. Foreach requires turning off the schema-tuple backend (should be a Pig JIRA)
3. Algebraic UDFs don't work (spork-fix in progress)
4. Group-by rework (to avoid OOMs)
5. UDF classloader issue (requires SPARK-1053; then you can put
   pig-withouthadoop.jar as SPARK_JARS in SparkContext along with the UDF jars)

~Aniket


On Thu, Mar 6, 2014 at 1:36 PM, Tom Graves wrote:

> I had asked a similar question on the dev mailing list a while back
> (Jan 22nd).
>
> See the archives:
> http://mail-archives.apache.org/mod_mbox/spark-dev/201401.mbox/browser ->
> look for spork.
>
> Basically Matei said:
>
> Yup, that was it, though I believe people at Twitter picked it up again
> recently. I'd suggest asking Dmitriy if you know him. I've seen interest
> in this from several other groups, and if there's enough of it, maybe we
> can start another open source repo to track it. The work in that repo you
> pointed to was done over one week, and already had most of Pig's operators
> working. (I helped out with this prototype over Twitter's hack week.) That
> work also calls the Scala API directly, because it was done before we had
> a Java API; it should be easier with the Java one.
>
> Tom
>
>
> On Thursday, March 6, 2014 3:11 PM, Sameer Tilak wrote:
> Hi everyone,
>
> We are using Pig to build our data pipeline. I came across Spork -- Pig
> on Spark at: https://github.com/dvryaboy/pig and not sure if it is still
> active.
>
> Can someone please let me know the status of Spork or any other effort
> that will let us run Pig on Spark? We can significantly benefit by using
> Spark, but we would like to keep using the existing Pig scripts.


--
"...:::Aniket:::... Quetzalco@tl"
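[Editor's note] For readers trying to follow along, the build-and-ship steps described in the message can be sketched roughly as below. Only `ant -Dhadoopversion=23` and the `SPARK_JARS` variable come from the message itself; the clone path and `my-udfs.jar` are hypothetical placeholders, and the actual environment-variable list lives in the linked pig-spark script.

```shell
# Rough sketch of the steps from the message; jar names and paths are
# placeholders, not verified values (see the pig-spark script in the repo).

# 1. Build Aniket's fork against the Hadoop 23 profile, per the message:
#      git clone https://github.com/aniket486/pig && cd pig
#      ant -Dhadoopversion=23

# 2. Ship the hadoop-free Pig jar plus your UDF jars to the Spark workers
#    via SPARK_JARS (named in known issue 5; requires SPARK-1053):
SPARK_JARS="pig-withouthadoop.jar,my-udfs.jar"   # my-udfs.jar is hypothetical
export SPARK_JARS
echo "SPARK_JARS=$SPARK_JARS"
```

The remaining environment variables (Spark master URL, classpath entries, and so on) are deliberately not guessed here; the message points to the pig-spark launcher script as the authoritative list.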