Date: Wed, 07 Oct 2015 10:25:14 -0700 (PDT)
From: "Nick Pentreath" <nick.pentreath@gmail.com>
To: "user" <user@spark.apache.org>
Subject: Re: Spark job workflow engine recommendations

We're also using Azkaban for scheduling, and we simply use spark-submit via shell scripts. It works fine.
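[Editor's note: a minimal sketch of the kind of spark-submit wrapper described here, as an Azkaban command-type job might call it. The master URL, main class, and jar path are illustrative placeholders, not the poster's actual configuration.]

```shell
#!/usr/bin/env bash
# Minimal spark-submit wrapper that an Azkaban command-type job could call.
# All names below are illustrative placeholders.
set -u

SPARK_MASTER="${SPARK_MASTER:-spark://master:7077}"
MAIN_CLASS="${MAIN_CLASS:-com.example.EtlJob}"
APP_JAR="${APP_JAR:-/opt/jobs/example-etl.jar}"

SUBMIT_CMD="spark-submit --master $SPARK_MASTER --class $MAIN_CLASS $APP_JAR"

# Echo the assembled command so the wrapper can be sanity-checked without
# a cluster; swap 'echo' for 'exec' to actually submit.
echo "$SUBMIT_CMD"
```

Because the wrapper is just a command-type job to Azkaban, anything spark-submit accepts (extra --conf flags, application arguments) can be appended without touching the scheduler.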
The auto retry feature with a large number of retries (like 100 or 1000, perhaps) should take care of long-running jobs with restarts on failure. We haven't used it for streaming yet, though we have long-running jobs, and Azkaban won't kill them unless an SLA is in place.

—
Sent from Mailbox

On Wed, Oct 7, 2015 at 7:18 PM, Vikram Kone wrote:

> Hien,
> I saw this pull request, and from what I understand it is geared towards
> running Spark jobs over Hadoop. We are using Spark over Cassandra and are
> not sure if this new job type supports that. I haven't seen any
> documentation on how to use this Spark job plugin, so that I can test it
> out on our cluster.
> We are currently submitting our Spark jobs using the command job type with
> a command like "dse spark-submit --class com.org.classname ./test.jar".
> What would be the advantage of using the native Spark job type over the
> command job type?
> I didn't understand from your reply whether Azkaban already supports
> long-running jobs like Spark Streaming. Does it? Streaming jobs generally
> need to run indefinitely and need to be restarted if for some reason they
> fail (lack of resources, maybe). I can probably use the auto retry feature
> for this, but I'm not sure.
> I'm looking forward to the multiple executor support, which will greatly
> help with the scalability issue.
>
> On Wed, Oct 7, 2015 at 9:56 AM, Hien Luu wrote:
>
>> The Spark job type was added recently - see this pull request:
>> https://github.com/azkaban/azkaban-plugins/pull/195. You can leverage
>> the SLA feature to kill a job if it runs longer than expected.
>>
>> BTW, we just solved the scalability issue by supporting multiple
>> executors. Within a week or two, the code for that should be merged into
>> the main trunk.
>>
>> Hien
>>
>> On Tue, Oct 6, 2015 at 9:40 PM, Vikram Kone wrote:
>>
>>> Does Azkaban support scheduling long-running jobs like Spark Streaming
>>> jobs? Will Azkaban kill a job if it's running for a long time?
>>>
>>> On Friday, August 7, 2015, Vikram Kone wrote:
>>>
>>>> Hien,
>>>> Is Azkaban being phased out at LinkedIn as rumored? If so, what's
>>>> LinkedIn going to use for workflow scheduling? Is there something else
>>>> that's going to replace Azkaban?
>>>>
>>>> On Fri, Aug 7, 2015 at 11:25 AM, Ted Yu wrote:
>>>>
>>>>> In my opinion, choosing some particular project among its peers
>>>>> should leave enough room for future growth (which may come faster
>>>>> than you initially think).
>>>>>
>>>>> Cheers
>>>>>
>>>>> On Fri, Aug 7, 2015 at 11:23 AM, Hien Luu wrote:
>>>>>
>>>>>> Scalability is a known issue due to the current architecture.
>>>>>> However, this will only be applicable if you run more than 20K jobs
>>>>>> per day.
>>>>>>
>>>>>> On Fri, Aug 7, 2015 at 10:30 AM, Ted Yu wrote:
>>>>>>
>>>>>>> From what I heard (from an ex-coworker who is an Oozie committer),
>>>>>>> Azkaban is being phased out at LinkedIn because of scalability
>>>>>>> issues (though UI-wise, Azkaban seems better).
>>>>>>>
>>>>>>> Vikram:
>>>>>>> I suggest you do more research on related projects (maybe using
>>>>>>> their mailing lists).
>>>>>>>
>>>>>>> Disclaimer: I don't work for LinkedIn.
>>>>>>>
>>>>>>> On Fri, Aug 7, 2015 at 10:12 AM, Nick Pentreath
>>>>>>> <nick.pentreath@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Vikram,
>>>>>>>>
>>>>>>>> We use Azkaban (2.5.0) in our production workflow scheduling. We
>>>>>>>> just use the local mode deployment, and it is fairly easy to set
>>>>>>>> up. It is pretty easy to use and has a nice scheduling and logging
>>>>>>>> interface, as well as SLAs (like kill the job and notify if it
>>>>>>>> doesn't complete in 3 hours or whatever).
>>>>>>>>
>>>>>>>> However, Spark support is not present directly - we run everything
>>>>>>>> with shell scripts and spark-submit. There is a plugin interface
>>>>>>>> where one could create a Spark plugin, but I found it very
>>>>>>>> cumbersome when I did investigate, and didn't have the time to
>>>>>>>> work through it to develop that.
>>>>>>>>
>>>>>>>> It has some quirks, and while there is actually a REST API for
>>>>>>>> adding jobs and dynamically scheduling jobs, it is not documented
>>>>>>>> anywhere, so you kind of have to figure it out for yourself. But
>>>>>>>> in terms of ease of use I found it way better than Oozie. I
>>>>>>>> haven't tried Chronos, and it seemed quite involved to set up.
>>>>>>>> Haven't tried Luigi either.
>>>>>>>>
>>>>>>>> Spark job server is good but, as you say, lacks some stuff like
>>>>>>>> scheduling and DAG-type workflows (independent of Spark-defined
>>>>>>>> job flows).
>>>>>>>>
>>>>>>>> On Fri, Aug 7, 2015 at 7:00 PM, Jörn Franke wrote:
>>>>>>>>
>>>>>>>>> Check also Falcon in combination with Oozie.
>>>>>>>>>
>>>>>>>>> On Fri, Aug 7, 2015 at 5:51 PM, Hien Luu wrote:
>>>>>>>>>
>>>>>>>>>> Looks like Oozie can satisfy most of your requirements.
>>>>>>>>>>
>>>>>>>>>> On Fri, Aug 7, 2015 at 8:43 AM, Vikram Kone wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>> I'm looking for open source workflow tools/engines that allow
>>>>>>>>>>> us to schedule Spark jobs on a DataStax Cassandra cluster.
>>>>>>>>>>> Since there are tonnes of alternatives out there like Oozie,
>>>>>>>>>>> Azkaban, Luigi, Chronos, etc., I wanted to check with people
>>>>>>>>>>> here to see what they are using today.
>>>>>>>>>>>
>>>>>>>>>>> Some of the requirements of the workflow engine that I'm
>>>>>>>>>>> looking for are:
>>>>>>>>>>>
>>>>>>>>>>> 1. First-class support for submitting Spark jobs on Cassandra.
>>>>>>>>>>> Not some wrapper Java code to submit tasks.
>>>>>>>>>>> 2. Active open source community support, and well tested at
>>>>>>>>>>> production scale.
>>>>>>>>>>> 3. Should be dead easy to write job dependencies using XML or a
>>>>>>>>>>> web interface. E.g., job A depends on Job B and Job C, so run
>>>>>>>>>>> Job A after B and C are finished. Don't want to write
>>>>>>>>>>> full-blown Java applications to specify job parameters and
>>>>>>>>>>> dependencies. Should be very simple to use.
>>>>>>>>>>> 4. Time-based recurrent scheduling. Run the Spark jobs at a
>>>>>>>>>>> given time every hour, day, week, or month.
>>>>>>>>>>> 5. Job monitoring, alerting on failures, and email
>>>>>>>>>>> notifications on a daily basis.
>>>>>>>>>>>
>>>>>>>>>>> I have looked at Ooyala's Spark job server, which seems to be
>>>>>>>>>>> geared towards making Spark jobs run faster by sharing contexts
>>>>>>>>>>> between the jobs, but isn't a full-blown workflow engine per
>>>>>>>>>>> se. A combination of Spark job server and a workflow engine
>>>>>>>>>>> would be ideal.
>>>>>>>>>>>
>>>>>>>>>>> Thanks for the inputs
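[Editor's note: the dependency and auto-retry behavior discussed in this thread is configured through Azkaban's flat .job property files. A minimal sketch follows, with made-up job names, script names, and retry values; the SLA kill/alert rules mentioned above are attached when scheduling the flow, not in these files.]

```properties
# jobB.job
type=command
command=bash run_jobB.sh

# jobC.job
type=command
command=bash run_jobC.sh

# jobA.job -- runs only after jobB and jobC both succeed
type=command
command=bash run_jobA.sh
dependencies=jobB,jobC
retries=100
retry.backoff=60000
```

Uploading the three files as one flow gives the "run Job A after B and C" behavior from requirement 3 without writing any Java, and a large retries value approximates the restart-on-failure pattern described for long-running jobs.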