spark-user mailing list archives

From "Nick Pentreath" <nick.pentre...@gmail.com>
Subject Re: Spark job workflow engine recommendations
Date Wed, 07 Oct 2015 17:25:14 GMT
We're also using Azkaban for scheduling, and we simply use spark-submit via shell scripts.
It works fine.

The auto retry feature with a large number of retries (like 100 or 1000 perhaps) should take
care of long-running jobs with restarts on failure. We haven't used it for streaming yet, though
we do have long-running jobs, and Azkaban won't kill them unless an SLA is in place.
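
For illustration, here is a minimal sketch of how a command-type job file with a large retry count could be generated. It assumes the Azkaban job-file keys `type`, `command`, `retries`, `retry.backoff` (milliseconds) and `dependencies`; the job name `spark_streaming` and the wrapper script `run_streaming.sh` are hypothetical, so check the keys against your Azkaban version before relying on this.

```python
from pathlib import Path
import tempfile


def render_job(command, retries=100, backoff_ms=60_000, dependencies=()):
    """Return the text of an Azkaban command-type .job file.

    The property names are assumed from the Azkaban job-file format;
    the defaults here are illustrative, not recommendations.
    """
    lines = [
        "type=command",
        f"command={command}",
        # A large retry count approximates "restart on failure".
        f"retries={retries}",
        # Pause between retry attempts, in milliseconds.
        f"retry.backoff={backoff_ms}",
    ]
    if dependencies:
        # Comma-separated names of jobs that must finish first.
        lines.append("dependencies=" + ",".join(dependencies))
    return "\n".join(lines) + "\n"


def write_job(directory, name, **kwargs):
    """Write <name>.job into the given directory and return its path."""
    path = Path(directory) / f"{name}.job"
    path.write_text(render_job(**kwargs))
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # run_streaming.sh is a hypothetical script wrapping spark-submit.
        job = write_job(tmp, "spark_streaming",
                        command="sh run_streaming.sh", retries=1000)
        print(job.read_text())
```

The generated file would go into the project zip uploaded to Azkaban. Whether a very high retry count is an acceptable substitute for real supervision of a streaming job is something to test on your own cluster.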


On Wed, Oct 7, 2015 at 7:18 PM, Vikram Kone <vikramkone@gmail.com> wrote:

> Hien,
> I saw this pull request and from what I understand this is geared towards
> running spark jobs over hadoop. We are using spark over cassandra and not
> sure if this new jobtype supports that. I haven't seen any documentation in
> regards to how to use this spark job plugin, so that I can test it out on
> our cluster.
> We are currently submitting our spark jobs using the command job type with the
> following command: "dse spark-submit --class com.org.classname ./test.jar"
> etc. What would be the advantage of using the native spark job type over the
> command job type?
> I didn't understand from your reply whether Azkaban already supports long
> running jobs like spark streaming. Does it? Streaming jobs generally need
> to run indefinitely and need to be restarted if for some
> reason they fail (lack of resources, maybe). I can probably use the auto
> retry feature for this, but I'm not sure.
> I'm looking forward to the multiple executor support, which will greatly
> help with the scalability issue.
> On Wed, Oct 7, 2015 at 9:56 AM, Hien Luu <hluu@linkedin.com> wrote:
>> The spark job type was added recently - see this pull request
>> https://github.com/azkaban/azkaban-plugins/pull/195. You can leverage
>> the SLA feature to kill a job if it runs longer than expected.
>>
>> BTW, we just solved the scalability issue by supporting multiple
>> executors.  Within a week or two, the code for that should be merged in the
>> main trunk.
>>
>> Hien
>>
>> On Tue, Oct 6, 2015 at 9:40 PM, Vikram Kone <vikramkone@gmail.com> wrote:
>>
>>> Does Azkaban support scheduling long running jobs like spark streaming
>>> jobs? Will Azkaban kill a job if it's running for a long time?
>>>
>>>
>>> On Friday, August 7, 2015, Vikram Kone <vikramkone@gmail.com> wrote:
>>>
>>>> Hien,
>>>> Is Azkaban being phased out at LinkedIn as rumored? If so, what's
>>>> LinkedIn going to use for workflow scheduling? Is there something else
>>>> that's going to replace Azkaban?
>>>>
>>>> On Fri, Aug 7, 2015 at 11:25 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>>>>
>>>>> In my opinion, choosing some particular project among its peers should
>>>>> leave enough room for future growth (which may come faster than you
>>>>> initially think).
>>>>>
>>>>> Cheers
>>>>>
>>>>> On Fri, Aug 7, 2015 at 11:23 AM, Hien Luu <hluu@linkedin.com> wrote:
>>>>>
>>>>>> Scalability is a known issue due to the current architecture.
>>>>>> However, this will be applicable only if you run more than 20K jobs per day.
>>>>>>
>>>>>> On Fri, Aug 7, 2015 at 10:30 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>>>>>>
>>>>>>> From what I heard (from an ex-coworker who is an Oozie committer), Azkaban
>>>>>>> is being phased out at LinkedIn because of scalability issues (though
>>>>>>> UI-wise, Azkaban seems better).
>>>>>>>
>>>>>>> Vikram:
>>>>>>> I suggest you do more research in related projects (maybe using their
>>>>>>> mailing lists).
>>>>>>>
>>>>>>> Disclaimer: I don't work for LinkedIn.
>>>>>>>
>>>>>>> On Fri, Aug 7, 2015 at 10:12 AM, Nick Pentreath <
>>>>>>> nick.pentreath@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Vikram,
>>>>>>>>
>>>>>>>> We use Azkaban (2.5.0) in our production workflow scheduling. We
>>>>>>>> just use local mode deployment and it is fairly easy to set up. It is
>>>>>>>> pretty easy to use and has a nice scheduling and logging interface, as well
>>>>>>>> as SLAs (like kill job and notify if it doesn't complete in 3 hours or
>>>>>>>> whatever).
>>>>>>>>
>>>>>>>> However Spark support is not present directly - we run everything
>>>>>>>> with shell scripts and spark-submit. There is a plugin interface where one
>>>>>>>> could create a Spark plugin, but I found it very cumbersome when I did
>>>>>>>> investigate and didn't have the time to work through it to develop that.
>>>>>>>>
>>>>>>>> It has some quirks and while there is actually a REST API for adding
>>>>>>>> jobs and dynamically scheduling jobs, it is not documented anywhere so you
>>>>>>>> kinda have to figure it out for yourself. But in terms of ease of use I
>>>>>>>> found it way better than Oozie. I haven't tried Chronos, and it seemed
>>>>>>>> quite involved to set up. Haven't tried Luigi either.
>>>>>>>>
>>>>>>>> Spark job server is good but as you say lacks some stuff like
>>>>>>>> scheduling and DAG type workflows (independent of spark-defined job flows).
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Aug 7, 2015 at 7:00 PM, Jörn Franke <jornfranke@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Check also falcon in combination with oozie
>>>>>>>>>
>>>>>>>>> On Fri, Aug 7, 2015 at 5:51 PM, Hien Luu <hluu@linkedin.com.invalid>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Looks like Oozie can satisfy most of your requirements.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Aug 7, 2015 at 8:43 AM, Vikram Kone <vikramkone@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>> I'm looking for open source workflow tools/engines that allow us
>>>>>>>>>>> to schedule spark jobs on a datastax cassandra cluster. Since there are
>>>>>>>>>>> tonnes of alternatives out there like Oozie, Azkaban, Luigi, Chronos etc,
>>>>>>>>>>> I wanted to check with people here to see what they are using today.
>>>>>>>>>>>
>>>>>>>>>>> Some of the requirements of the workflow engine that I'm looking
>>>>>>>>>>> for are
>>>>>>>>>>>
>>>>>>>>>>> 1. First class support for submitting Spark jobs on Cassandra.
>>>>>>>>>>> Not some wrapper Java code to submit tasks.
>>>>>>>>>>> 2. Active open source community support and well tested at
>>>>>>>>>>> production scale.
>>>>>>>>>>> 3. Should be dead easy to write job dependencies using XML or
>>>>>>>>>>> web interface. E.g. job A depends on Job B and Job C, so run Job A after B
>>>>>>>>>>> and C are finished. Don't need to write full blown java applications to
>>>>>>>>>>> specify job parameters and dependencies. Should be very simple to use.
>>>>>>>>>>> 4. Time based recurrent scheduling. Run the spark jobs at a
>>>>>>>>>>> given time every hour or day or week or month.
>>>>>>>>>>> 5. Job monitoring, alerting on failures and email notifications
>>>>>>>>>>> on daily basis.
>>>>>>>>>>>
>>>>>>>>>>> I have looked at Ooyala's spark job server which seems to be
>>>>>>>>>>> geared towards making spark jobs run faster by sharing contexts between the
>>>>>>>>>>> jobs but isn't a full blown workflow engine per se. A combination of spark
>>>>>>>>>>> job server and workflow engine would be ideal.
>>>>>>>>>>>
>>>>>>>>>>> Thanks for the inputs
>>>>>>>>>>>