kylin-dev mailing list archives

From: Amarnath Arsikere <a...@infoworks.io>
Subject: Re: Proposal for new Job Engine
Date: Tue, 13 Jan 2015 09:59:35 GMT
Why not use Apache Oozie instead? Even if you do not need complex
workflows, you gain some operational benefits and flexibility, such as
increasing memory for a Hadoop job or monitoring the job on a web console.
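As an illustration of that memory point: an Oozie workflow lets you raise a Hadoop job's memory declaratively, without touching application code. A minimal sketch follows; the workflow name, action name, and property values are examples, not Kylin configuration:

```xml
<workflow-app name="cube-build" xmlns="uri:oozie:workflow:0.5">
  <start to="build-step"/>
  <action name="build-step">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <!-- Bump mapper memory from the workflow definition alone -->
        <property>
          <name>mapreduce.map.memory.mb</name>
          <value>4096</value>
        </property>
      </configuration>
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Cube build failed</message>
  </kill>
  <end name="end"/>
</workflow-app>
```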

Regards,
Amar



On Tue, Jan 13, 2015 at 3:04 PM, Zhou, Qianhao <qianzhou@ebay.com> wrote:

> What we want is:
>
> 1. A lightweight job engine that is easy to start and stop, and that makes
>    it easy to check jobs.
>    Since the heavyweight work is map-reduce, which already runs on the
>    cluster, the job engine itself does not need to run on a cluster.
>
> 2. Kylin already has a job engine based on Quartz; however, only a very
>    small part of its functionality is used, so we can easily replace it
>    with the standard Java API.
>    Thus there will be no extra dependency, which makes deployment easier.
>
> Currently a very simple job engine implementation meets Kylin's needs,
> so I think keeping it simple is the better choice at this point.
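The Quartz-free approach described above can indeed be built on nothing beyond `java.util.concurrent`. The sketch below is illustrative only; the class and method names are not Kylin's actual API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// A minimal job engine: submit, check, and stop jobs using only
// standard Java APIs, with no Quartz dependency.
class SimpleJobEngine {
    private final ExecutorService pool = Executors.newFixedThreadPool(4);
    private final Map<String, Future<?>> running = new ConcurrentHashMap<>();

    // Start a job under an id so it can be checked or stopped later.
    public void submit(String jobId, Runnable job) {
        running.put(jobId, pool.submit(job));
    }

    // Check whether a job has finished (normally or by cancellation).
    public boolean isDone(String jobId) {
        Future<?> f = running.get(jobId);
        return f != null && f.isDone();
    }

    // Stop a job by interrupting its worker thread.
    public void stop(String jobId) {
        Future<?> f = running.remove(jobId);
        if (f != null) {
            f.cancel(true);
        }
    }

    public void shutdown() {
        pool.shutdown();
    }
}
```

Recurring jobs, if ever needed, could use `ScheduledExecutorService` from the same package, keeping the engine dependency-free.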
>
>
> Best Regards
> Zhou QianHao
>
>
>
>
>
> On 1/13/15, 4:43 PM, "Ted Dunning" <ted.dunning@gmail.com> wrote:
>
> >So why are the following systems unsuitable?
> >
> >- mesos + (aurora or chronos)
> >- spark
> >- yarn
> >- drill's drillbits
> >
> >These options do different things.  I know that.  I am not entirely clear
> >on what you want, however, so I present these different options so that
> >you
> >can tell me better what you want.
> >
> >Mesos provides very flexible job scheduling.  With Aurora, it has support
> >for handling long-running and periodic jobs.  With Chronos, it has the
> >equivalent of a cluster level cron.
> >
> >Spark provides the ability for a program to spawn lots of parallel
> >execution.  This is different from what most people mean by job
> >scheduling, but in conjunction with a queuing system combined with Spark
> >Streaming, you can get remarkably close to a job scheduler.
> >
> >YARN can run jobs, but it has no capability to schedule recurring jobs.
> >It can adjudicate the allocation of cluster resources.  This is different
> >from what either Spark or Mesos does.
> >
> >Drill's drillbits do scheduling of queries across a parallel execution
> >environment.  It currently has no user impersonation, but does do an
> >interesting job of scheduling parts of parallel queries.
> >
> >Each of these could be considered a kind of job scheduler, but only a
> >few are likely to be what you are talking about.
> >
> >Which is it?
> >
> >
> >
> >
> >On Tue, Jan 13, 2015 at 1:53 AM, Zhou, Qianhao <qianzhou@ebay.com> wrote:
> >
> >> The goal of this job engine is to provide a unified interface for all
> >> job execution and queries.
> >> Here a job can be, for example, a Kylin query, a cube build, GC, etc.
> >> Since the old job engine can hardly support jobs other than cube
> >> builds, I think this refactoring is mandatory before we introduce a new
> >> realization of the data model, such as the inverted index.
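A unified interface of the kind described above might look like the following sketch. All names here, including `Executable` and `CubeBuildJob`, are hypothetical, not the actual proposal:

```java
// Hypothetical sketch of a unified job abstraction: every job type
// (query, cube build, GC) implements one contract, so a single engine
// can schedule, run, and track all of them uniformly.
interface Executable {
    enum State { READY, RUNNING, SUCCEED, ERROR, DISCARDED }

    String getId();
    State getState();
    void execute();
}

// Example job type: a cube build expressed against the same contract.
// Other job types (query, GC) would plug in the same way.
class CubeBuildJob implements Executable {
    private State state = State.READY;

    public String getId() {
        return "cube-build-1";
    }

    public State getState() {
        return state;
    }

    public void execute() {
        state = State.RUNNING;
        // ... submit and wait for the map-reduce steps here ...
        state = State.SUCCEED;
    }
}
```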
> >>
> >> Best Regards
> >> Zhou QianHao
> >>
> >>
> >>
> >>
> >>
> >> On 1/13/15, 3:42 PM, "Ted Dunning" <ted.dunning@gmail.com> wrote:
> >>
> >> >What is the goal of this job engine?
> >> >
> >> >To just run Kylin queries?
> >> >
> >> >
> >> >
> >> >On Tue, Jan 13, 2015 at 12:31 AM, Henry Saputra
> >><henry.saputra@gmail.com>
> >> >wrote:
> >> >
> >> >> I believe we do not care about Spark client APIs for the distributed
> >> >> execution engine, so I would recommend also taking a look at Apache
> >> >> Flink [1].
> >> >>
> >> >> Similar to Spark, it has an execution engine that can run standalone
> >> >> or on YARN and executes jobs as a DAG.
> >> >> Since we want to focus mostly on the backend, it is worth noting
> >> >> that it has some special features such as a built-in iteration
> >> >> operator, heap memory management, and a cost optimizer for the
> >> >> execution plan.
> >> >>
> >> >> - Henry
> >> >>
> >> >> [1] http://flink.apache.org/
> >> >>
> >> >> On Mon, Jan 12, 2015 at 10:17 PM, Li Yang <liyang@apache.org> wrote:
> >> >> > Agree. We shall proceed to refactor the job engine. It needs to be
> >> >> > more extensible and friendly to add new jobs and steps. This is a
> >> >> > prerequisite for Kylin to explore other opportunities for faster
> >> >> > cube build, like Spark and
> >> >> >
> >> >> > Please update with finer designs.
> >> >> >
> >> >> > On Fri, Jan 9, 2015 at 10:07 AM, 周千昊 <z.qianhao@gmail.com> wrote:
> >> >> >
> >> >> >> Currently Kylin has its own job engine to schedule the cubing
> >> >> >> process. However, there are some drawbacks:
> >> >> >> 1. It is too tightly coupled with the cubing process, so it
> >> >> >> cannot support other kinds of jobs easily.
> >> >> >> 2. It is hard to extend or to integrate with other technologies
> >> >> >> (for example, Spark).
> >> >> >> Thus I have proposed a refactoring of the current job engine.
> >> >> >> Below is the wiki page on GitHub:
> >> https://github.com/KylinOLAP/Kylin/wiki/%5BProposal%5D-New-Job-Engine
> >> >> >>
> >> >>
> >>
> >>
>
>
