kylin-dev mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Proposal for new Job Engine
Date Tue, 13 Jan 2015 08:43:41 GMT
So why are the following systems unsuitable?

- mesos + (aurora or chronos)
- spark
- yarn
- drill's drillbits

These options do different things; I know that.  But I am not entirely clear
on what you want, so I present these different options so that you can tell
me more precisely what you need.

Mesos provides very flexible job scheduling.  With Aurora, it has support
for handling long-running and periodic jobs.  With Chronos, it has the
equivalent of a cluster level cron.
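To make the "cluster-level cron" point concrete: Chronos jobs are described as JSON documents with a name, a command, and an ISO 8601 repeating schedule.  A minimal sketch follows; the job name and the cube-build command are hypothetical, invented only for illustration (they are not real Kylin commands):

```python
import json

# Hypothetical payload for a Chronos-style "cluster cron" job.  The field
# names (name/command/schedule/epsilon) follow Chronos's JSON job schema;
# the command and job name are made up for this example.
job = {
    "name": "kylin-nightly-cube-build",        # illustrative job identifier
    "command": "kylin.sh build --cube sales",  # hypothetical build command
    "schedule": "R/2015-01-14T02:00:00Z/P1D",  # ISO 8601: repeat daily at 02:00
    "epsilon": "PT30M",                        # start within 30 min of schedule
}

payload = json.dumps(job)
# In practice this payload would be POSTed to Chronos's REST endpoint;
# here we only print it.
print(payload)
```

The point is that the scheduling policy lives entirely in declarative job metadata, which is what distinguishes this from YARN or Spark below.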

Spark provides the ability for a program to spawn lots of parallel
execution.  This is different from what most people mean by job scheduling,
but combined with a queuing system and Spark Streaming, you can get
remarkably close to a job scheduler.
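The "program spawns lots of parallel execution" pattern is a driver fanning work out to workers and collecting results.  As a rough analogy (plain Python standard library, not Spark; `build_segment` is a made-up stand-in for one unit of cube work):

```python
from concurrent.futures import ThreadPoolExecutor

def build_segment(segment_id: int) -> str:
    # Stand-in for one unit of parallel work, e.g. building one cube segment.
    return f"segment-{segment_id}:done"

# Fan the work out, then collect results in order -- the driver/executor
# pattern that Spark generalizes across a cluster.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(build_segment, range(4)))

print(results)
```

Spark's contribution is doing this across machines with fault tolerance; the scheduling of *when* such a program runs still has to come from somewhere else, hence the queuing-system caveat.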

Yarn can run jobs, but has no capability to schedule recurring jobs.  It
can adjudicate the allocation of cluster resources.  This is different from
what either Spark or Mesos does.

Drill's drillbits do scheduling of queries across a parallel execution
environment.  It currently has no user impersonation, but does do an
interesting job of scheduling parts of parallel queries.

Each of these could be considered a kind of job scheduler, but only a few
are likely to be what you are talking about.

Which is it?
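The unified job interface described in the quoted proposal below might be sketched roughly like this.  All class and method names here are hypothetical, invented for illustration; they are not Kylin's actual API:

```python
from abc import ABC, abstractmethod

class Job(ABC):
    """Hypothetical unified job contract: cube builds, queries, GC, etc.
    would all implement the same interface, so the scheduler needs to
    know nothing about any particular job kind."""

    @abstractmethod
    def execute(self) -> str:
        """Run the job and return a terminal status string."""

class CubeBuildJob(Job):
    def execute(self) -> str:
        # Real logic would submit MapReduce (or Spark) steps; stubbed here.
        return "SUCCEED"

class GarbageCollectJob(Job):
    def execute(self) -> str:
        # Real logic would clean up intermediate data; stubbed here.
        return "SUCCEED"

# The scheduler iterates over Jobs generically, never over concrete types.
for job in (CubeBuildJob(), GarbageCollectJob()):
    print(type(job).__name__, job.execute())
```

Under this shape, adding a new job kind means adding one subclass rather than touching the engine, which is the extensibility the proposal is after.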




On Tue, Jan 13, 2015 at 1:53 AM, Zhou, Qianhao <qianzhou@ebay.com> wrote:

> The goal of this job engine is to provide a unified interface for all job
> execution and querying.
> Here a job can be, for example, a Kylin query, a cube build, GC, etc.
> Since the old job engine can hardly support jobs other than cube building,
> I think refactoring it is mandatory before we introduce new realizations
> of the data model, such as the inverted index.
>
> Best Regards
> Zhou QianHao
>
>
>
>
>
> On 1/13/15, 3:42 PM, "Ted Dunning" <ted.dunning@gmail.com> wrote:
>
> >What is the goal of this job engine?
> >
> >To just run Kylin queries?
> >
> >
> >
> >On Tue, Jan 13, 2015 at 12:31 AM, Henry Saputra <henry.saputra@gmail.com>
> >wrote:
> >
> >> I believe we do not care about Spark client APIs for the distributed
> >> execution engine, so I would recommend also taking a look at Apache
> >> Flink [1].
> >>
> >> Similar to Spark, it has an execution engine that can run standalone or
> >> on YARN and executes jobs as a DAG.
> >> And since we want to focus mostly on the backend, note that it has some
> >> special features like a built-in iteration operator, heap memory
> >> management, and a cost optimizer for the execution plan.
> >>
> >> - Henry
> >>
> >> [1] http://flink.apache.org/
> >>
> >> On Mon, Jan 12, 2015 at 10:17 PM, Li Yang <liyang@apache.org> wrote:
> >> > Agree. We shall proceed to refactor the job engine. It needs to be
> >> > more extensible and friendly for adding new jobs and steps. This is a
> >> > prerequisite for Kylin to explore other opportunities for faster cube
> >> > builds, like Spark and
> >> >
> >> > Please update with finer designs.
> >> >
> >> > On Fri, Jan 9, 2015 at 10:07 AM, 周千昊 <z.qianhao@gmail.com> wrote:
> >> >
> >> >> Currently Kylin has its own job engine to schedule the cubing
> >> >> process. However, it has some drawbacks:
> >> >> 1. It is too tightly coupled with the cubing process, and thus cannot
> >> >> easily support other kinds of jobs.
> >> >> 2. It is hard to extend or to integrate with other technologies (for
> >> >> example, Spark).
> >> >> Thus I have proposed a refactoring of the current job engine.
> >> >> Below is the wiki page on GitHub:
> >> >> https://github.com/KylinOLAP/Kylin/wiki/%5BProposal%5D-New-Job-Engine
> >> >>
> >>
>
>
