kylin-dev mailing list archives

From Luke Han <luke...@apache.org>
Subject Re: Proposal for new Job Engine
Date Tue, 13 Jan 2015 13:31:26 GMT
The Job Engine is actually a "cube builder": it coordinates all the different
jobs, including MR jobs, shell scripts, and Java calls, that manipulate data
and generate the target cube in HBase. Whatever framework is used, Kylin
still needs one "coordinator" to manage all the flows.

Kylin has no "schedule job" feature, which is why we call it a lightweight
engine. In most cases, the cube build job is triggered by another system as
the last step of ETL. For example, once the ETL process has populated the
data in Hive, the ETL workflow should have one step that triggers the Kylin
cube build job via the REST API.
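
As a rough illustration of that last ETL step, here is a minimal sketch in
Java. The class name CubeBuildTrigger is hypothetical, and the endpoint path
(/kylin/api/cubes/{cubeName}/rebuild) and the JSON field names are
assumptions that should be checked against the REST documentation of the
Kylin version actually deployed. The sketch only builds the request; it does
not send it.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch only: the endpoint path and JSON field names below are assumptions,
// not confirmed by this thread. Verify them against your Kylin version.
final class CubeBuildTrigger {

    private CubeBuildTrigger() {}

    /** JSON body asking for a build of the segment [startMillis, endMillis). */
    static String requestBody(long startMillis, long endMillis) {
        return "{\"startTime\":" + startMillis
                + ",\"endTime\":" + endMillis
                + ",\"buildType\":\"BUILD\"}";
    }

    /** Builds (but does not send) the HTTP request for one cube build. */
    static HttpRequest buildRequest(String baseUrl, String cubeName,
                                    long startMillis, long endMillis,
                                    String user, String password) {
        // HTTP Basic authentication: base64("user:password")
        String credentials = Base64.getEncoder().encodeToString(
                (user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/kylin/api/cubes/" + cubeName + "/rebuild"))
                .header("Authorization", "Basic " + credentials)
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(
                        requestBody(startMillis, endMillis)))
                .build();
    }
}
```

The ETL workflow would then send the request with something like
HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())
and treat a non-2xx status as a failed step.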

Qianhao's proposal is to abstract the job engine into something more generic,
so that it is easy to extend with build logic for new storage, like
InvertedIndex, and to adopt other options if necessary.
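
To make "more generic" a bit more concrete, here is a minimal sketch of a
coordinator that runs arbitrary steps in order and stops at the first
failure. The names JobStep and ChainedJob are hypothetical and are not taken
from Qianhao's proposal or from Kylin's code; the sketch only shows the
general shape of a step abstraction that is not tied to cube building.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

/** One unit of work (an MR job, a shell script, a Java call...). Hypothetical name. */
interface JobStep {
    String name();

    /** Returns true on success; throwing is treated as failure by ChainedJob. */
    boolean run() throws Exception;

    /** Convenience factory so a step can be written as a lambda. */
    static JobStep of(String name, Callable<Boolean> body) {
        return new JobStep() {
            @Override public String name() { return name; }
            @Override public boolean run() throws Exception { return body.call(); }
        };
    }
}

/** Runs its steps sequentially and records where it failed. Hypothetical name. */
final class ChainedJob {
    enum State { READY, RUNNING, SUCCEEDED, FAILED }

    private final List<JobStep> steps = new ArrayList<>();
    private State state = State.READY;
    private String failedStep; // null unless the job failed

    ChainedJob add(JobStep step) {
        steps.add(step);
        return this;
    }

    State state() { return state; }

    String failedStep() { return failedStep; }

    State execute() {
        state = State.RUNNING;
        for (JobStep step : steps) {
            boolean ok;
            try {
                ok = step.run();
            } catch (Exception e) {
                ok = false; // an exception counts as a failed step
            }
            if (!ok) {
                failedStep = step.name();
                state = State.FAILED;
                return state; // later steps are not run
            }
        }
        state = State.SUCCEEDED;
        return state;
    }
}
```

With this shape, a new kind of build (for example for InvertedIndex) would
only need to supply its own list of steps; the coordinator itself would not
change.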

Thanks.
Luke



2015-01-13 17:59 GMT+08:00 Amarnath Arsikere <amar@infoworks.io>:

> Why not use Apache Oozie instead? Even though you may not need complex
> workflows, you can gain some operational benefits and flexibility, such as
> increasing memory for a Hadoop job or monitoring the job on a web console.
>
> Regards,
> Amar
>
>
>
> On Tue, Jan 13, 2015 at 3:04 PM, Zhou, Qianhao <qianzhou@ebay.com> wrote:
>
> > What we want is:
> >
> > 1. A lightweight job engine that makes it easy to start, stop and check
> >    jobs. Most of the heavyweight work is MapReduce, which already runs
> >    on the cluster, so the job engine itself does not need to run on a
> >    cluster.
> >
> > 2. Kylin already has a job engine based on Quartz. However, only a very
> >    small part of its functionality is used, so we can easily replace it
> >    with the standard Java API. That means no extra dependency, which
> >    makes deployment easier.
> >
> > For now, a very simple job engine implementation will meet Kylin's
> > needs, so I think keeping it simple is the better choice at this point.
> >
> >
> > Best Regard
> > Zhou QianHao
> >
> >
> >
> >
> >
> > On 1/13/15, 4:43 PM, "Ted Dunning" <ted.dunning@gmail.com> wrote:
> >
> > >So why are the following systems unsuitable?
> > >
> > >- mesos + (aurora or chronos)
> > >- spark
> > >- yarn
> > >- drill's drillbits
> > >
> > >These options do different things.  I know that.  I am not entirely
> > >clear on what you want, however, so I present these different options
> > >so that you can tell me better what you want.
> > >
> > >Mesos provides very flexible job scheduling.  With Aurora, it has
> > >support for handling long-running and periodic jobs.  With Chronos, it
> > >has the equivalent of a cluster level cron.
> > >
> > >Spark provides the ability for a program to spawn lots of parallel
> > >execution.  This is different than what most people mean by job
> > >scheduling, but in conjunction with a queuing system combined with
> > >spark streaming, you can get remarkably close to a job scheduler.
> > >
> > >Yarn can run jobs, but has no capabilities to schedule recurring jobs.
> > >It can adjudicate the allocation of cluster resources.  This is
> > >different from what either spark or mesos does.
> > >
> > >Drill's drillbits do scheduling of queries across a parallel execution
> > >environment.  It currently has no user impersonation, but does do an
> > >interesting job of scheduling parts of parallel queries.
> > >
> > >Each of these could be considered like a job scheduler.  Only a very few
> > >are likely to be what you are talking about.
> > >
> > >Which is it?
> > >
> > >
> > >
> > >
> > >On Tue, Jan 13, 2015 at 1:53 AM, Zhou, Qianhao <qianzhou@ebay.com> wrote:
> > >
> > >> The goal of this job engine is to provide a unified interface for
> > >> all job execution and query.
> > >> Here a job can be, for example, a Kylin query, a cube build, GC, etc.
> > >> Because the old job engine can hardly support jobs other than cube
> > >> builds, I think this is mandatory before we introduce new
> > >> realizations of the data model, such as inverted-index.
> > >>
> > >> Best Regard
> > >> Zhou QianHao
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> On 1/13/15, 3:42 PM, "Ted Dunning" <ted.dunning@gmail.com> wrote:
> > >>
> > >> >What is the goal of this job engine?
> > >> >
> > >> >To just run Kylin queries?
> > >> >
> > >> >
> > >> >
> > >> >On Tue, Jan 13, 2015 at 12:31 AM, Henry Saputra <henry.saputra@gmail.com> wrote:
> > >> >
> > >> >> I believe we do not care about Spark client APIs for the
> > >> >> distributed execution engine, so I would recommend also taking a
> > >> >> look at Apache Flink [1].
> > >> >>
> > >> >> Similar to Spark, it has an execution engine that can run
> > >> >> standalone or on YARN as a DAG.
> > >> >> But since we want to focus mostly on the backend, it has some
> > >> >> special features like a built-in iteration operator, heap memory
> > >> >> management, and a cost optimizer for the execution plan.
> > >> >>
> > >> >> - Henry
> > >> >>
> > >> >> [1] http://flink.apache.org/
> > >> >>
> > >> >> On Mon, Jan 12, 2015 at 10:17 PM, Li Yang <liyang@apache.org> wrote:
> > >> >> > Agree. We shall proceed to refactor the job engine. It needs to be
> > >> >> > more extensible and friendly to add new jobs and steps. This is a
> > >> >> > prerequisite for Kylin to explore other opportunities for faster
> > >> >> > cube build, like Spark and
> > >> >> >
> > >> >> > Please update with finer designs.
> > >> >> >
> > >> >> > On Fri, Jan 9, 2015 at 10:07 AM, 周千昊 <z.qianhao@gmail.com> wrote:
> > >> >> >
> > >> >> >> Currently Kylin has its own Job Engine to schedule the cubing
> > >> >> >> process. However, it has some drawbacks:
> > >> >> >> 1. It is too tightly coupled with the cubing process, so it
> > >> >> >>    cannot easily support other kinds of jobs.
> > >> >> >> 2. It is hard to extend or to integrate with other technologies
> > >> >> >>    (for example Spark).
> > >> >> >> Thus I have proposed a refactor of the current job engine.
> > >> >> >> Below is the wiki page on GitHub:
> > >> >> >>
> > >> >> >> https://github.com/KylinOLAP/Kylin/wiki/%5BProposal%5D-New-Job-Engine
> > >> >> >>
> > >> >>
> > >>
> > >>
> >
> >
>
