kylin-dev mailing list archives

From Luke Han <luke...@gmail.com>
Subject Re: Need advice for kylin newbie
Date Mon, 02 Mar 2015 12:09:55 GMT
Hi Vikram,
    I would like to confirm this one: "Configuration of the cluster takes too
long."
    Do you mean setting up both the Hadoop cluster and the Kylin server, or
just the Kylin server?

    BTW, for Kylin installation, as Yang mentioned, please refer to the new
version (v0.7.x) with a binary package here:
http://kylin.incubator.apache.org/download/ It was just updated with a fix
for a bug that previously blocked saving cubes. This package should be easy
to set up on a Hadoop cluster, even in the cloud.
    Please feel free to let us know if there are any issues.

    Thanks.

Luke


Best Regards!
---------------------

Luke Han

2015-03-02 18:40 GMT+08:00 Li Yang <liyang@apache.org>:

> Hi Vikram,
>
> Many thanks for your valuable feedback!
>
> From your requirements, Kylin is actually a very good fit. However, we know
> the installation is really painful, and we are working hard to improve it.
> The new 0.7.x will have a binary release that you should be able to unpack
> and run on Hortonworks and Cloudera distributions with zero configuration.
> For Azure..., well, we haven't tried it yet, but that's where you can help
> us.
>
> On the other hand, we will be glad to help with your POC. If possible,
> share your (sample) data model and query patterns. We can suggest the best
> cube design for Kylin.
>
>
> Cheers
> Yang
>
>
> On Sat, Feb 28, 2015 at 2:12 AM, Adunuthula, Seshu <sadunuthula@ebay.com>
> wrote:
>
> > Vikram,
> >
> > Thank you for the honest and direct feedback on Kylin. As you rightly
> > called out, the sweet spot for Kylin is the ability to do MOLAP on
> > 10-100 billion rows with sub-second query responses. So we believe
> > Kylin is the right tool for your requirements below.
> >
> > > So given these requirements, is Kylin the right solution to replace our
> > > on-premise MOLAP cubes?  As long as our users can pivot/slice & dice the
> > > measures quickly from client tools like Excel and Tableau by dragging and
> > > dropping dimensions into rows/columns w/o the need to join to fact table,
> >
> >
> >
> > Docker is useful for single-machine developer deployments, and I have
> > found that a certain level of Docker expertise is needed before you can
> > successfully deploy them.
> >
> > You are doing a certain set of firsts that could be making your setup a
> > nightmare. Using Azure as the managed Hadoop system would certainly be a
> > first for the Kylin team and you might be running into.
> >
> > That said, we (the Kylin team) are interested in making your POC
> > successful. As with any open source project, there is some assembly
> > required. Are you, as a team, set up for development activities? If so,
> > we can have team-to-team meetings to determine what it takes to make the
> > POC successful.
> >
> > On 2/27/15, 8:51 AM, "Vikram Kone" <vikramkone@gmail.com> wrote:
> >
> > >Hi,
> > >I'm a newbie when it comes to Kylin and the Hadoop ecosystem in general.
> > >Our team has been predominantly a Microsoft shop that uses the MS stack
> > >for most of its BI needs. So we are talking SQL Server for storing
> > >relational data and SQL Server Analysis Services for building MOLAP cubes
> > >for sub-second query analysis.
> > >Lately, we have been hitting degradation in our cube query response times
> > >as our data sizes grew considerably over the past year. We are talking
> > >fact tables in the range of 10-100 billion rows and a few dimensions in
> > >the 10s-100s of millions of rows. We tried vertically scaling up our SSAS
> > >server, but queries are still taking a few minutes. In light of this, I
> > >was entrusted with the task of finding an open source solution that would
> > >scale to our current and future needs for data analysis.
> > >I looked at a bunch of open source tools like Apache Drill, Druid,
> > >AtScale, Spark, Storm, Kylin, etc. and settled on exploring Kylin as the
> > >first step given its recent rise in popularity and the growing ecosystem
> > >around it.
> > >I started to build out a POC for our MOLAP cubes using Kylin with
> > >HDFS/Hive as the data source, to see how it scales for our
> > >queries/measures in real time with real data. The setup has been a
> > >nightmare so far. Configuration of the cluster takes too long. I tried
> > >the Docker version and it fails with cryptic errors. Then I tried
> > >installing it using the build from root option on a Hadoop cluster and
> > >am seeing more issues related to cube building. Same with the binary
> > >package installation. It's just taking too long to set up. There should
> > >be an easier way to do this :(
> > >Roughly, these are the requirements for our team:
> > >1. Should be able to create facts, dimensions and measures from our data
> > >sets easily.
> > >2. Cubes should be queryable from Excel and Tableau.
> > >3. Easily scale out by adding new nodes when data grows.
> > >4. Very little maintenance and highly stable for production-level
> > >workloads.
> > >5. Sub-second query latencies for COUNT DISTINCT measures (since the
> > >majority of our expensive measures are of this type). We are OK with
> > >approximate distinct counts for better performance.
> > >
> > >So given these requirements, is Kylin the right solution to replace our
> > >on-premise MOLAP cubes?  As long as our users can pivot/slice & dice the
> > >measures quickly from client tools like Excel and Tableau by dragging and
> > >dropping dimensions into rows/columns w/o the need to join to the fact
> > >table, we are OK with however the data is laid out. It doesn't have to be
> > >a cube. It can be a flat file in HDFS for all we care. I would love to
> > >chat with someone who has successfully done this kind of migration from
> > >SSAS OLAP cubes to Kylin in their team or company and learn about the
> > >pros and cons before I spend more time configuring this stuff.
> > >
> > >This is it for now. Looking forward to a great discussion.
> > >
> > >P.S. We have decided on using Azure as our managed Hadoop system in the
> > >cloud.
> >
> >
>
