kylin-dev mailing list archives

From "蒋旭" <jiangxu.ch...@qq.com>
Subject Re: cube building VS cognos
Date Tue, 03 Mar 2015 00:24:14 GMT
Just as database schema design is critical to database performance, metadata design
is critical to cube building and query performance.


Kylin has many optimization options in its metadata: partial cube, dictionary encoding,
hierarchy dimensions, derived dimensions, etc.


You should first get the cardinality of every dimension in Kylin, then design the metadata
based on the dimension characteristics and query patterns.
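
As a rough illustration of what "cardinality" means here, the following is a minimal
Python sketch that counts distinct values per column over a few in-memory rows. The
column names and sample rows are hypothetical, and this is not how Kylin itself
collects cardinality; it only shows the quantity to look at.

```python
from collections import defaultdict


def column_cardinalities(rows):
    """Return the number of distinct values seen in each column."""
    distinct = defaultdict(set)
    for row in rows:
        for column, value in row.items():
            distinct[column].add(value)
    return {column: len(values) for column, values in distinct.items()}


# Hypothetical sample rows: "country" is low cardinality, "user_id" is high.
rows = [
    {"country": "CN", "user_id": 1},
    {"country": "CN", "user_id": 2},
    {"country": "US", "user_id": 3},
]
cards = column_cardinalities(rows)
print(cards)
```

Columns whose cardinality grows with the row count (such as the hypothetical user_id)
are the ones that deserve the most care in the metadata design.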

Thanks
Jiang Xu

------------------ Original Message ------------------
From: Luke Han <luke.hq@gmail.com>
Sent: 2015-03-02 19:58
To: dev@kylin.incubator.apache.org <dev@kylin.incubator.apache.org>
Subject: Re: cube building VS cognos



There's one magic but very important concept called "partial cube", as Ted
also mentioned.
If you build a full cube, Kylin calculates every possible combination of
dimensions, so the cube size will certainly explode as you add more
dimensions, or even when just one column has high cardinality.
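
To make that growth concrete, here is a minimal Python sketch of the arithmetic: with
N dimensions, a full cube has 2^N dimension combinations (cuboids), so each extra
dimension doubles the count. The function name is made up for illustration, and the
count says nothing about the size of each cuboid, which depends on cardinality.

```python
def full_cube_cuboids(num_dimensions: int) -> int:
    """Number of dimension combinations (cuboids) in a full cube."""
    return 2 ** num_dimensions


# 10 dimensions were built in the test case; 19 is the full dimension count.
for n in (10, 19):
    print(n, "dimensions ->", full_cube_cuboids(n), "cuboids")
```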

Kylin already supports partial cubes; please use "Advance settings" in the
cube designer to tune the "Aggregation Group".

Also, for derived dimensions, you only need to put one column into the
aggregation group, which greatly reduces the final cube size. With
other optimization rules applied as well, the final cube size will be well
controlled.
For example, we have many production cases with more than 10 billion rows and
10+ dimensions; most cube sizes are around several hundred GB, 20~50% of the
source Hive table size. (We also have some cubes with a larger expansion
rate to serve extreme cases.)
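
The effect of aggregation groups on the number of cuboids can be sketched in Python
under a deliberately simplified model: only non-empty combinations that fall entirely
inside one group are built, plus the base cuboid containing all dimensions. This model,
the function name, and the dimension names are assumptions for illustration, not
Kylin's actual planning logic.

```python
from itertools import combinations


def cuboids_with_groups(all_dims, groups):
    """Count cuboids under a simplified aggregation-group model."""
    # Assumption: the base cuboid (all dimensions) is always built.
    cuboids = {frozenset(all_dims)}
    for group in groups:
        # Every non-empty combination inside a single group is built.
        for size in range(1, len(group) + 1):
            for combo in combinations(group, size):
                cuboids.add(frozenset(combo))
    return len(cuboids)


dims = [f"d{i}" for i in range(10)]  # hypothetical dimension names
one_group = cuboids_with_groups(dims, [dims])
two_groups = cuboids_with_groups(dims, [dims[:5], dims[5:]])
print("one group of 10:", one_group)
print("two groups of 5:", two_groups)
```

Under this model, splitting ten dimensions into two groups of five cuts the count from
1023 to 63, which is why grouping dimensions by query pattern matters so much.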

Also, Kylin's main focus is accelerating query performance with
pre-calculated results (the cube); size is just one aspect. Could you also
please run some queries on both Cognos and Kylin? We would like to know the
query performance results.

Thank you very much.
Luke





Best Regards!
---------------------

Luke Han

2015-03-02 13:47 GMT+08:00 Ted Dunning <ted.dunning@gmail.com>:

> This sounds like cognos is actually just building a few, possibly just one,
> of the more detailed cubes expecting that queries will roll these up to get
> effect of many of the less detailed cubes.  That is, it may not be building
> all of the cubes requested.
>
> Kylin, on the other hand, seems to be building up all requested cubes with
> no optimization being imposed.
>
> Note that these are my impressions based on seeing how new open source
> software often behaves relative to older, more established alternatives,
> not anything based on concrete information.  I look forward to being
> contradicted by facts.
>
>
>
>
> On Mon, Mar 2, 2015 at 3:59 AM, 王西斌 <bin890218@gmail.com> wrote:
>
> > Hi
> >
> > I've done some tests to compare Kylin with Cognos, which we use for OLAP.
> > Below is one case:
> >
> > Fact table size: 500M
> >
> > Dimension: 19( 16 derived, 3 normal)
> >
> > Measure: 56
> >
> > Cognos builds this cube in about one hour, producing a 1.2G cube file.
> > In Kylin, the cube size is already up to 64G with 10 dimensions and 20
> > measures. If I understand correctly, the cube size will at least double for
> > each additional dimension. For this test case, the cube size is estimated
> > in TB, which is far beyond our expectations.
> > So, is there any test report we can refer to for a comparison with
> > traditional OLAP tools like Cognos? If so, please let me know; it would be
> > very helpful.
> >
> > Looking forward to reply.
> >
> > Thanks
> >
>