kylin-dev mailing list archives

From JiaTao Tao <taojia...@gmail.com>
Subject Re: Re: Evaluate Kylin on Parquet
Date Wed, 19 Dec 2018 12:43:49 GMT
Hi all,

Fully agreed with Yiming, and here I will expand a little more on
"Distributed computing".

As Yiming mentioned, Kylin parses a query into an execution plan using
Calcite (Kylin has to rewrite the execution plan because the data in
cubes is already aggregated, so we cannot use the original plan
directly). The plan is a tree structure: each node represents a specific
calculation, and data flows from bottom to top, applying these
calculations in turn.
[image: image.png]
(Pic from https://blog.csdn.net/yu616568/article/details/50838504, a really
good blog.)

At present, Kylin does almost all of these calculations on a single
node; in other words, we cannot fully use the power of the cluster, and
it is a single point of failure (SPOF). Hence this design: we can visit
this tree, *and transform each node into an operation on Spark's
DataFrames (i.e. "DF")*.

More specifically, we visit the nodes recursively until we meet the
"TableScan" node (like pushing onto a stack). E.g. in the above diagram,
the first node we meet is a "Sort" node; we just visit its child(ren),
and we keep visiting each node's child(ren) until we reach a "TableScan"
node.

In the "TableScan" node we generate the initial DF; the DF is then
popped up to the "Filter" node, which applies its own operation, like
"df.filter(xxx)". Finally, every node's operation has been applied to
this DF, and the final call chain will look like:
"df.filter(xxx).select(xxx).agg(xxx).sort(xxx)".
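As a toy illustration, the recursive visit can be sketched in a few lines of Python. "FakeDF" here just records the call chain instead of running Spark, and all class and function names are hypothetical, not Kylin's actual code:

```python
# Minimal sketch of the recursive plan-to-DataFrame visit described
# above. FakeDF stands in for Spark's DataFrame and only records the
# call chain; node kinds mirror the diagram (Sort/Agg/Project/Filter/
# TableScan). Everything here is illustrative, not Kylin's API.

class FakeDF:
    def __init__(self, calls="df"):
        self.calls = calls

    def apply(self, op):
        # Record one operation, mimicking df.filter(...), df.select(...)
        return FakeDF(f"{self.calls}.{op}")

class Node:
    def __init__(self, kind, child=None):
        self.kind = kind
        self.child = child

OPS = {
    "Filter": "filter(xxx)",
    "Project": "select(xxx)",
    "Agg": "agg(xxx)",
    "Sort": "sort(xxx)",
}

def visit(node):
    # Recurse down to TableScan (the "stack push"), then apply each
    # node's operation while unwinding (the "pop").
    if node.kind == "TableScan":
        return FakeDF()  # generate the initial DataFrame
    df = visit(node.child)
    return df.apply(OPS[node.kind])

# Sort -> Agg -> Project -> Filter -> TableScan, as in the diagram
plan = Node("Sort", Node("Agg", Node("Project",
       Node("Filter", Node("TableScan")))))
print(visit(plan).calls)
# -> df.filter(xxx).select(xxx).agg(xxx).sort(xxx)
```

On the way down we only recurse; every operation is applied on the way back up, which is why the TableScan's DF comes first in the chain and the Sort last.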

After we get the final DataFrame and trigger the calculation, all the
rest is handled by Spark, and we gain tremendous benefits at the
computation level. More details can be found in my previous post:
http://apache-kylin.74782.x6.nabble.com/Re-DISCUSS-Columnar-storage-engine-for-Apache-Kylin-tc12113.html


-- 


Regards!

Aron Tao


许益铭 <x1860877@gmail.com> wrote on Wed, 19 Dec 2018 at 11:40:

> Hi all!
> Regarding the issues Chao Long raised, I have the following views:
> 1. At present our architecture is divided into two layers: a storage
> layer and a computing layer. In the storage layer we have already made
> some optimizations, doing pre-aggregation there to reduce the amount of
> data returned. However, runtime aggregations and joins happen on the
> Kylin server side, so serialization is unavoidable, and this
> architecture easily becomes a single-point bottleneck: if the runtime
> agg or join involves a large amount of data, query performance drops
> sharply and the Kylin server suffers severe GC.
>
> 2. As for the dictionary problem, dropping dictionary encoding is a
> good choice. The dictionary was originally designed to align rowkeys in
> HBase and also to save some storage. But it introduces another problem:
> HBase has a hard time handling variable-length string dimensions. If
> you encounter an ultra-high-cardinality (UHC) dimension, you can only
> build a very large dictionary or set a fairly large fixed length, which
> doubles the storage; and because the dictionary is large, query
> performance suffers greatly (GC). If we use columnar storage, we don't
> need to consider this problem.
>
> 3. To use Parquet's page index, we must convert Kylin's TupleFilter
> into a Parquet filter, which is no small amount of work. Moreover, our
> data is encoded, and the Parquet page index filters only on each page's
> min/max values, so for binary data it cannot filter at all.
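To make the min/max point concrete, here is a toy Python sketch of page-level pruning. The page layout is invented, not Parquet's actual format; the closing comment states why encoded or binary values defeat it:

```python
# Toy illustration of page-level min/max pruning (roughly what
# Parquet's page index enables). The page layout here is invented,
# not Parquet's actual on-disk format.

pages = [
    {"min": 1,   "max": 100},
    {"min": 101, "max": 200},
    {"min": 201, "max": 300},
]

def pages_to_scan(pages, value):
    # Keep only pages whose [min, max] range could contain the value;
    # all other pages are skipped without being read.
    return [p for p in pages if p["min"] <= value <= p["max"]]

# WHERE col = 150: only the middle page needs to be read.
print(len(pages_to_scan(pages, 150)))  # -> 1

# If the column is stored as opaque dictionary ids or raw binary, the
# page min/max describes the encoded bytes, not the original values,
# so a predicate on the original value cannot be turned into a range
# check here, and every page must be scanned.
```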
>
> I think using Spark as our computation engine can solve all of the
> above problems:
>
> 1. Distributed computing
> After SQL is parsed and optimized by Calcite, it generates a tree of
> OLAP rels; Spark's Catalyst likewise parses SQL into a tree and
> automatically optimizes it into a DataFrame computation. If the Calcite
> plan can be converted into a Spark plan, we achieve distributed
> computing: Calcite is only responsible for parsing SQL and returning
> result sets, reducing the pressure on the Kylin server side.
>
> 2. Remove the dictionary
> The dictionary is good at reducing storage pressure for low- and
> medium-cardinality columns, but it has the drawback that the data files
> cannot be used without the dictionary. I suggest that at first we don't
> consider dictionary-type encodings at all, keeping the system as simple
> as possible, and just rely on Parquet's page-level dictionary by
> default.
>
> 3. Parquet stores each column's real type instead of binary
> As above, Parquet's filter capability on binary is very weak, while
> primitive types can directly use Spark's vectorized read to speed up
> data reading and computation.
>
> 4. Use Spark to integrate with Parquet
> Current Spark is already adapted to Parquet: Spark's pushed filters are
> converted into filters that Parquet can use. We only need to upgrade
> the Parquet version and make minor modifications to get Parquet's page
> index capability.
>
> 5. Index server
> As JiaTao Tao described, the index server covers both the file index
> and the page index; dictionary-based filtering is just one kind of file
> index, so we can plug an index server in here.
>
> JiaTao Tao <taojiatao@gmail.com> wrote on Wed, 19 Dec 2018 at 16:45:
>
> > Hi Gang
> >
> > In my opinion, segment/partition pruning actually falls within the
> > scope of the "index system": we can have an index system at the
> > storage level, including a file index (for segment/partition
> > pruning), a page index (for page pruning), etc. We can put all this
> > stuff into such a system and make the separation of duties cleaner.
> >
> >
> > Ma Gang <mg4work@163.com> wrote on Wed, 19 Dec 2018 at 06:31:
> >
> > > Awesome! Looking forward to the improvement. As for the
> > > dictionary: keeping it in the query engine is usually not good,
> > > since it puts a lot of pressure on the Kylin server, but sometimes
> > > it has benefits. For example, some segments can be pruned very
> > > early when a filter value is not in the dictionary, and some
> > > queries can be answered directly from the dictionary, as described
> > > in: https://issues.apache.org/jira/browse/KYLIN-3490
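A hypothetical sketch of that early pruning, with invented segment and dictionary names (not Kylin's API):

```python
# Toy sketch of dictionary-based segment pruning: if a filter value
# does not appear in a segment's dictionary, that segment cannot
# contain matching rows and is skipped before any scan happens.
# Segment names and the "city_dict" field are invented for
# illustration only.

segments = {
    "seg_2018Q1": {"city_dict": {"Beijing", "Shanghai"}},
    "seg_2018Q2": {"city_dict": {"Shanghai", "Shenzhen"}},
    "seg_2018Q3": {"city_dict": {"Beijing", "Guangzhou"}},
}

def prune_segments(segments, filter_value):
    # Keep only segments whose dictionary contains the filter value.
    return [name for name, seg in segments.items()
            if filter_value in seg["city_dict"]]

# WHERE city = 'Beijing' -> only Q1 and Q3 need to be scanned.
print(prune_segments(segments, "Beijing"))
# -> ['seg_2018Q1', 'seg_2018Q3']
```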
> > >
> > > At 2018-12-17 15:36:01, "ShaoFeng Shi" <shaofengshi@apache.org> wrote:
> > >
> > > The dimension dictionary is a legacy design for HBase storage, I
> > > think; because HBase has no data types and everything is a byte
> > > array, Kylin has to encode STRING and other types with some
> > > encoding method like the dictionary.
> > >
> > > Now with a storage like Parquet, the storage itself decides how to
> > > encode the data at the page or block level. Then we can drop the
> > > dictionary after the cube is built. This will relieve the memory
> > > pressure on Kylin query nodes and also benefit the UHC case.
> > >
> > > Best regards,
> > >
> > > Shaofeng Shi 史少锋
> > > Apache Kylin PMC
> > > Work email: shaofeng.shi@kyligence.io
> > > Kyligence Inc: https://kyligence.io/
> > >
> > > Apache Kylin FAQ:
> https://kylin.apache.org/docs/gettingstarted/faq.html
> > > Join Kylin user mail group: user-subscribe@kylin.apache.org
> > > Join Kylin dev mail group: dev-subscribe@kylin.apache.org
> > >
> > >
> > >
> > >
> > > Chao Long <wayne.l@qq.com> wrote on Mon, 17 Dec 2018 at 13:23:
> > >
> > >> In this PoC we verified that Kylin on Parquet is viable, but the
> > >> query performance still has room to improve. We can improve it in
> > >> the following aspects:
> > >>
> > >> 1. Minimize result-set serialization time
> > >> Since Kylin needs Object[] data for processing, we convert the
> > >> Dataset to an RDD and then convert the "Row" type to Object[], so
> > >> Spark needs to serialize the Object[] before returning it to the
> > >> driver. This time needs to be avoided.
> > >>
> > >> 2. Query without the dictionary
> > >> In this PoC, to save storage, we keep the dict-encoded values in
> > >> the Parquet files for dict-encoded dimensions, so Kylin must load
> > >> the dictionary to decode those values at query time. If we keep
> > >> the original values for dict-encoded dimensions, the dictionary
> > >> becomes unnecessary, and we don't have to worry about the storage
> > >> use, because Parquet will encode them anyway. We should remove the
> > >> dictionary from the query path.
> > >>
> > >> 3. Remove the query single-point issue
> > >> In this PoC we use Spark to read and process cube data, which is
> > >> distributed, but Kylin also needs to process the result data Spark
> > >> returns in a single JVM. We can try to make that distributed too.
> > >>
> > >> 4. Upgrade Parquet to 1.11 for the page index
> > >> In this PoC Parquet doesn't have a page index, so we get poor
> > >> filter performance. We need to upgrade Parquet to version 1.11,
> > >> which has the page index, to improve filter performance.
> > >>
> > >> ------------------
> > >> Best Regards,
> > >> Chao Long
> > >>
> > >> ------------------ Original Message ------------------
> > >> *From:* "ShaoFeng Shi"<shaofengshi@apache.org>;
> > >> *Sent:* Friday, 14 Dec 2018, 16:39
> > >> *To:* "dev"<dev@kylin.apache.org>; "user"<user@kylin.apache.org>;
> > >> *Subject:* Evaluate Kylin on Parquet
> > >>
> > >> Hello Kylin users,
> > >>
> > >> The first version of the Kylin on Parquet [1] feature has been
> > >> staged in the Kylin code repository for public review and
> > >> evaluation. You can check out the "kylin-on-parquet" branch [2] to
> > >> read the code, and you can also make a binary build to run an
> > >> example. When creating a cube, you can select "Parquet" as the
> > >> storage on the "Advanced Setting" page. Both the MapReduce and
> > >> Spark engines support this new storage. A tech blog on the design
> > >> and implementation is being drafted.
> > >>
> > >> Thanks so much to the engineers' hard work: Chao Long and Yichen Zhou!
> > >>
> > >> This is not the final version; there is room to improve in many
> > >> aspects: Parquet, Spark, and Kylin. It can be used for PoC at this
> > >> moment. Your comments are welcome. Let's improve it together.
> > >>
> > >> [1] https://issues.apache.org/jira/browse/KYLIN-3621
> > >> [2] https://github.com/apache/kylin/tree/kylin-on-parquet
> > >>
> > >> Best regards,
> > >>
> > >> Shaofeng Shi 史少锋
> > >> Apache Kylin PMC
> > >> Work email: shaofeng.shi@kyligence.io
> > >> Kyligence Inc: https://kyligence.io/
> > >>
> > >> Apache Kylin FAQ:
> https://kylin.apache.org/docs/gettingstarted/faq.html
> > >> Join Kylin user mail group: user-subscribe@kylin.apache.org
> > >> Join Kylin dev mail group: dev-subscribe@kylin.apache.org
> > >>
> > >>
> > >>
> > >
> > >
> > >
> >
> >
> > --
> >
> >
> > Regards!
> >
> > Aron Tao
> >
>
