kylin-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Li Yang <liy...@apache.org>
Subject Re: Re: How Kylin Cuboid Scheduler Work With Aggregation Groups ?
Date Fri, 07 Apr 2017 02:49:52 GMT
Shaofeng is right. Check out details --
https://issues.apache.org/jira/browse/KYLIN-1749

On Wed, Apr 5, 2017 at 9:45 AM, ShaoFeng Shi <shaofengshi@apache.org> wrote:

> Hi bingli,
>
> I didn't try a agg group with only 1 dimension; please check whether
> removing the three single dim group to see whether it can work. Anyway,
> this is a bug I think.
>
> Regarding "precicely define combination with agg group", yes it is doable
> with agg group; say if you only want to use the combination ABCD, you can
> make them into a group, and then mark all these 4 as "mandatory", then for
> this group, only 1 cuboid will be calculated (otherwise will be 16). While,
> in older Kylin versions, this isn't allowed, so you need configure "
> kylin.cube.aggrgroup.isMandatoryOnlyValid=true" in kylin.properties.
>
> 2017-04-05 9:24 GMT+08:00 bingli3@iflytek.com <bingli3@iflytek.com>:
>
>>  你好,李杨:
>>     为什么kylin 最终解析的cuboids 与 我通过页面设计的不一致。这是不是
aggregation groups的一个BUG?
>>
>>     不一致,侧面验证是执行如下查询语句报错误,错误详见附件:
>>         select ts_hour, sum(request)
>>         from view_flow_insight
>>         group by ts_hour
>>
>>
>>     你给的文章我早先也拜读过,另外《Apache Kylin 权威指南》一书中指出:“聚合组的设计非常灵活,甚至可以用来描
>> 述一些极端的设计。假设我们的业务需求非常单一,只需要
>> 某些特定的Cuboid,那么可以创建多个聚合组,每个聚合组代表一个Cuboid,....................
>> ..........”。根据以上资料,我设计了符合我业务需求的 Cube(由于展示层使用superset,无法
>> 使用多表,所以只能使用视图转成一张表),最终存在一些 cuboid无法查询。
>>
>> ------------------------------
>> bingli3@iflytek.com
>>
>>
>> *From:* Li Yang <liyang@apache.org>
>> *Date:* 2017-04-04 17:26
>> *To:* user <user@kylin.apache.org>
>> *CC:* ShaoFeng?Shi <shaofengshi@apache.org>
>> *Subject:* Re: Re: How Kylin Cuboid Scheduler Work With Aggregation
>> Groups ?
>> Google "Kylin aggregation group" and the first result is:
>> http://kylin.apache.org/blog/2016/02/18/new-aggregation-group/
>>
>> On Mon, Apr 3, 2017 at 12:03 PM, bingli3@iflytek.com <bingli3@iflytek.com
>> > wrote:
>>
>>> 你好,少峰:
>>>     kylin cube中,无论是使用 aggregation group还是其他cube优化策略,最终得到的都是一系列组合(如:<day_time,
>>> gender>),而这些组合实际上是与 cuboid 唯一对应的。
>>>     在使用sql查询的时候,如果没有对应的 cuobid,那么查询是失败的(排除
extend、derived的维度组合)。
>>>
>>>     下图是,apache kylin官网对 aggregation group的解析。按同样的规则,在上封邮件中定义的Cube,
>>> 应该只会产生10种维度组合,即
>>>        <day_time, gender> 576
>>>        <day_time, age> 544
>>>        <day_time, brand> 528
>>>        <day_time, model> 520
>>>        <day_time, resolution> 516
>>>        <day_time, os_version> 514
>>>        <day_time, ntt>        513
>>>        <ts_minute>     256
>>>        <ts_hour>     128
>>>        <day_time>   512
>>>      对应的 cuboid 为后面的数字。从cube_statistics任务中看,最后只有
1023,516,576,513,528,
>>> 514,544,520 这些组合(查询Hbase Meta表也是这种情况)。
>>>
>>>      在 《Apache Kylin 权威指南》一书中,有介绍在一些极端情况下(如:precisely
define the
>>> cuboids/combinations) aggregation group 的使用方法。
>>>      所以,我以为目前 kylin 是支持这种定义方法的。
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> ------------------------------
>>> bingli3@iflyek.com
>>>
>>>
>>> *From:* ShaoFeng Shi <shaofengshi@apache.org>
>>> *Date:* 2017-04-02 22:17
>>> *To:* user <user@kylin.apache.org>
>>> *Subject:* Re: How Kylin Cuboid Scheduler Work With Aggregation Groups ?
>>> Hi Bing,
>>>
>>> An aggregation group is a dimension group, or say a sub-cube; it is NOT
>>> a cuboid.
>>>
>>> I guess you want to precisely define the cuboids/combinations, that
>>> isn't supported as in many cases user couldn't list all the combinations
>>> they use. But you can describe them with the agg group / mandatory / joint
>>> as close as possible.
>>>
>>> 2017-03-31 15:49 GMT+08:00 bingli3@iflytek.com <bingli3@iflytek.com>:
>>>
>>>>   Hi,all
>>>>       I have a Cube, the desc is :
>>>>
>>>> {
>>>>   "uuid": "bcf11be2-83e4-497e-9e35-a402460a6446",
>>>>   "last_modified": 1490860973892,
>>>>   "version": "1.6.0",
>>>>   "name": "adx_flow_insight",
>>>>   "model_name": "adx_operator",
>>>>   "description": "",
>>>>   "null_string": null,
>>>>   "dimensions": [
>>>>     {
>>>>       "name": "GENDER",
>>>>       "table": "FLOW_INSIGHT.VIEW_FLOW_INSIGHT",
>>>>       "column": "GENDER",
>>>>       "derived": null
>>>>     },
>>>>     {
>>>>       "name": "AGE",
>>>>       "table": "FLOW_INSIGHT.VIEW_FLOW_INSIGHT",
>>>>       "column": "AGE",
>>>>       "derived": null
>>>>     },
>>>>     {
>>>>       "name": "BRAND",
>>>>       "table": "FLOW_INSIGHT.VIEW_FLOW_INSIGHT",
>>>>       "column": "BRAND",
>>>>       "derived": null
>>>>     },
>>>>     {
>>>>       "name": "MODEL",
>>>>       "table": "FLOW_INSIGHT.VIEW_FLOW_INSIGHT",
>>>>       "column": "MODEL",
>>>>       "derived": null
>>>>     },
>>>>     {
>>>>       "name": "RESOLUTION",
>>>>       "table": "FLOW_INSIGHT.VIEW_FLOW_INSIGHT",
>>>>       "column": "RESOLUTION",
>>>>       "derived": null
>>>>     },
>>>>     {
>>>>       "name": "OS_VERSION",
>>>>       "table": "FLOW_INSIGHT.VIEW_FLOW_INSIGHT",
>>>>       "column": "OS_VERSION",
>>>>       "derived": null
>>>>     },
>>>>     {
>>>>       "name": "NTT",
>>>>       "table": "FLOW_INSIGHT.VIEW_FLOW_INSIGHT",
>>>>       "column": "NTT",
>>>>       "derived": null
>>>>     },
>>>>     {
>>>>       "name": "TS_MINUTE",
>>>>       "table": "FLOW_INSIGHT.VIEW_FLOW_INSIGHT",
>>>>       "column": "TS_MINUTE",
>>>>       "derived": null
>>>>     },
>>>>     {
>>>>       "name": "TS_HOUR",
>>>>       "table": "FLOW_INSIGHT.VIEW_FLOW_INSIGHT",
>>>>       "column": "TS_HOUR",
>>>>       "derived": null
>>>>     },
>>>>     {
>>>>       "name": "DAY_TIME",
>>>>       "table": "FLOW_INSIGHT.VIEW_FLOW_INSIGHT",
>>>>       "column": "DAY_TIME",
>>>>       "derived": null
>>>>     }
>>>>   ],
>>>>   "measures": [
>>>>     {
>>>>       "name": "_COUNT_",
>>>>       "function": {
>>>>         "expression": "COUNT",
>>>>         "parameter": {
>>>>           "type": "constant",
>>>>           "value": "1",
>>>>           "next_parameter": null
>>>>         },
>>>>         "returntype": "bigint"
>>>>       },
>>>>       "dependent_measure_ref": null
>>>>     },
>>>>     {
>>>>       "name": "REQUEST_PV",
>>>>       "function": {
>>>>         "expression": "SUM",
>>>>         "parameter": {
>>>>           "type": "column",
>>>>           "value": "REQUEST",
>>>>           "next_parameter": null
>>>>         },
>>>>         "returntype": "bigint"
>>>>       },
>>>>       "dependent_measure_ref": null
>>>>     },
>>>>     {
>>>>       "name": "IMPRESS_PV",
>>>>       "function": {
>>>>         "expression": "SUM",
>>>>         "parameter": {
>>>>           "type": "column",
>>>>           "value": "IMPRESS",
>>>>           "next_parameter": null
>>>>         },
>>>>         "returntype": "bigint"
>>>>       },
>>>>       "dependent_measure_ref": null
>>>>     },
>>>>     {
>>>>       "name": "CLICK_PV",
>>>>       "function": {
>>>>         "expression": "SUM",
>>>>         "parameter": {
>>>>           "type": "column",
>>>>           "value": "CLICK",
>>>>           "next_parameter": null
>>>>         },
>>>>         "returntype": "bigint"
>>>>       },
>>>>       "dependent_measure_ref": null
>>>>     },
>>>>     {
>>>>       "name": "FILL_PV",
>>>>       "function": {
>>>>         "expression": "SUM",
>>>>         "parameter": {
>>>>           "type": "column",
>>>>           "value": "FILL",
>>>>           "next_parameter": null
>>>>         },
>>>>         "returntype": "bigint"
>>>>       },
>>>>       "dependent_measure_ref": null
>>>>     },
>>>>     {
>>>>       "name": "UV_DID",
>>>>       "function": {
>>>>         "expression": "COUNT_DISTINCT",
>>>>         "parameter": {
>>>>           "type": "column",
>>>>           "value": "DID",
>>>>           "next_parameter": null
>>>>         },
>>>>         "returntype": "hllc(10)"
>>>>       },
>>>>       "dependent_measure_ref": null
>>>>     }
>>>>   ],
>>>>   "dictionaries": [],
>>>>   "rowkey": {
>>>>     "rowkey_columns": [
>>>>       {
>>>>         "column": "DAY_TIME",
>>>>         "encoding": "date",
>>>>         "isShardBy": false
>>>>       },
>>>>       {
>>>>         "column": "TS_MINUTE",
>>>>         "encoding": "integer:4",
>>>>         "isShardBy": false
>>>>       },
>>>>       {
>>>>         "column": "TS_HOUR",
>>>>         "encoding": "integer:4",
>>>>         "isShardBy": false
>>>>       },
>>>>       {
>>>>         "column": "GENDER",
>>>>         "encoding": "dict",
>>>>         "isShardBy": false
>>>>       },
>>>>       {
>>>>         "column": "AGE",
>>>>         "encoding": "dict",
>>>>         "isShardBy": false
>>>>       },
>>>>       {
>>>>         "column": "BRAND",
>>>>         "encoding": "dict",
>>>>         "isShardBy": false
>>>>       },
>>>>       {
>>>>         "column": "MODEL",
>>>>         "encoding": "dict",
>>>>         "isShardBy": false
>>>>       },
>>>>       {
>>>>         "column": "RESOLUTION",
>>>>         "encoding": "dict",
>>>>         "isShardBy": false
>>>>       },
>>>>       {
>>>>         "column": "OS_VERSION",
>>>>         "encoding": "dict",
>>>>         "isShardBy": false
>>>>       },
>>>>       {
>>>>         "column": "NTT",
>>>>         "encoding": "dict",
>>>>         "isShardBy": false
>>>>       }
>>>>     ]
>>>>   },
>>>>   "hbase_mapping": {
>>>>     "column_family": [
>>>>       {
>>>>         "name": "F1",
>>>>         "columns": [
>>>>           {
>>>>             "qualifier": "M",
>>>>             "measure_refs": [
>>>>               "_COUNT_",
>>>>               "REQUEST_PV",
>>>>               "IMPRESS_PV",
>>>>               "CLICK_PV",
>>>>               "FILL_PV"
>>>>             ]
>>>>           }
>>>>         ]
>>>>       },
>>>>       {
>>>>         "name": "F2",
>>>>         "columns": [
>>>>           {
>>>>             "qualifier": "M",
>>>>             "measure_refs": [
>>>>               "UV_DID"
>>>>             ]
>>>>           }
>>>>         ]
>>>>       }
>>>>     ]
>>>>   },
>>>>   "aggregation_groups": [
>>>>     {
>>>>       "includes": [
>>>>         "DAY_TIME",
>>>>         "GENDER"
>>>>       ],
>>>>       "select_rule": {
>>>>         "hierarchy_dims": [],
>>>>         "mandatory_dims": [
>>>>           "DAY_TIME"
>>>>         ],
>>>>         "joint_dims": []
>>>>       }
>>>>     },
>>>>     {
>>>>       "includes": [
>>>>         "DAY_TIME",
>>>>         "AGE"
>>>>       ],
>>>>       "select_rule": {
>>>>         "hierarchy_dims": [],
>>>>         "mandatory_dims": [
>>>>           "DAY_TIME"
>>>>         ],
>>>>         "joint_dims": []
>>>>       }
>>>>     },
>>>>     {
>>>>       "includes": [
>>>>         "DAY_TIME",
>>>>         "BRAND"
>>>>       ],
>>>>       "select_rule": {
>>>>         "hierarchy_dims": [],
>>>>         "mandatory_dims": [
>>>>           "DAY_TIME"
>>>>         ],
>>>>         "joint_dims": []
>>>>       }
>>>>     },
>>>>     {
>>>>       "includes": [
>>>>         "DAY_TIME",
>>>>         "MODEL"
>>>>       ],
>>>>       "select_rule": {
>>>>         "hierarchy_dims": [],
>>>>         "mandatory_dims": [
>>>>           "DAY_TIME"
>>>>         ],
>>>>         "joint_dims": []
>>>>       }
>>>>     },
>>>>     {
>>>>       "includes": [
>>>>         "DAY_TIME",
>>>>         "RESOLUTION"
>>>>       ],
>>>>       "select_rule": {
>>>>         "hierarchy_dims": [],
>>>>         "mandatory_dims": [
>>>>           "RESOLUTION"
>>>>         ],
>>>>         "joint_dims": []
>>>>       }
>>>>     },
>>>>     {
>>>>       "includes": [
>>>>         "DAY_TIME",
>>>>         "OS_VERSION"
>>>>       ],
>>>>       "select_rule": {
>>>>         "hierarchy_dims": [],
>>>>         "mandatory_dims": [
>>>>           "DAY_TIME"
>>>>         ],
>>>>         "joint_dims": []
>>>>       }
>>>>     },
>>>>     {
>>>>       "includes": [
>>>>         "DAY_TIME",
>>>>         "NTT"
>>>>       ],
>>>>       "select_rule": {
>>>>         "hierarchy_dims": [],
>>>>         "mandatory_dims": [
>>>>           "DAY_TIME"
>>>>         ],
>>>>         "joint_dims": []
>>>>       }
>>>>     },
>>>>     {
>>>>       "includes": [
>>>>         "TS_MINUTE"
>>>>       ],
>>>>       "select_rule": {
>>>>         "hierarchy_dims": [],
>>>>         "mandatory_dims": [
>>>>           "TS_MINUTE"
>>>>         ],
>>>>         "joint_dims": []
>>>>       }
>>>>     },
>>>>     {
>>>>       "includes": [
>>>>         "TS_HOUR"
>>>>       ],
>>>>       "select_rule": {
>>>>         "hierarchy_dims": [],
>>>>         "mandatory_dims": [
>>>>           "TS_HOUR"
>>>>         ],
>>>>         "joint_dims": []
>>>>       }
>>>>     }
>>>>   ],
>>>>   "signature": "DSSmByHn2sATiETlBdjANQ==",
>>>>   "notify_list": [],
>>>>   "status_need_notify": [
>>>>     "ERROR",
>>>>     "DISCARDED",
>>>>     "SUCCEED"
>>>>   ],
>>>>   "partition_date_start": 1488326400000,
>>>>   "partition_date_end": 3153600000000,
>>>>   "auto_merge_time_ranges": [
>>>>     604800000,
>>>>     2419200000
>>>>   ],
>>>>   "retention_range": 0,
>>>>   "engine_type": 2,
>>>>   "storage_type": 2,
>>>>   "override_kylin_properties": {
>>>>     "kylin.job.mr.config.override.mapreduce.job.queuename": "ad"
>>>>   }
>>>>
>>>> }
>>>>
>>>>    There have 10 dims, and use aggregation groups. I want Cube only
>>>> contains 10 combs:
>>>>        <day_time, gender> 576
>>>>        <day_time, age> 544
>>>>        <day_time, brand> 528
>>>>        <day_time, model> 520
>>>>        <day_time, resolution> 516
>>>>        <day_time, os_version> 514
>>>>        <day_time, ntt>        513
>>>>        <ts_minute>     256
>>>>        <ts_hour>     128
>>>>        <day_time>   512
>>>>
>>>>      But the Cuboid Scheduler parse as follower:
>>>>
>>>> 2017-03-31 15:32:47,735 (main) [INFO - org..apache.kylin.cub
>>>> e.CubeManager.loadAllCubeInstance(CubeManager.java:908)] Loa
>>>> ded 4 cubes, fail on 0 cubes
>>>> 1023
>>>> 516
>>>> 576
>>>> 513
>>>> 528
>>>> 514
>>>> 544
>>>> 520
>>>> 2017-03-31 15:32:47,742 (Thread-0) [INFO - org.apache.hadoop
>>>> .hbase.client.ConnectionManager$HConnectionImplementation.cl
>>>> oseMasterService(ConnectionManager.java:2259)] Closing maste
>>>> r protocol: MasterService
>>>>
>>>> Question:
>>>>      How Aggregation Groups  Work? I can not set single dim in
>>>> aggregation?
>>>>
>>>> Thanks for you suggestion~~~
>>>>
>>>> ------------------------------
>>>> bingli3@iflytek..com <bingli3@iflytek.com>
>>>>
>>>
>>>
>>>
>>> --
>>> Best regards,
>>>
>>> Shaofeng Shi 史少锋
>>>
>>>
>>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>

Mime
View raw message