kylin-user mailing list archives

From Billy Liu <billy...@apache.org>
Subject Re: How to use MR to build UHC dimensions
Date Wed, 04 Apr 2018 05:40:17 GMT
From the metadata, we found that the global dictionary was not created
successfully. The expected metadata should look like this:

"dictionaries": [
  {
    "column": "ORDER_ID",
    "builder": "org.apache.kylin.dict.GlobalDictionaryBuilder"
  }
]
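
To check a saved cube descriptor for this, something like the following
rough sketch could help. It is only an illustration: the helper names
(global_dict_columns, shard_by_columns) are made up for this example and are
not part of Kylin, and the two sample descriptors are trimmed by hand from
the JSON in this thread. It assumes the descriptor has already been obtained
as JSON text.

```python
import json

# Builder class name quoted in this thread for global dictionaries.
GLOBAL_DICT_BUILDER = "org.apache.kylin.dict.GlobalDictionaryBuilder"


def global_dict_columns(cube_desc):
    """Hypothetical helper: columns whose dictionary entry uses the global builder."""
    return [
        entry.get("column")
        for entry in cube_desc.get("dictionaries", [])
        if entry.get("builder") == GLOBAL_DICT_BUILDER
    ]


def shard_by_columns(cube_desc):
    """Hypothetical helper: rowkey columns flagged as ShardBy.

    The log in this thread stores isShardBy as the string "false",
    so both string and boolean values are accepted here.
    """
    columns = cube_desc.get("rowkey", {}).get("rowkey_columns", [])
    return [
        col.get("column")
        for col in columns
        if str(col.get("isShardBy")).lower() == "true"
    ]


# Trimmed copy of the descriptor that was actually saved (see the log below).
saved = json.loads("""
{
  "name": "GLD_MR_TEST",
  "dictionaries": [],
  "rowkey": {"rowkey_columns": [
    {"column": "OD.CALENDAR_DATE", "encoding": "dict", "isShardBy": "false"},
    {"column": "OD.YEAR_MONTH", "encoding": "dict", "isShardBy": "false"}
  ]}
}
""")

# The same descriptor with the expected "dictionaries" entry added.
expected = dict(saved, dictionaries=[
    {"column": "ORDER_ID", "builder": GLOBAL_DICT_BUILDER}
])

print("saved global dict columns:   ", global_dict_columns(saved))
print("expected global dict columns:", global_dict_columns(expected))
print("saved ShardBy columns:       ", shard_by_columns(saved))
```

An empty result for both checks on the saved descriptor would match what we
see here: neither a global dictionary nor a ShardBy column is defined.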

If you can reproduce this issue, please file a JIRA. It looks like a bug
in the frontend.


With Warm regards

Billy Liu


2018-04-02 11:13 GMT+08:00 Fei Yi <yijianhui123@gmail.com>:
> Hi Billy,
> I chose a dimension with 60,000,000 values; the measure is
> count_distinct(order_id).
> When I add the column "order_id" as a global dictionary, the web UI reports
> success, but the global dictionary column is not displayed in the web UI,
> and there are no errors in the log file.
>
> Thanks for your help
>
> This is the log:
>
> 2018-04-02 10:43:22,354 DEBUG [http-bio-7070-exec-6]
> controller.CubeController:1010 : Saving cube {
>   "name": "GLD_MR_TEST",
>   "model_name": "M_ORDER",
>   "description": "",
>   "dimensions": [
>     {
>       "name": "CALENDAR_DATE",
>       "table": "OD",
>       "column": "CALENDAR_DATE",
>       "normal": "true"
>     },
>     {
>       "name": "YEAR_MONTH",
>       "table": "OD",
>       "column": "YEAR_MONTH",
>       "normal": "true"
>     }
>   ],
>   "measures": [
>     {
>       "name": "_COUNT_",
>       "function": {
>         "expression": "COUNT",
>         "returntype": "bigint",
>         "parameter": {
>           "type": "constant",
>           "value": "1"
>         },
>         "configuration": {}
>       }
>     },
>     {
>       "name": "CD",
>       "function": {
>         "expression": "COUNT_DISTINCT",
>         "returntype": "bitmap",
>         "parameter": {
>           "type": "column",
>           "value": "FACT_ORDER_DETAIL.ORDER_ID"
>         }
>       },
>       "showDim": false
>     }
>   ],
>   "dictionaries": [],
>   "rowkey": {
>     "rowkey_columns": [
>       {
>         "column": "OD.CALENDAR_DATE",
>         "encoding": "dict",
>         "isShardBy": "false",
>         "encoding_version": 1
>       },
>       {
>         "column": "OD.YEAR_MONTH",
>         "encoding": "dict",
>         "isShardBy": "false",
>         "encoding_version": 1
>       }
>     ]
>   },
>   "aggregation_groups": [
>     {
>       "includes": [
>         "OD.CALENDAR_DATE",
>         "OD.YEAR_MONTH"
>       ],
>       "select_rule": {
>         "hierarchy_dims": [],
>         "mandatory_dims": [
>           "OD.CALENDAR_DATE",
>           "OD.YEAR_MONTH"
>         ],
>         "joint_dims": []
>       }
>     }
>   ],
>   "mandatory_dimension_set_list": [],
>   "partition_date_start": 1514764800000,
>   "notify_list": [],
>   "hbase_mapping": {
>     "column_family": [
>       {
>         "name": "F1",
>         "columns": [
>           {
>             "qualifier": "M",
>             "measure_refs": [
>               "_COUNT_"
>             ]
>           }
>         ]
>       },
>       {
>         "name": "F2",
>         "columns": [
>           {
>             "qualifier": "M",
>             "measure_refs": [
>               "CD"
>             ]
>           }
>         ]
>       }
>     ]
>   },
>   "volatile_range": "0",
>   "retention_range": "0",
>   "status_need_notify": [
>     "ERROR",
>     "DISCARDED",
>     "SUCCEED"
>   ],
>   "auto_merge_time_ranges": [],
>   "engine_type": 2,
>   "storage_type": "2",
>   "override_kylin_properties": {}
> }
> 2018-04-02 10:43:22,356 DEBUG [http-bio-7070-exec-6]
> cachesync.CachedCrudAssist:190 : Saving CubeDesc at
> /cube_desc/GLD_MR_TEST.json
> 2018-04-02 10:43:22,359 DEBUG [pool-6-thread-1] cachesync.Broadcaster:113 :
> Servers in the cluster: [localhost:7070]
> 2018-04-02 10:43:22,359 DEBUG [pool-6-thread-1] cachesync.Broadcaster:123 :
> Announcing new broadcast to all: BroadcastEvent{entity=cube_desc,
> event=create, cacheKey=GLD_MR_TEST}
> 2018-04-02 10:43:22,361 DEBUG [http-bio-7070-exec-4]
> cachesync.Broadcaster:247 : Broadcasting CREATE, cube_desc, GLD_MR_TEST
> 2018-04-02 10:43:22,361 INFO  [http-bio-7070-exec-6] service.CubeService:211
> : New cube GLD_MR_TEST has 1 cuboids
> 2018-04-02 10:43:22,362 INFO  [http-bio-7070-exec-6] cube.CubeManager:219 :
> Creating cube 'dw_zyb-->GLD_MR_TEST' from desc 'GLD_MR_TEST'
> 2018-04-02 10:43:22,362 INFO  [http-bio-7070-exec-6] cube.CubeManager:297 :
> Updating cube instance 'GLD_MR_TEST'
> 2018-04-02 10:43:22,362 DEBUG [http-bio-7070-exec-6]
> cachesync.CachedCrudAssist:190 : Saving CubeInstance at
> /cube/GLD_MR_TEST.json
> 2018-04-02 10:43:22,362 DEBUG [http-bio-7070-exec-4]
> cachesync.Broadcaster:247 : Broadcasting UPDATE, project_schema, dw_zyb
> 2018-04-02 10:43:22,364 DEBUG [pool-6-thread-1] cachesync.Broadcaster:113 :
> Servers in the cluster: [localhost:7070]
> 2018-04-02 10:43:22,364 DEBUG [pool-6-thread-1] cachesync.Broadcaster:123 :
> Announcing new broadcast to all: BroadcastEvent{entity=cube, event=create,
> cacheKey=GLD_MR_TEST}
> 2018-04-02 10:43:22,365 DEBUG [http-bio-7070-exec-6]
> cachesync.CachedCrudAssist:190 : Saving ProjectInstance at
> /project/dw_zyb.json
> 2018-04-02 10:43:22,367 DEBUG [pool-6-thread-1] cachesync.Broadcaster:113 :
> Servers in the cluster: [localhost:7070]
> 2018-04-02 10:43:22,367 DEBUG [pool-6-thread-1] cachesync.Broadcaster:123 :
> Announcing new broadcast to all: BroadcastEvent{entity=project,
> event=update, cacheKey=dw_zyb}
> 2018-04-02 10:43:22,376 DEBUG [http-bio-7070-exec-4]
> project.ProjectL2Cache:195 : Loading L2 project cache for dw_zyb
> 2018-04-02 10:43:22,376 WARN  [http-bio-7070-exec-4]
> realization.RealizationRegistry:91 : No provider for realization type
> INVERTED_INDEX
> 2018-04-02 10:43:22,378 INFO  [http-bio-7070-exec-4]
> service.CacheService:120 : cleaning cache for project dw_zyb (currently
> remove all entries)
> 2018-04-02 10:43:22,378 DEBUG [http-bio-7070-exec-4]
> cachesync.Broadcaster:281 : Done broadcasting UPDATE, project_schema, dw_zyb
> 2018-04-02 10:43:22,378 DEBUG [http-bio-7070-exec-4]
> cachesync.Broadcaster:281 : Done broadcasting CREATE, cube_desc, GLD_MR_TEST
> 2018-04-02 10:43:22,381 DEBUG [http-bio-7070-exec-1]
> cachesync.Broadcaster:247 : Broadcasting CREATE, cube, GLD_MR_TEST
> 2018-04-02 10:43:22,383 DEBUG [http-bio-7070-exec-1]
> cachesync.Broadcaster:247 : Broadcasting UPDATE, project_data, dw_zyb
> 2018-04-02 10:43:22,383 INFO  [http-bio-7070-exec-1]
> service.CacheService:120 : cleaning cache for project dw_zyb (currently
> remove all entries)
> 2018-04-02 10:43:22,383 DEBUG [http-bio-7070-exec-1]
> cachesync.Broadcaster:281 : Done broadcasting UPDATE, project_data, dw_zyb
> 2018-04-02 10:43:22,383 DEBUG [http-bio-7070-exec-1]
> cachesync.Broadcaster:281 : Done broadcasting CREATE, cube, GLD_MR_TEST
> 2018-04-02 10:43:22,386 DEBUG [http-bio-7070-exec-1]
> cachesync.Broadcaster:247 : Broadcasting UPDATE, project, dw_zyb
> 2018-04-02 10:43:22,387 DEBUG [http-bio-7070-exec-1]
> project.ProjectL2Cache:195 : Loading L2 project cache for dw_zyb
> 2018-04-02 10:43:22,387 WARN  [http-bio-7070-exec-1]
> realization.RealizationRegistry:91 : No provider for realization type
> INVERTED_INDEX
> 2018-04-02 10:43:22,387 DEBUG [http-bio-7070-exec-1]
> cachesync.Broadcaster:247 : Broadcasting UPDATE, project_schema, dw_zyb
> 2018-04-02 10:43:22,402 DEBUG [http-bio-7070-exec-1]
> project.ProjectL2Cache:195 : Loading L2 project cache for dw_zyb
> 2018-04-02 10:43:22,402 WARN  [http-bio-7070-exec-1]
> realization.RealizationRegistry:91 : No provider for realization type
> INVERTED_INDEX
> 2018-04-02 10:43:22,404 INFO  [http-bio-7070-exec-1]
> service.CacheService:120 : cleaning cache for project dw_zyb (currently
> remove all entries)
> 2018-04-02 10:43:22,404 DEBUG [http-bio-7070-exec-1]
> cachesync.Broadcaster:281 : Done broadcasting UPDATE, project_schema, dw_zyb
> 2018-04-02 10:43:22,405 DEBUG [http-bio-7070-exec-1]
> cachesync.Broadcaster:247 : Broadcasting UPDATE, project_data, dw_zyb
> 2018-04-02 10:43:22,405 INFO  [http-bio-7070-exec-1]
> service.CacheService:120 : cleaning cache for project dw_zyb (currently
> remove all entries)
> 2018-04-02 10:43:22,405 DEBUG [http-bio-7070-exec-1]
> cachesync.Broadcaster:281 : Done broadcasting UPDATE, project_data, dw_zyb
> 2018-04-02 10:43:22,405 DEBUG [http-bio-7070-exec-1]
> cachesync.Broadcaster:281 : Done broadcasting UPDATE, project, dw_zyb
>
> 2018-04-01 23:23 GMT+08:00 Billy Liu <billyliu@apache.org>:
>>
>> Hi Fei Yi,
>>
>> This parameter only works for ultra high cardinality columns,
>> including the columns defined as "ShardBy" and "Global Dictionary".
>> Please check if your cube has these two definitions.
>>
>> With Warm regards
>>
>> Billy Liu
>>
>>
>> 2018-03-30 16:45 GMT+08:00 Fei Yi <yijianhui123@gmail.com>:
>> > I use Kylin version 2.3.1 and set:
>> > kylin.engine.mr.build-uhc-dict-in-additional-step=true
>> > kylin.snapshot.max-mb=3000
>> >
>> > But the job is still built in the Kylin server; I don't see a separate
>> > step to build UHC dimensions.
>> >
>> >
>
>
