kylin-user mailing list archives

From Tony Lee <btony...@gmail.com>
Subject Re: Error while building cube from stream
Date Mon, 26 Sep 2016 10:01:18 GMT
Thanks for your reply.

I have created an issue here:
https://issues.apache.org/jira/browse/KYLIN-2053


On Mon, Sep 26, 2016 at 4:59 PM, ShaoFeng Shi <shaofengshi@apache.org>
wrote:

> Hi Tony,
>
> You're correct: the global dictionary isn't supported in the streaming
> builder (this is the first report of it). Could you please open a JIRA?
> https://issues.apache.org/jira/secure/Dashboard.jspa
>
> BTW, we're developing a new version of the streaming engine, which will
> reuse most of the logic of the batch cubing engine and is planned to roll
> out in v1.6. I believe the new design will not have this issue.
>
> 2016-09-26 14:56 GMT+08:00 Tony Lee <btonylee@gmail.com>:
>
>> Thanks
>>
>> But this does not work on a streaming cube.
>>
>> I read some code and found that in class *StreamingCubeBuilder*, the
>> dictionary map is built by *DictionaryGenerator.buildDictionary()*
>> instead of *DictionaryManager.buildDictionary()*. Does this mean that
>> streaming cubes do not support the global dictionary?
>>
>> I added USERID to the dimensions, and then the cube built successfully.
>> But I think the result will be incorrect if I calculate count distinct
>> across different segments. Is that right?
>>
>>
>> Tony
>>
>> On Sat, Sep 24, 2016 at 10:29 PM, ShaoFeng Shi <shaofengshi@apache.org>
>> wrote:
>>
>>> Hi Tony,
>>>
>>> The error occurred when building a bitmap counter (for distinct
>>> count). From your cube descriptor, it seems no global dictionary is
>>> specified for the user id column. Please check this blog:
>>> https://kylin.apache.org/blog/2016/08/01/count-distinct-in-kylin/
>>>
>>> 2016-09-22 10:49 GMT+08:00 Tony Lee <btonylee@gmail.com>:
>>>
>>>> Thanks, ShaoFeng Shi. That is the reason.
>>>>
>>>> Unfortunately, I have a new problem with precise count distinct.
>>>>
>>>> I added a streaming table on version 1.5.4 with my own JSON, which
>>>> looks like this:
>>>> {
>>>>     "logTimestamp":1474456891127,
>>>>     "datetime":"2016-09-21 19:21:31",
>>>>     "uploadTime":"20160921192023",
>>>>     "userId":"f2d28cbf9e21340a49e97063486db1f5",
>>>>     "accountId":"84108490",
>>>>     "otherfield":"...."
>>>> }
>>>>
>>>> *The error message while building the cube is*
>>>>
>>>> 2016-09-22 10:01:40,731 ERROR [main StreamingCLI:103]: error start streaming
>>>> java.lang.RuntimeException: error build cube from StreamingBatch
>>>>         at org.apache.kylin.engine.streaming.cube.StreamingCubeBuilder.build(StreamingCubeBuilder.java:105)
>>>>         at org.apache.kylin.engine.streaming.OneOffStreamingBuilder$1.run(OneOffStreamingBuilder.java:79)
>>>>         at org.apache.kylin.engine.streaming.cli.StreamingCLI.startOneOffCubeStreaming(StreamingCLI.java:123)
>>>>         at org.apache.kylin.engine.streaming.cli.StreamingCLI.main(StreamingCLI.java:97)
>>>> Caused by: java.lang.NullPointerException
>>>>         at org.apache.kylin.measure.bitmap.BitmapMeasureType$1.valueOf(BitmapMeasureType.java:100)
>>>>         at org.apache.kylin.measure.bitmap.BitmapMeasureType$1.valueOf(BitmapMeasureType.java:89)
>>>>         at org.apache.kylin.cube.inmemcubing.InMemCubeBuilderInputConverter.buildValueOf(InMemCubeBuilderInputConverter.java:122)
>>>>         at org.apache.kylin.cube.inmemcubing.InMemCubeBuilderInputConverter.buildValue(InMemCubeBuilderInputConverter.java:94)
>>>>         at org.apache.kylin.cube.inmemcubing.InMemCubeBuilderInputConverter.convert(InMemCubeBuilderInputConverter.java:70)
>>>>         at org.apache.kylin.cube.inmemcubing.InMemCubeBuilder$InputConverter$1.next(InMemCubeBuilder.java:542)
>>>>         at org.apache.kylin.cube.inmemcubing.InMemCubeBuilder$InputConverter$1.next(InMemCubeBuilder.java:523)
>>>>         at org.apache.kylin.gridtable.GTAggregateScanner.iterator(GTAggregateScanner.java:139)
>>>>         at org.apache.kylin.cube.inmemcubing.InMemCubeBuilder.createBaseCuboid(InMemCubeBuilder.java:339)
>>>>         at org.apache.kylin.cube.inmemcubing.InMemCubeBuilder.build(InMemCubeBuilder.java:166)
>>>>         at org.apache.kylin.cube.inmemcubing.InMemCubeBuilder.build(InMemCubeBuilder.java:135)
>>>>         at org.apache.kylin.cube.inmemcubing.InMemCubeBuilder.build(InMemCubeBuilder.java:122)
>>>>         at org.apache.kylin.cube.inmemcubing.AbstractInMemCubeBuilder$1.run(AbstractInMemCubeBuilder.java:80)
>>>>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>         at java.lang.Thread.run(Thread.java:745)
>>>>
>>>>
>>>> *and the cube json is*
>>>> {
>>>>   "uuid": "db91bcea-b33f-48af-a2f5-6014b14031f4",
>>>>   "last_modified": 1474511879506,
>>>>   "version": "1.5.4",
>>>>   "name": "hot_play_c",
>>>>   "model_name": "hot_play_cube",
>>>>   "description": "",
>>>>   "null_string": null,
>>>>   "dimensions": [
>>>>     {
>>>>       "name": "DEFAULT.HOT_PLAY.HOUR_START",
>>>>       "table": "DEFAULT.HOT_PLAY",
>>>>       "column": "HOUR_START",
>>>>       "derived": null
>>>>     },
>>>>     {
>>>>       "name": "DEFAULT.HOT_PLAY.MINUTE_START",
>>>>       "table": "DEFAULT.HOT_PLAY",
>>>>       "column": "MINUTE_START",
>>>>       "derived": null
>>>>     }
>>>>   ],
>>>>   "measures": [
>>>>     {
>>>>       "name": "_COUNT_",
>>>>       "function": {
>>>>         "expression": "COUNT",
>>>>         "parameter": {
>>>>           "type": "constant",
>>>>           "value": "1",
>>>>           "next_parameter": null
>>>>         },
>>>>         "returntype": "bigint"
>>>>       },
>>>>       "dependent_measure_ref": null
>>>>     },
>>>>     {
>>>>       "name": "COUNT_DISTINCT_USER",
>>>>       "function": {
>>>>         "expression": "COUNT_DISTINCT",
>>>>         "parameter": {
>>>>           "type": "column",
>>>>           "value": "USERID",
>>>>           "next_parameter": null
>>>>         },
>>>>         "returntype": "bitmap"
>>>>       },
>>>>       "dependent_measure_ref": null
>>>>     }
>>>>   ],
>>>>   "dictionaries": [],
>>>>   "rowkey": {
>>>>     "rowkey_columns": [
>>>>       {
>>>>         "column": "HOUR_START",
>>>>         "encoding": "time",
>>>>         "isShardBy": false
>>>>       },
>>>>       {
>>>>         "column": "MINUTE_START",
>>>>         "encoding": "time",
>>>>         "isShardBy": false
>>>>       }
>>>>     ]
>>>>   },
>>>>   "hbase_mapping": {
>>>>     "column_family": [
>>>>       {
>>>>         "name": "F1",
>>>>         "columns": [
>>>>           {
>>>>             "qualifier": "M",
>>>>             "measure_refs": [
>>>>               "_COUNT_"
>>>>             ]
>>>>           }
>>>>         ]
>>>>       },
>>>>       {
>>>>         "name": "F2",
>>>>         "columns": [
>>>>           {
>>>>             "qualifier": "M",
>>>>             "measure_refs": [
>>>>               "COUNT_DISTINCT_USER"
>>>>             ]
>>>>           }
>>>>         ]
>>>>       }
>>>>     ]
>>>>   },
>>>>   "aggregation_groups": [
>>>>     {
>>>>       "includes": [
>>>>         "HOUR_START",
>>>>         "MINUTE_START"
>>>>       ],
>>>>       "select_rule": {
>>>>         "hierarchy_dims": [],
>>>>         "mandatory_dims": [],
>>>>         "joint_dims": []
>>>>       }
>>>>     }
>>>>   ],
>>>>   "signature": "QXddyWCVVCYQcozxd4Zh2w==",
>>>>   "notify_list": [],
>>>>   "status_need_notify": [
>>>>     "ERROR",
>>>>     "DISCARDED",
>>>>     "SUCCEED"
>>>>   ],
>>>>   "partition_date_start": 0,
>>>>   "partition_date_end": 3153600000000,
>>>>   "auto_merge_time_ranges": [
>>>>     604800000,
>>>>     2419200000
>>>>   ],
>>>>   "retention_range": 0,
>>>>   "engine_type": 2,
>>>>   "storage_type": 2,
>>>>   "override_kylin_properties": {}
>>>> }
>>>>
>>>> *There is no error after I change the returntype to hllc(16).*
>>>>
>>>> *I have struggled with this for several days. Any hints?*
>>>>
>>>> On Wed, Sep 21, 2016 at 10:47 PM, ShaoFeng Shi <shaofengshi@apache.org>
>>>> wrote:
>>>>
>>>>> Hi Tony,
>>>>>
>>>>> It seems your cube isn't partitioned (no partition date column
>>>>> specified); please check or provide the cube JSON.
>>>>>
>>>>> 2016-09-21 0:30 GMT+08:00 Alberto Ramón <a.ramonportoles@gmail.com>:
>>>>>
>>>>>> I don't know, but can you check this change? KYLIN-1744
>>>>>> <https://issues.apache.org/jira/browse/KYLIN-1744> in V1.3
>>>>>>
>>>>>>
>>>>>> 2016-09-20 14:50 GMT+02:00 Tony Lee <btonylee@gmail.com>:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I was building a cube from a stream as the document
>>>>>>> (http://kylin.apache.org/docs15/tutorial/cube_streaming.html)
>>>>>>> describes.
>>>>>>>
>>>>>>> I was using 1.5.3 and encountered this error. I get the same error
>>>>>>> on 1.5.4. Everything is fine on 1.5.2.1.
>>>>>>>
>>>>>>> Any idea how to solve this?
>>>>>>>
>>>>>>>
>>>>>>> 2016-09-20 20:31:51,520 INFO  [main KafkaStreamingInput:129]: finish to get streaming batch, total message count:30
>>>>>>> 2016-09-20 20:31:51,532 DEBUG [main CubeManager:855]: Reloaded new cube: STREAMING_CUBE with reference beingCUBE[name=STREAMING_CUBE] having 1 segments:KYLIN_2822I1W3CX
>>>>>>> 2016-09-20 20:31:51,536 INFO  [main CubeManager:314]: Updating cube instance 'STREAMING_CUBE'
>>>>>>> 2016-09-20 20:31:51,538 WARN  [main StreamingCLI:127]: invalid args:streaming start STREAMING_CUBE 1474374540000_1474374600000 -start 1474374540000 -end 1474374600000 -cube STREAMING_CUBE
>>>>>>> 2016-09-20 20:31:51,539 ERROR [main StreamingCLI:103]: error start streaming
>>>>>>> java.lang.IllegalStateException: Segments overlap: STREAMING_CUBE[FULL_BUILD] and STREAMING_CUBE[FULL_BUILD]
>>>>>>> at org.apache.kylin.cube.CubeValidator.validate(CubeValidator.java:85)
>>>>>>> at org.apache.kylin.cube.CubeManager.updateCubeWithRetry(CubeManager.java:358)
>>>>>>> at org.apache.kylin.cube.CubeManager.updateCube(CubeManager.java:301)
>>>>>>> at org.apache.kylin.cube.CubeManager.appendSegment(CubeManager.java:441)
>>>>>>> at org.apache.kylin.engine.streaming.cube.StreamingCubeBuilder.createBuildable(StreamingCubeBuilder.java:118)
>>>>>>> at org.apache.kylin.engine.streaming.OneOffStreamingBuilder$1.run(OneOffStreamingBuilder.java:76)
>>>>>>> at org.apache.kylin.engine.streaming.cli.StreamingCLI.startOneOffCubeStreaming(StreamingCLI.java:123)
>>>>>>> at org.apache.kylin.engine.streaming.cli.StreamingCLI.main(StreamingCLI.java:97)
>>>>>>> 2016-09-20 20:31:51,543 INFO  [Thread-0 ConnectionManager$HConnectionImplementation:1678]: Closing zookeeper sessionid=0x35708fbc2740013
>>>>>>> 2016-09-20 20:31:51,549 INFO  [Thread-0 ZooKeeper:684]: Session: 0x35708fbc2740013 closed
>>>>>>> 2016-09-20 20:31:51,549 INFO  [main-EventThread ClientCnxn:512]: EventThread shut down
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best regards,
>>>>>
>>>>> Shaofeng Shi 史少锋
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Best regards,
>>>
>>> Shaofeng Shi 史少锋
>>>
>>>
>>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>
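
Note on the cross-segment concern raised in this thread: the worry that a
precise (bitmap) count distinct can come out wrong when each segment encodes
user ids with its own dictionary can be illustrated with a small toy model.
This is only a sketch under stated assumptions, not Kylin code: the class
SegmentDictionaryDemo and its methods are hypothetical, java.util.BitSet
stands in for the bitmap counter, and a HashMap stands in for a dictionary
that maps each user id string to an integer.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustration only: none of these names are Kylin classes or APIs. */
public class SegmentDictionaryDemo {

    /** Assigns the next free integer id to a value the first time it is seen. */
    static int encode(Map<String, Integer> dictionary, String value) {
        Integer id = dictionary.get(value);
        if (id == null) {
            id = dictionary.size();
            dictionary.put(value, id);
        }
        return id;
    }

    /**
     * Builds one bitmap per segment and ORs them together, as a query
     * spanning several segments would. With useGlobal == true every segment
     * shares one dictionary; otherwise each segment starts a fresh one.
     */
    static int mergedDistinctCount(List<List<String>> segments, boolean useGlobal) {
        Map<String, Integer> global = new HashMap<String, Integer>();
        BitSet merged = new BitSet();
        for (List<String> segment : segments) {
            Map<String, Integer> dictionary =
                    useGlobal ? global : new HashMap<String, Integer>();
            BitSet bitmap = new BitSet();
            for (String userId : segment) {
                bitmap.set(encode(dictionary, userId));
            }
            merged.or(bitmap); // merge of per-segment bitmaps
        }
        return merged.cardinality();
    }

    public static void main(String[] args) {
        List<List<String>> segments = Arrays.asList(
                Arrays.asList("alice", "bob"),
                Arrays.asList("carol", "dave", "alice"));
        // Four distinct users exist across the two segments.
        System.out.println("per-segment dictionaries: "
                + mergedDistinctCount(segments, false));
        System.out.println("shared dictionary: "
                + mergedDistinctCount(segments, true));
    }
}
```

In this toy model the per-segment variant reports 3 users instead of 4,
because "alice" is id 0 in the first segment while "carol" is id 0 in the
second, so unrelated users collide when the bitmaps are merged. The shared
dictionary variant reports 4.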
