kylin-user mailing list archives

From ShaoFeng Shi <shaofeng...@apache.org>
Subject Re: Data disappears if hbase splits region
Date Tue, 08 Aug 2017 01:15:29 GMT
Thanks for the input. Did you enable any compression (e.g., LZO, Snappy) for
HBase?

2017-08-08 0:49 GMT+08:00 Alexander Sterligov <sterligovak@joom.it>:

> All parameters were at their defaults. I've found that it is indeed related
> to the size estimation of the count distinct measure: the F2 column family
> was underestimated by about 4x.
>
> After setting kylin.cube.size-estimate-countdistinct-ratio=0.2, the
> estimates are accurate and it works much better.
>
> It looks like the default value of 0.05 is too low for bitmap count
> distinct with a global dictionary.
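A rough illustration of why moving the ratio from 0.05 to 0.2 fixes a roughly 4x underestimate. This is a minimal sketch under an assumed model (estimated size = raw size x ratio); the function name and the raw figure are hypothetical, and only the two ratio values come from the thread. It is not Kylin's actual estimation code.

```python
# Assumed model: the configured ratio linearly scales a raw size estimate
# for count-distinct measures. Names and the raw figure are hypothetical.

def estimated_measure_size_mb(raw_estimate_mb: float, ratio: float) -> float:
    """Scale a raw per-measure size estimate by a configured ratio (assumed model)."""
    return raw_estimate_mb * ratio


raw_mb = 10_000.0  # made-up raw estimate for the count-distinct column family

default_estimate = estimated_measure_size_mb(raw_mb, 0.05)  # default ratio
tuned_estimate = estimated_measure_size_mb(raw_mb, 0.2)     # ratio from the thread

# Under this model, going from 0.05 to 0.2 makes the estimate 4 times larger,
# which offsets an underestimate of about 4x.
print(default_estimate, tuned_estimate, tuned_estimate / default_estimate)
```

Because the model is linear, the correction factor is simply the ratio of the two settings (0.2 / 0.05 = 4).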
>
> Cube description is attached.
>
> On Mon, Aug 7, 2017 at 6:21 AM, ShaoFeng Shi <shaofengshi@apache.org>
> wrote:
>
>> Hi Alexander,
>>
>> The size is sometimes overestimated when a Cube has complex measures
>> such as count distinct and TopN, but I have seldom heard of
>> underestimation. Did you change any other parameters in kylin.properties
>> that may affect the estimation? Also, if you can share the Cube
>> definition, that would help (information such as dimensions/measures and
>> rowkey encoding also affects the region split).
>>
>> 2017-08-07 3:03 GMT+08:00 Alexander Sterligov <sterligovak@joom.it>:
>>
>>> I've found out that Kylin does the sharding itself, so running split in
>>> the HBase shell breaks the data.
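As we understand it, Kylin prefixes each HBase row key with a shard id and pre-splits the table along shard boundaries when it creates the HTable, rather than relying on HBase's own splitting. The sketch below illustrates that general layout only. The 2-byte prefix, the CRC32 shard function, the one-region-per-shard mapping and all function names are assumptions for illustration, not Kylin's actual row key encoder.

```python
import bisect
import struct
import zlib
from typing import List

SHARD_ID_LEN = 2  # assumption: a 2-byte big-endian shard prefix


def shard_of(row_body: bytes, total_shards: int) -> int:
    # Hypothetical shard function: CRC32 of the row body modulo the shard count.
    return zlib.crc32(row_body) % total_shards


def encode_row_key(row_body: bytes, total_shards: int) -> bytes:
    # Prefix the row body with its shard id so all rows of a shard sort together.
    return struct.pack(">H", shard_of(row_body, total_shards)) + row_body


def region_split_keys(total_shards: int) -> List[bytes]:
    # Pre-split at shard boundaries: region i starts at the prefix of shard i.
    # Region 0 has an empty start key, so shard 0 needs no split key.
    return [struct.pack(">H", i) for i in range(1, total_shards)]


def region_index(row_key: bytes, split_keys: List[bytes]) -> int:
    # HBase-style lookup: a row belongs to the last region whose start key <= row key.
    return bisect.bisect_right(split_keys, row_key)


splits = region_split_keys(4)
for body in [b"cuboid-a", b"cuboid-b", b"cuboid-c"]:
    key = encode_row_key(body, 4)
    print(key[:SHARD_ID_LEN].hex(), region_index(key, splits))
```

With four shards, the table is pre-split into four regions, and each row lands in the region that matches its shard prefix. A manual split in the HBase shell adds boundaries this layout did not plan for. The sketch does not try to model why queries then returned empty results.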
>>>
>>> So the main problem is that region-cut doesn't work on HBase with S3.
>>> The log shows that it creates the shards properly:
>>>
>>> 2017-08-05 20:54:48,709 INFO  [Job 1175d3ed-504f-4eb0-a973-d57338fdff2c-892] steps.CreateHTableJob:192 : Total size 21334.075368547456M (estimated)
>>> 2017-08-05 20:54:48,709 INFO  [Job 1175d3ed-504f-4eb0-a973-d57338fdff2c-892] steps.CreateHTableJob:193 : Expecting 4 regions.
>>> 2017-08-05 20:54:48,709 INFO  [Job 1175d3ed-504f-4eb0-a973-d57338fdff2c-892] steps.CreateHTableJob:194 : Expecting 5333 MB per region.
>>>
>>> But then I get a single 20 GB region.
>>>
>>> Has anyone seen the same behaviour?
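The log figures match a simple rule: divide the estimated total size by the region cut (5 GB = 5120 MB), then round. The sketch below reproduces that rule. Round-to-nearest, the truncation of the per-region size and the min/max region bounds (1 and 500) are all assumptions. This is not the CreateHTableJob source.

```python
from typing import Tuple


def plan_regions(total_size_mb: float, region_cut_gb: float,
                 min_regions: int = 1, max_regions: int = 500) -> Tuple[int, int]:
    """Return (region count, MB per region) under an assumed planning rule."""
    cut_mb = region_cut_gb * 1024
    # Assumed: round size / cut to the nearest integer.
    n = round(total_size_mb / cut_mb)
    # Hypothetical bounds: clamp to [min_regions, max_regions].
    n = max(min_regions, min(max_regions, n))
    # Assumed: the logged per-region size is truncated to whole MB.
    mb_per_region = int(total_size_mb / n)
    return n, mb_per_region


# Total size and region cut taken from the log and settings in this thread.
print(plan_regions(21334.075368547456, 5))  # (4, 5333)
```

The sketch only models how many regions are planned. It cannot explain why HBase ended up with a single region.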
>>>
>>> On Sun, Aug 6, 2017 at 8:15 PM, Alexander Sterligov <sterligovak@joom.it>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I noticed a very large HBase region for one segment (more than 20 GB,
>>>> with kylin.storage.hbase.region-cut-gb=5). I don't know why it is so
>>>> large, but it degraded performance a lot, so I decided to split it in
>>>> HBase.
>>>>
>>>> As soon as the split started, Kylin began returning empty results for
>>>> queries on this segment.
>>>>
>>>> Why could that happen?
>>>>
>>>> PS
>>>> It seems to me that kylin.storage.hbase.region-cut-gb doesn't work when
>>>> an external HBase cluster is used.
>>>>
>>>
>>>
>>
>>
>> --
>> Best regards,
>>
>> Shaofeng Shi 史少锋
>>
>>
>


-- 
Best regards,

Shaofeng Shi 史少锋
