kylin-user mailing list archives

From hit-lacus <hit_la...@126.com>
Subject Re: Re: Problem with Cube
Date Sun, 23 Jun 2019 05:16:14 GMT
Hi,
   It looks like this is caused by data skew, which often happens in big data scenarios. As
far as I know, you should identify the high-cardinality column and use it as a "Shard
By" column (under "Advanced Setting" in the cube design stage). See "Redistribute intermediate
table" in http://kylin.apache.org/docs20/howto/howto_optimize_build.html for more information.
   If you find anything wrong or I have misunderstood anything, please let me know. Thank you.
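
To illustrate why the choice of distribution key matters, here is a minimal Python sketch, not Kylin code. It assumes a plain hash partitioner over 12 reducers (the count mentioned in this thread); the row data, the column names, and the helper functions `partition` and `reducer_loads` are hypothetical and exist only for this example.

```python
import zlib
from collections import Counter

NUM_REDUCERS = 12  # reducer count mentioned in the thread


def partition(key: str, num_reducers: int = NUM_REDUCERS) -> int:
    """Map a key to a reducer index using a stable hash (CRC32)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reducer_loads(keys):
    """Count how many rows each reducer would receive for the given keys."""
    counts = Counter(partition(k) for k in keys)
    return [counts.get(r, 0) for r in range(NUM_REDUCERS)]


# Hypothetical rows: a low-cardinality column (3 values) and a
# high-cardinality column (one distinct value per row).
rows = [("country_%d" % (i % 3), "user_%d" % i) for i in range(100_000)]

low_card = reducer_loads(country for country, _ in rows)
high_card = reducer_loads(user for _, user in rows)

print("rows per reducer, low-cardinality key: ", low_card)
print("rows per reducer, high-cardinality key:", high_card)
```

With only three distinct key values, at most three reducers receive any rows, so a few reducers carry the whole load. Distributing by the high-cardinality key spreads the rows far more evenly, which is the intuition behind choosing such a column as "Shard By".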






-----------------
Best wishes to you ! 
From :Xiaoxiang Yu

At 2019-06-22 02:33:56, "Cinto Sunny" <cinto.sunny47@gmail.com> wrote:

Thanks. We actually have 12 reducers. The problem is that one reducer is getting stuck with
a huge amount of data, while the rest complete. We have 1.8 billion dsids and are not sure
whether that is the problem. If it is, how do we distribute the data?


- Cinto




On Fri, Jun 21, 2019 at 12:03 AM Chao Long <chao.long0101@gmail.com> wrote:

Hi Cinto Sunny,
   You can try setting "kylin.engine.mr.uhc-reducer-count" to a larger value; the default is 1.
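
As a sketch of the suggestion above, the property could be set in kylin.properties as shown here. The value 5 is only an illustrative assumption, not a recommendation from this thread; a suitable value depends on the data and the cluster.

```properties
# Number of reducers used for each ultra-high-cardinality (UHC) column
# in the "Extract Fact Table Distinct Columns" step. The default is 1.
# The value below is a hypothetical example.
kylin.engine.mr.uhc-reducer-count=5
```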


On Fri, Jun 21, 2019 at 2:44 PM Cinto Sunny <cinto.sunny47@gmail.com> wrote:

Hi All,


I am building a cube with 10 dimensions and two measures. The total input size is 100 GB.

I am trying to build using Roaring BitMap. One of the fact table columns is user, which has ~1.8B user IDs.



The build is getting stuck at the "Extract Fact Table Distinct Columns" step. One executor is
stuck and is processing over 800M lines.


I am using version 2.6.


Any pointers would be appreciated. Let me know if any further information is required.


- Cinto