kylin-dev mailing list archives

From 宋轶 <>
Subject RE: proposal of cube building optimization
Date Mon, 02 Mar 2015 05:47:38 GMT
Hi Xu,
Thanks for the proposal, but I don't quite understand your new algorithm.
1. Steps #2 and #3 that you mention are built-in features of the MapReduce computation framework (partition, sort, and spill).
2. I remember we tried this one-stage method for cube building in our POC phase. The problem is that each mapper generates too much intermediate data, so the network becomes the bottleneck in the shuffle phase.
Thanks,
George Song
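To make the intermediate-data concern concrete, here is a minimal sketch of how many combination records a single raw record can produce in a one-stage build. It assumes every subset of the dimensions is a valid combination (no pruning), and the function name, record fields, and dimension names are hypothetical, not taken from Kylin.

```python
from itertools import combinations


def cuboid_keys(record, dimensions):
    """Yield one (dimension subset, values) pair for every subset of the dimensions.

    Assumption: all subsets are valid; a real cube may prune some of them.
    """
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            yield dims, tuple(record[d] for d in dims)


# Hypothetical raw record with three dimensions.
record = {"country": "CN", "category": "toys", "day": "2015-03-01"}
keys = list(cuboid_keys(record, ["country", "category", "day"]))
print(len(keys))  # 8, i.e. 2**3 combinations for one raw record
```

With N dimensions and no pruning, each raw record expands to 2**N records before any aggregation, which is the volume that would have to cross the network in the shuffle phase.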

> From:
> To:
> Subject: proposal of cube building optimization
> Date: Sat, 28 Feb 2015 11:24:59 +0800
> Hi Guys,
> Now the cube is built by a multi-stage MapReduce job. It may introduce unnecessary latency in some cases (e.g. incremental building).
> We can introduce another cube building algorithm as below:
> 1. When the mapper processes a raw record, it generates all the valid combination records and puts them into memory.
> 2. When memory is almost full, the mapper writes all the combination records to the reducer.
> 3. After the mapper writes the records to the reducer, it cleans up the memory for further processing.
> Basically, the mapper logically splits the data block according to the memory limit.
> Thanks
> Jiang Xu
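The three steps proposed above can be sketched roughly as follows. This is an illustration under stated assumptions, not Kylin code: the buffer size is capped by an entry count (`max_entries`) as a stand-in for a memory limit, the measure is a simple sum, every dimension subset is treated as valid, and `emit` stands in for writing a record to the reducer. All names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def map_with_bounded_buffer(records, dimensions, measure, max_entries, emit):
    """Aggregate combination records in memory and flush when the buffer is full."""
    buffer = defaultdict(int)
    for record in records:
        # Step 1: generate every combination record and aggregate it in memory.
        for size in range(len(dimensions) + 1):
            for dims in combinations(dimensions, size):
                key = (dims, tuple(record[d] for d in dims))
                buffer[key] += record[measure]
        # Steps 2 and 3: when the buffer is "almost full", write it out and clear it.
        if len(buffer) >= max_entries:
            for key, value in buffer.items():
                emit(key, value)
            buffer.clear()
    # Flush whatever is left at the end of the input split.
    for key, value in buffer.items():
        emit(key, value)


# Hypothetical input: three raw records, two dimensions, one summed measure.
records = [
    {"country": "CN", "category": "toys", "price": 10},
    {"country": "CN", "category": "books", "price": 5},
    {"country": "US", "category": "toys", "price": 7},
]
emitted = []
map_with_bounded_buffer(
    records, ["country", "category"], "price", max_entries=4,
    emit=lambda key, value: emitted.append((key, value)),
)

# The reducer still has to merge partial sums that arrive from different flushes.
totals = defaultdict(int)
for key, value in emitted:
    totals[key] += value
print(totals[((), ())])  # 22, the grand total across all three records
```

Because a flush can happen before a key has seen all of its records, the same key may be emitted several times, so the reducer must still sum the partial results. A smaller buffer means less in-mapper aggregation and more records sent to the reducer.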