kylin-dev mailing list archives

From "蒋旭" <>
Subject Re: a few slides for Strata + Hadoop World London 2015
Date Sun, 03 May 2015 02:41:22 GMT
Hi Yang,
I have some questions about this deck.

1. Basically, the grid table is "data blocks split by time" + "secondary block index", which
suits an inverted index better than a data cube. Because a data cube is a multi-dimensional
array and the timestamp is just one of its dimensions, it is difficult to split the cube into
blocks by timestamp.

2. The grid table is better suited to small data sets in memory than to large data sets on
disk. When the data is very large, we have to keep large data blocks. For a frequent term
that appears in most data blocks, we have to scan almost all blocks.

3. For HBase, the key to optimization is to reduce the scan range or to skip parts of it.
The grid table uses "full timestamp + term as rowkey & secondary block index", which has
to scan a large range for a query over a long time range. I suggest adopting "coarse timestamp
+ term + fine timestamp" as the rowkey design, which is more useful for reducing and skipping
scan ranges.

4. What is the design for TopN queries on an ultra high cardinality dimension? Will we support
them in both the inverted index and the data cube? Do we keep the "ultra high cardinality
dimension" as a dimension or as a metric?
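
To make the rowkey suggestion in point 3 concrete, here is a rough sketch of how a "coarse
timestamp + term + fine timestamp" key could be encoded and how a query for one term over a
time range could be turned into narrow scan ranges. This is only an illustration, not Kylin
or HBase code: the one-hour bucket, the NUL separator after the term, and the names
make_rowkey and scan_ranges are all assumptions of mine.

```python
import struct

# Assumption: the coarse timestamp is an hour bucket; the fine timestamp is
# the offset in seconds inside that bucket.
BUCKET_SECONDS = 3600


def make_rowkey(ts: int, term: str) -> bytes:
    """Encode coarse timestamp + term + fine timestamp as one sortable key."""
    bucket, fine = divmod(ts, BUCKET_SECONDS)
    # Assumption: terms never contain a NUL byte, so b"\x00" can terminate the
    # term and keep a term from matching longer terms that share its prefix.
    return (struct.pack(">I", bucket) + term.encode("utf-8") + b"\x00"
            + struct.pack(">H", fine))


def scan_ranges(term: str, start_ts: int, end_ts: int):
    """Return (start_row, stop_row) pairs covering [start_ts, end_ts) for a term.

    stop_row is exclusive, as in an HBase scan. There is one range per coarse
    bucket, and each range contains only rows of the requested term.
    """
    encoded = term.encode("utf-8") + b"\x00"
    ranges = []
    first_bucket = start_ts // BUCKET_SECONDS
    last_bucket = (end_ts - 1) // BUCKET_SECONDS
    for bucket in range(first_bucket, last_bucket + 1):
        base = bucket * BUCKET_SECONDS
        lo = max(start_ts, base) - base                 # inclusive offset
        hi = min(end_ts, base + BUCKET_SECONDS) - base  # exclusive offset
        prefix = struct.pack(">I", bucket) + encoded
        ranges.append((prefix + struct.pack(">H", lo),
                       prefix + struct.pack(">H", hi)))
    return ranges
```

With this layout, a query for one term over N hours becomes N short scans that each touch
only that term's rows, while a "full timestamp + term" key puts the rows of every term in
the time range between the start and stop keys.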


------------------ Original Message ------------------
From: "Li Yang";<>;
Sent: Saturday, May 2, 2015, 8:20 AM
To: "dev"<>; 

Subject: a few slides for Strata + Hadoop World London 2015

Hi Luke

I created a few slides for Strata + Hadoop World London 2015 next week, see attached. Let's
see how they merge with previous deck.

Some of them should be attached to the related JIRAs as design docs. I'll do that later.

