kylin-dev mailing list archives

From Zhou Kang <zhoukan...@outlook.com>
Subject Re: [DISCUSS] Cost-benefit of HBase scan result compression
Date Thu, 09 Jan 2020 02:41:33 GMT
( ̄▽ ̄)” It seems the mailing list does not support rich text.
Kylin sample data
+───────────────────+───────────────+─────────────────────────────+───────────────────────────────+
| Query Result Size | Compress Time | Query Duration (Compressed) | Query Duration (Uncompressed) |
+───────────────────+───────────────+─────────────────────────────+───────────────────────────────+
| 0.25M             | 5ms           | 0.18s                       | 0.23s                         |
| 0.5M              | 20ms          | 0.38s                       | 0.38s                         |
| 0.7M              | 25ms          | 0.52s                       | 0.45s                         |
+───────────────────+───────────────+─────────────────────────────+───────────────────────────────+

SSB data
+───────────────────+───────────────+─────────────────────────────+───────────────────────────────+
| Query Result Size | Compress Time | Query Duration (Compressed) | Query Duration (Uncompressed) |
+───────────────────+───────────────+─────────────────────────────+───────────────────────────────+
| 0.25M             | 4ms           | 0.12s                       | 0.15s                         |
| 0.5M              | 7ms           | 0.25s                       | 0.24s                         |
| 0.7M              | 10ms          | 0.35s                       | 0.35s                         |
| 1M                | 13ms          | 0.41s                       | 0.39s                         |
| 5M                | 63ms          | 2.26s                       | 2.27s                         |
| 10M               | 135ms         | 5.10s                       | 4.90s                         |
| 16M               | 215ms         | 7.89s                       | 7.60s                         |
+───────────────────+───────────────+─────────────────────────────+───────────────────────────────+
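In the SSB table the compress time is small next to the total query time. A quick way to check this is to divide each row's compress time by its compressed query duration, as in this minimal Python sketch. It only does arithmetic on the numbers reported above; the tables do not break out decompression or transfer time, so this share is not a full cost model, and the names `SSB_ROWS` and `compress_share` are illustrative only.

```python
# SSB rows copied from the table above:
# (result size in M, compress time in ms, query duration with compression in s).
SSB_ROWS = [
    (0.25, 4, 0.12),
    (0.5, 7, 0.25),
    (0.7, 10, 0.35),
    (1, 13, 0.41),
    (5, 63, 2.26),
    (10, 135, 5.10),
    (16, 215, 7.89),
]


def compress_share(compress_ms: float, duration_s: float) -> float:
    """Fraction of the (compressed) query duration spent on compression."""
    return (compress_ms / 1000.0) / duration_s


for size, compress_ms, duration_s in SSB_ROWS:
    share = compress_share(compress_ms, duration_s)
    print(f"{size:>5}M  compress={compress_ms:>4}ms  share={share:.1%}")
```

For these rows the share stays between roughly 2.6% and 3.4% of the query duration.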
From: Zhou Kang <zhoukangcn@outlook.com>
Reply-To: "dev@kylin.apache.org" <dev@kylin.apache.org>
Date: Thursday, January 9, 2020, 10:34 AM
To: "dev@kylin.apache.org" <dev@kylin.apache.org>, Yaqian Zhang <Yaqian_Zhang@126.com>
Subject: Re: [DISCUSS] Cost-benefit of HBase scan result compression

Hi Yaqian Zhang,

Thanks for your query latency tests.

I have retyped the test data for easier reading.


From: Yaqian Zhang <Yaqian_Zhang@126.com>
Reply-To: "dev@kylin.apache.org" <dev@kylin.apache.org>
Date: Wednesday, January 8, 2020, 8:04 PM
To: "dev@kylin.apache.org" <dev@kylin.apache.org>
Subject: Re: [DISCUSS] Cost-benefit of HBase scan result compression

Hi,

I have tested the query latency in both cases (with and without compression).

In our CDH cluster environment, I got the following experimental results:

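Kylin sample data
+───────────────────+───────────────+─────────────────────────────+───────────────────────────────+
| Query Result Size | Compress Time | Query Duration (Compressed) | Query Duration (Uncompressed) |
+───────────────────+───────────────+─────────────────────────────+───────────────────────────────+
| 0.25M             | 5ms           | 0.18s                       | 0.23s                         |
| 0.5M              | 20ms          | 0.38s                       | 0.38s                         |
| 0.7M              | 25ms          | 0.52s                       | 0.45s                         |
+───────────────────+───────────────+─────────────────────────────+───────────────────────────────+

SSB data
+───────────────────+───────────────+─────────────────────────────+───────────────────────────────+
| Query Result Size | Compress Time | Query Duration (Compressed) | Query Duration (Uncompressed) |
+───────────────────+───────────────+─────────────────────────────+───────────────────────────────+
| 0.25M             | 4ms           | 0.12s                       | 0.15s                         |
| 0.5M              | 7ms           | 0.25s                       | 0.24s                         |
| 0.7M              | 10ms          | 0.35s                       | 0.35s                         |
| 1M                | 13ms          | 0.41s                       | 0.39s                         |
| 5M                | 63ms          | 2.26s                       | 2.27s                         |
| 10M               | 135ms         | 5.10s                       | 4.90s                         |
| 16M               | 215ms         | 7.89s                       | 7.60s                         |
+───────────────────+───────────────+─────────────────────────────+───────────────────────────────+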

Conclusion:
Enabling compression improves query speed when the result size is < 0.5M.
When the result size is > 1M, enabling compression generally slows queries down.

So, it is recommended to set the default value of kylin.storage.hbase.endpoint-compress-result to false.
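The conclusion can be checked row by row by comparing the two duration columns. The minimal Python sketch below does that on the numbers from the tables above. The 5 ms tie band (`TOLERANCE_S`) is a hypothetical choice to absorb measurement noise, not something stated in the thread, and `verdict` is an illustrative helper name.

```python
# Query durations in seconds, copied from the tables above:
# (result size in M, duration with compression, duration without compression).
KYLIN_SAMPLE = [(0.25, 0.18, 0.23), (0.5, 0.38, 0.38), (0.7, 0.52, 0.45)]
SSB = [
    (0.25, 0.12, 0.15),
    (0.5, 0.25, 0.24),
    (0.7, 0.35, 0.35),
    (1, 0.41, 0.39),
    (5, 2.26, 2.27),
    (10, 5.10, 4.90),
    (16, 7.89, 7.60),
]

# Hypothetical noise band: differences within 5 ms are treated as a tie.
TOLERANCE_S = 0.005


def verdict(compressed_s: float, uncompressed_s: float) -> str:
    """Say whether compression made the query faster, slower, or about the same."""
    delta = compressed_s - uncompressed_s
    if delta < -TOLERANCE_S:
        return "faster"
    if delta > TOLERANCE_S:
        return "slower"
    return "same"


for name, rows in (("kylin sample", KYLIN_SAMPLE), ("SSB", SSB)):
    for size, with_c, without_c in rows:
        delta_ms = (with_c - without_c) * 1000
        print(f"{name:>12} {size:>5}M  delta={delta_ms:+6.0f}ms  {verdict(with_c, without_c)}")
```

On this data both datasets favor compression at 0.25M; from 1M upward, only the 5M SSB row favors it, and only by 10 ms.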


On January 4, 2020, at 19:35, Yaqian Zhang <Yaqian_Zhang@126.com> wrote:
Hi Kang,
Thank you for your comparison and report!
I will test and verify the query latency for this.
On January 3, 2020, at 10:32, Zhou Kang <zhoukangcn@outlook.com> wrote:
Hi all,
kylin.storage.hbase.endpoint-compress-result is TRUE by default.
At Xiaomi Group, we found that compression can cause query latency of up to 30 seconds or more.
After analyzing the HBase logs, we found that compression is useless in most situations.
Details are available at: https://issues.apache.org/jira/browse/KYLIN-4322
Moreover, in our environment:
1. Only 0.05% of the data is larger than 1M.
2. Almost 70% of the compressed data is larger than the source data.
So, should we set this config to FALSE by default?
Also, kylin.storage.hbase.endpoint-compress-result should be overridable at the cube or project level, which CubeVisitService:visitCube currently forbids.
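The thread does not say which codec the endpoint uses, so this sketch uses Python's `zlib` purely as a stand-in to show how a statistic like "share of compressed results that are larger than the source" could be measured. The sample payloads are hypothetical: pseudo-random bytes stand in for poorly compressible scan results, and repeated bytes for highly compressible ones. `compression_ratio` and `share_enlarged` are illustrative helper names, not Kylin APIs.

```python
import random
import zlib
from typing import List


def compression_ratio(payload: bytes, level: int = 6) -> float:
    """Compressed size / original size; above 1.0 means compression grew the payload."""
    return len(zlib.compress(payload, level)) / len(payload)


def share_enlarged(payloads: List[bytes]) -> float:
    """Fraction of payloads whose compressed form is larger than the original."""
    enlarged = sum(1 for p in payloads if compression_ratio(p) > 1.0)
    return enlarged / len(payloads)


# Hypothetical sample payloads (not real Kylin scan results).
rng = random.Random(4322)  # fixed seed so the run is repeatable
random_like = [bytes(rng.getrandbits(8) for _ in range(256)) for _ in range(4)]
repetitive = [b"kylin" * 50 for _ in range(4)]

print(f"{share_enlarged(random_like + repetitive):.0%} of sample payloads grew after compression")
```

Running the same ratio over real endpoint responses, rather than synthetic bytes, is what would make the figure meaningful for a given cluster.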


