phoenix-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Lars Hofhansl (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-5157) Why is a Row timestamp column table slower than a regular table?
Date Mon, 04 Mar 2019 04:53:00 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16782960#comment-16782960
] 

Lars Hofhansl commented on PHOENIX-5157:
----------------------------------------

I looked at the code and beyond setting a timerange on the generated HBase Scan object I can
see no difference - at first glance at least.
Without some actual test data or perhaps a test for this we're not likely to make progress.

> Why is a Row timestamp column table slower than a regular table?
> ----------------------------------------------------------------
>
>                 Key: PHOENIX-5157
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5157
>             Project: Phoenix
>          Issue Type: Test
>    Affects Versions: 4.14.0
>         Environment: ||Sortware||Version||
> |Phoenix|4.14.0-cdh5.11.2|
> |Hbase|1.2.0-cdh5.11.1|
> |Java|1.8.0_121|
> |Region Servers Number|4|
>            Reporter: wxmimperio
>            Priority: Major
>              Labels: test
>
> create regular table like this:
> {code:java}
> CREATE TABLE phoenix_dev.test_table1 (
> 	event_time TIMESTAMP NOT NULL /*event_time*/,
> 	"_KEY" BIGINT NOT NULL /*_key*/,
> 	col1 BIGINT /*col1*/,
> 	col2 BIGINT /*col2*/,
> 	col3 BIGINT /*col3*/,
> 	col4 BIGINT /*col4*/,
> 	col5 VARCHAR /*col5*/,
> 	CONSTRAINT pk PRIMARY KEY (event_time ASC, "_KEY" ASC)
> ) COMPRESSION = SNAPPY,
>  SALT_BUCKETS = 4,
>  VERSIONS = 1
> {code}
> create Row timestamp table like this:
> {code:java}
> CREATE TABLE phoenix_dev.test_table2 (
> 	event_time TIMESTAMP NOT NULL /*event_time*/,
> 	"_KEY" BIGINT NOT NULL /*_key*/,
> 	col1 BIGINT /*col1*/,
> 	col2 BIGINT /*col2*/,
> 	col3 BIGINT /*col3*/,
> 	col4 BIGINT /*col4*/,
> 	col5 VARCHAR /*col5*/,
> 	CONSTRAINT pk PRIMARY KEY (event_time ASC ROW_TIMESTAMP, "_KEY" ASC)
> ) COMPRESSION = SNAPPY,
>  SALT_BUCKETS = 4,
>  VERSIONS = 1
> {code}
> Except for `event_time ASC ROW_TIMESTAMP`, everything else is identical and Insert 2000w
the same data.
> h2. *{color:#ff0000}Test SQL1:{color}*
> *Regular*
> {code:java}
> explain select count(*) from PHOENIX_DEV.test_table1;
> +-------------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
> |                                                       PLAN                        
                               | EST_BYTES_READ  | EST_ROWS_READ  |  EST_INFO_TS   |
> +-------------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
> | CLIENT 72-CHUNK 18453404 ROWS 20132661420 BYTES PARALLEL 4-WAY FULL SCAN OVER PHOENIX_DEV:TEST_TABLE1
 | 20132661420     | 18453404       | 1550646317452  |
> |     SERVER FILTER BY FIRST KEY ONLY                                               
                               | 20132661420     | 18453404       | 1550646317452  |
> |     SERVER AGGREGATE INTO SINGLE ROW                                              
                               | 20132661420     | 18453404       | 1550646317452  |
> +-------------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
> select count(*) from PHOENIX_DEV.test_table1; 
> +-----------+
> | COUNT(1) |
> +-----------+
> | 20000000 |
> +-----------+
> {code}
> {color:#d04437}cost:5.267 seconds{color}
> *ROW_TIMESTAMP:*
> {code:java}
> explain select count(*) from PHOENIX_DEV.test_table2;
> +--------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
> |                                                              PLAN                 
                                            | EST_BYTES_READ  | EST_ROWS_READ  |  EST_INFO_TS
  |
> +--------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
> | CLIENT 58-CHUNK 14416723 ROWS 15728641792 BYTES PARALLEL 4-WAY FULL SCAN OVER PHOENIX_DEV:TEST_TABLE2
 | 15728641792     | 14416723       | 1550648443256  |
> |     ROW TIMESTAMP FILTER [0, 9223372036854775807)                                 
                                            | 15728641792     | 14416723       | 1550648443256
 |
> |     SERVER FILTER BY FIRST KEY ONLY                                               
                                            | 15728641792     | 14416723       | 1550648443256
 |
> |     SERVER AGGREGATE INTO SINGLE ROW                                              
                                            | 15728641792     | 14416723       | 1550648443256
 |
> +--------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
> select count(*) from PHOENIX_DEV.test_table2; 
> +-----------+
> | COUNT(1) |
> +-----------+
> | 20000000 |
> +-----------+
> {code}
> {color:#d04437}cost:20.896 seconds{color}
> h2. Test SQL2:
> *Regular*
> {code:java}
> explain select count(*),col1,CAST(floor(event_time, 'MINUTE', 1) AS timestamp) AS log_time
from PHOENIX_DEV.TEST1 where event_time >= to_timestamp('2018-10-21 00:00:00') AND event_time
<= to_timestamp('2018-11-21 00:00:00') and col2 = 'ddd' AND col1 = 'fff' group by col1,log_time
order by log_time desc limit 5000;
> +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
> | PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
> +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
> | CLIENT 48-CHUNK 12686716 ROWS 13841204812 BYTES PARALLEL 48-WAY REVERSE RANGE SCAN
OVER PHOENIX_DEV:TEST1 [0,'2018-10-20 16:00:00.000'] - [3,'2018-11-20 16:00:00.000'] | 13841204812
| 12686716 | 1550646317452 |
> | SERVER FILTER BY (COL2 = 'ddd' AND COL1 = 'fff') | 13841204812 | 12686716 | 1550646317452
|
> | SERVER AGGREGATE INTO DISTINCT ROWS BY [TO_TIMESTAMP(FLOOR(TO_DATE(EVENT_TIME))), COL1]
| 13841204812 | 12686716 | 1550646317452 |
> | CLIENT MERGE SORT | 13841204812 | 12686716 | 1550646317452 |
> | CLIENT 5000 ROW LIMIT | 13841204812 | 12686716 | 1550646317452 |
> +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
> select count(*),col1,CAST(floor(event_time, 'MINUTE', 1) AS timestamp) AS log_time from
PHOENIX_DEV.TEST1 where event_time >= to_timestamp('2018-10-21 00:00:00') AND event_time
<= to_timestamp('2018-11-21 00:00:00') and col2 = 'ddd' AND col1 = 'fff' group by col1,log_time
order by log_time desc limit 5000;
> {code}
> {color:#d04437}cost:5.714 seconds{color}
> *ROW_TIMESTAMP:*
> {code:java}
> explain select count(*),col1,CAST(floor(event_time, 'MINUTE', 1) AS timestamp) AS log_time
from PHOENIX_DEV.TEST2 where event_time >= to_timestamp('2018-10-21 00:00:00') AND event_time
<= to_timestamp('2018-11-21 00:00:00') and col2 = 'ddd' AND col1 = 'fff' group by col1,log_time
order by log_time desc limit 5000;
> +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
> | PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
> +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
> | CLIENT 48-CHUNK 12686716 ROWS 13841204812 BYTES PARALLEL 48-WAY REVERSE RANGE SCAN
OVER PHOENIX_DEV:TEST2 [0,'2018-10-20 16:00:00.000'] - [3,'2018-11-20 16:00:00.000'] | 13841204812
| 12686716 | 1550648443256 |
> | ROW TIMESTAMP FILTER [1540051200000, 1542729600001) | 13841204812 | 12686716 | 1550648443256
|
> | SERVER FILTER BY (col2 = 'ddd' AND col1 = 'fff') | 13841204812 | 12686716 | 1550648443256
|
> | SERVER AGGREGATE INTO DISTINCT ROWS BY [TO_TIMESTAMP(FLOOR(TO_DATE(EVENT_TIME))), col1]
| 13841204812 | 12686716 | 1550648443256 |
> | CLIENT MERGE SORT | 13841204812 | 12686716 | 1550648443256 |
> | CLIENT 5000 ROW LIMIT | 13841204812 | 12686716 | 1550648443256 |
> +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
> select count(*),col1,CAST(floor(event_time, 'MINUTE', 1) AS timestamp) AS log_time from
PHOENIX_DEV.TEST2 where event_time >= to_timestamp('2018-10-21 00:00:00') AND event_time
<= to_timestamp('2018-11-21 00:00:00') and col2 = 'ddd' AND col1 = 'fff' group by col1,log_time
order by log_time desc limit 5000;
> {code}
> {color:#d04437}cost:19.391 seconds{color}
> {color:#ff0000}According to the row timestamp document:[http://phoenix.apache.org/rowtimestamp.html]{color}
> {color:#cccccc}With the help of this time range information, HBase on the server side
is able to entirely skip those store files which do not have fall in the time range. This
greatly improves performance especially when querying the tail-end of the data.{color}
> h1. *Why is the query slower after adding row timestamp?*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message