phoenix-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sokolov Yura (JIRA)" <j...@apache.org>
Subject [jira] [Created] (PHOENIX-4504) Subquery with ORDER BY on salted table gives wrong results
Date Wed, 27 Dec 2017 18:33:00 GMT
Sokolov Yura created PHOENIX-4504:
-------------------------------------

             Summary: Subquery with ORDER BY on salted table gives wrong results
                 Key: PHOENIX-4504
                 URL: https://issues.apache.org/jira/browse/PHOENIX-4504
             Project: Phoenix
          Issue Type: Bug
    Affects Versions: 4.11.0
         Environment: amazon emr phoenix 4.11.0 hbase 1.3
            Reporter: Sokolov Yura


Probably it is already fixed. Having a quick search I didn't find exact problem description.

I have a table:
{code:sql}
create immutable table product_history_v3 (
        ts bigint not null,
        id varchar not null,
        product varchar,
        merchantid varchar,
        storeid varchar,
        constraint pk primary key (ts, id)
) compression=LZ4,max_filesize=150000000,memstore_flushsize=70000000,
        versions=1,update_cache_frequency=1000,append_only_schema=true,
        guid_posts_width=10000000,
SALT_BUCKETS=20;
create local index product_history_v3_id_ts on product_history_v3 (id, ts) compression=LZ4;
create local index product_history_v3_merchantid_ts on product_history_v3 (merchantid, ts)
include (id) compression=LZ4;
create local index product_history_v3_storeid_ts on product_history_v3 (storeid, ts) include
(id) compression=LZ4;
{code}
Simple select by merchanid ordering by id,ts returns correct results:
{code:sql}
0: jdbc:phoenix:localhost:2181:/hbase> explain select id, ts from product_history_v3 where
merchantid = '1479114284851799852-2-11-118-1577502676' and ts < 1499472000000 and ts >
1498867200000 order by id, ts limit 30;
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+-+
|                                                                                        
            PLAN                                                                         
                           | EST_BYTES_READ  | |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+-+
| CLIENT 20-CHUNK 0 ROWS 0 BYTES PARALLEL 20-WAY RANGE SCAN OVER PRODUCT_HISTORY_V3 [2,'1479114284851799852-2-11-118-1577502676',1498867200001]
- [2,'1479114284851799852-2-11-118-1577502676',1499472000000]  | null            | |
|     SERVER FILTER BY FIRST KEY ONLY                                                    
                                                                                         
                           | null            | |
|     SERVER TOP 30 ROWS SORTED BY ["ID", "TS"]                                          
                                                                                         
                         | null            | |
| CLIENT MERGE SORT                                                                      
                                                                                         
                           | null            | |
| CLIENT LIMIT 30                                                                        
                                                                                         
                        | null            | |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+-+
5 rows selected (0,019 seconds)
{code}
It runs very fast until I add {{product}} to selected fields (cause average length of {{product}}
is 10kb).

So I'm trying to fetch id,ts in subquery, and product in outer query. It runs fast, but returns
incorrect results: set of rows doesn't match to set of rows returned by query above.
{code}
0: jdbc:phoenix:localhost:2181:/hbase> explain select id, ts, substr(product,1,30) from
product_history_v3 where (id, ts) in (select id, ts from product_history_v3 where merchantid
= '1479114284851799852-2-11-118-1577502676' and ts < 1499472000000 and ts > 1498867200000
order by id, ts limit 30) order by id, ts limit 30;
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+
|                                                                                        
                PLAN                                                                     
                                   | EST_BYTES |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+
| CLIENT 40-CHUNK 915204 ROWS 6291521905 BYTES PARALLEL 40-WAY FULL SCAN OVER PRODUCT_HISTORY_V3
                                                                                         
                            | 0         |
|     SERVER TOP 30 ROWS SORTED BY [PRODUCT_HISTORY_V3.ID, PRODUCT_HISTORY_V3.TS]        
                                                                                         
                                   | 0         |
| CLIENT MERGE SORT                                                                      
                                                                                         
                                   | 0         |
| CLIENT LIMIT 30                                                                        
                                                                                         
                                   | 0         |
|     SKIP-SCAN-JOIN TABLE 0                                                             
                                                                                         
                                   | 0         |
|         CLIENT 20-CHUNK 0 ROWS 0 BYTES PARALLEL 20-WAY RANGE SCAN OVER PRODUCT_HISTORY_V3
[2,'1479114284851799852-2-11-118-1577502676',1498867200001] - [2,'1479114284851799852-2-11-118-1577502676',1499472000000]
 | 0         |
|             SERVER FILTER BY FIRST KEY ONLY                                            
                                                                                         
                                   | 0         |
|             SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ["ID", "TS"] LIMIT 30 GROUPS
                                                                                         
                                   | 0         |
|         CLIENT MERGE SORT                                                              
                                                                                         
                                   | 0         |
|         CLIENT 30 ROW LIMIT                                                            
                                                                                         
                                   | 0         |
|     DYNAMIC SERVER FILTER BY (PRODUCT_HISTORY_V3.TS, PRODUCT_HISTORY_V3.ID) IN (($470.$473,
$470.$472))                                                                              
                               | 0         |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+
11 rows selected (0,021 seconds)
{code}
However, if I change ordering a bit, so planner is forced for reordering, then set of rows
is equal to original query:
{code}
0: jdbc:phoenix:localhost:2181:/hbase> explain select id, ts, substr(product,1,30) from
product_history_v3 where (id, ts) in (select id, ts from product_history_v3 where merchantid
= '1479114284851799852-2-11-118-1577502676' and ts < 1499472000000 and ts > 1498867200000
order by id||'-', ts limit 3000) order by id, ts limit 30 offset 2970;
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+
|                                                                                        
                PLAN                                                                     
                                   | EST_BYTES |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+
| CLIENT 40-CHUNK 915204 ROWS 6291521905 BYTES PARALLEL 40-WAY FULL SCAN OVER PRODUCT_HISTORY_V3
                                                                                         
                            | 0         |
|     SERVER TOP 3000 ROWS SORTED BY [PRODUCT_HISTORY_V3.ID, PRODUCT_HISTORY_V3.TS]      
                                                                                         
                                   | 0         |
| CLIENT MERGE SORT                                                                      
                                                                                         
                                   | 0         |
| CLIENT OFFSET 2970                                                                     
                                                                                         
                                   | 0         |
| CLIENT LIMIT 30                                                                        
                                                                                         
                                   | 0         |
|     SKIP-SCAN-JOIN TABLE 0                                                             
                                                                                         
                                   | 0         |
|         CLIENT 20-CHUNK 0 ROWS 0 BYTES PARALLEL 20-WAY RANGE SCAN OVER PRODUCT_HISTORY_V3
[2,'1479114284851799852-2-11-118-1577502676',1498867200001] - [2,'1479114284851799852-2-11-118-1577502676',1499472000000]
 | 0         |
|             SERVER FILTER BY FIRST KEY ONLY                                            
                                                                                         
                                   | 0         |
|             SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ["ID", "TS"]                
                                                                                         
                                   | 0         |
|         CLIENT MERGE SORT                                                              
                                                                                         
                                   | 0         |
|         CLIENT TOP 3000 ROWS SORTED BY [("ID" || '-'), "TS"]                           
                                                                                         
                                   | 0         |
|     DYNAMIC SERVER FILTER BY (PRODUCT_HISTORY_V3.TS, PRODUCT_HISTORY_V3.ID) IN (($482.$485,
$482.$484))                                                                              
                               | 0         |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+
12 rows selected (0,021 seconds)
{code}

There, certainly, should be a lot of rows to trigger this behaviour.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Mime
View raw message