Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 516A6200D69 for ; Wed, 27 Dec 2017 19:33:06 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id 4FAE9160C10; Wed, 27 Dec 2017 18:33:06 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 46EE0160C23 for ; Wed, 27 Dec 2017 19:33:05 +0100 (CET) Received: (qmail 773 invoked by uid 500); 27 Dec 2017 18:33:04 -0000 Mailing-List: contact dev-help@phoenix.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@phoenix.apache.org Delivered-To: mailing list dev@phoenix.apache.org Received: (qmail 761 invoked by uid 99); 27 Dec 2017 18:33:04 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd2-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 27 Dec 2017 18:33:04 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd2-us-west.apache.org (ASF Mail Server at spamd2-us-west.apache.org) with ESMTP id DC7DB1A0D16 for ; Wed, 27 Dec 2017 18:33:03 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd2-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -99.202 X-Spam-Level: X-Spam-Status: No, score=-99.202 tagged_above=-999 required=6.31 tests=[KAM_ASCII_DIVIDERS=0.8, RCVD_IN_DNSWL_NONE=-0.0001, RP_MATCHES_RCVD=-0.001, SPF_PASS=-0.001, USER_IN_WHITELIST=-100] autolearn=disabled Received: from mx1-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd2-us-west.apache.org [10.40.0.9]) (amavisd-new, port 10024) with ESMTP id kZaojnp51CNq for ; Wed, 27 Dec 2017 18:33:01 +0000 (UTC) Received: from mailrelay1-us-west.apache.org (mailrelay1-us-west.apache.org [209.188.14.139]) by mx1-lw-eu.apache.org (ASF Mail Server at mx1-lw-eu.apache.org) with ESMTP id 249125F3A1 for ; Wed, 27 Dec 2017 18:33:01 +0000 (UTC) Received: from jira-lw-us.apache.org (unknown [207.244.88.139]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id 4FCBCE04F4 for ; Wed, 27 Dec 2017 18:33:00 +0000 (UTC) Received: from jira-lw-us.apache.org (localhost [127.0.0.1]) by jira-lw-us.apache.org (ASF Mail Server at jira-lw-us.apache.org) with ESMTP id 0C17E240DA for ; Wed, 27 Dec 2017 18:33:00 +0000 (UTC) Date: Wed, 27 Dec 2017 18:33:00 +0000 (UTC) From: "Sokolov Yura (JIRA)" To: dev@phoenix.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Created] (PHOENIX-4504) Subquery with ORDER BY on salted table gives wrong results MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 archived-at: Wed, 27 Dec 2017 18:33:06 -0000 Sokolov Yura created PHOENIX-4504: ------------------------------------- Summary: Subquery with ORDER BY on salted table gives wrong results Key: PHOENIX-4504 URL: https://issues.apache.org/jira/browse/PHOENIX-4504 Project: Phoenix Issue Type: Bug Affects Versions: 4.11.0 Environment: amazon emr phoenix 4.11.0 hbase 1.3 Reporter: Sokolov Yura Probably it is already fixed. Having a quick search I didn't find exact problem description. I have a table: {code:sql} create immutable table product_history_v3 ( ts bigint not null, id varchar not null, product varchar, merchantid varchar, storeid varchar, constraint pk primary key (ts, id) ) compression=LZ4,max_filesize=150000000,memstore_flushsize=70000000, versions=1,update_cache_frequency=1000,append_only_schema=true, guid_posts_width=10000000, SALT_BUCKETS=20; create local index product_history_v3_id_ts on product_history_v3 (id, ts) compression=LZ4; create local index product_history_v3_merchantid_ts on product_history_v3 (merchantid, ts) include (id) compression=LZ4; create local index product_history_v3_storeid_ts on product_history_v3 (storeid, ts) include (id) compression=LZ4; {code} Simple select by merchanid ordering by id,ts returns correct results: {code:sql} 0: jdbc:phoenix:localhost:2181:/hbase> explain select id, ts from product_history_v3 where merchantid = '1479114284851799852-2-11-118-1577502676' and ts < 1499472000000 and ts > 1498867200000 order by id, ts limit 30; +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+-+ | PLAN | EST_BYTES_READ | | +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+-+ | CLIENT 20-CHUNK 0 ROWS 0 BYTES PARALLEL 20-WAY RANGE SCAN OVER PRODUCT_HISTORY_V3 [2,'1479114284851799852-2-11-118-1577502676',1498867200001] - [2,'1479114284851799852-2-11-118-1577502676',1499472000000] | null | | | SERVER FILTER BY FIRST KEY ONLY | null | | | SERVER TOP 30 ROWS SORTED BY ["ID", "TS"] | null | | | CLIENT MERGE SORT | null | | | CLIENT LIMIT 30 | null | | +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+-+ 5 rows selected (0,019 seconds) {code} It runs very fast until I add {{product}} to selected fields (cause average length of {{product}} is 10kb). So I'm trying to fetch id,ts in subquery, and product in outer query. It runs fast, but returns incorrect results: set of rows doesn't match to set of rows returned by query above. {code} 0: jdbc:phoenix:localhost:2181:/hbase> explain select id, ts, substr(product,1,30) from product_history_v3 where (id, ts) in (select id, ts from product_history_v3 where merchantid = '1479114284851799852-2-11-118-1577502676' and ts < 1499472000000 and ts > 1498867200000 order by id, ts limit 30) order by id, ts limit 30; +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+ | PLAN | EST_BYTES | +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+ | CLIENT 40-CHUNK 915204 ROWS 6291521905 BYTES PARALLEL 40-WAY FULL SCAN OVER PRODUCT_HISTORY_V3 | 0 | | SERVER TOP 30 ROWS SORTED BY [PRODUCT_HISTORY_V3.ID, PRODUCT_HISTORY_V3.TS] | 0 | | CLIENT MERGE SORT | 0 | | CLIENT LIMIT 30 | 0 | | SKIP-SCAN-JOIN TABLE 0 | 0 | | CLIENT 20-CHUNK 0 ROWS 0 BYTES PARALLEL 20-WAY RANGE SCAN OVER PRODUCT_HISTORY_V3 [2,'1479114284851799852-2-11-118-1577502676',1498867200001] - [2,'1479114284851799852-2-11-118-1577502676',1499472000000] | 0 | | SERVER FILTER BY FIRST KEY ONLY | 0 | | SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ["ID", "TS"] LIMIT 30 GROUPS | 0 | | CLIENT MERGE SORT | 0 | | CLIENT 30 ROW LIMIT | 0 | | DYNAMIC SERVER FILTER BY (PRODUCT_HISTORY_V3.TS, PRODUCT_HISTORY_V3.ID) IN (($470.$473, $470.$472)) | 0 | +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+ 11 rows selected (0,021 seconds) {code} However, if I change ordering a bit, so planner is forced for reordering, then set of rows is equal to original query: {code} 0: jdbc:phoenix:localhost:2181:/hbase> explain select id, ts, substr(product,1,30) from product_history_v3 where (id, ts) in (select id, ts from product_history_v3 where merchantid = '1479114284851799852-2-11-118-1577502676' and ts < 1499472000000 and ts > 1498867200000 order by id||'-', ts limit 3000) order by id, ts limit 30 offset 2970; +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+ | PLAN | EST_BYTES | +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+ | CLIENT 40-CHUNK 915204 ROWS 6291521905 BYTES PARALLEL 40-WAY FULL SCAN OVER PRODUCT_HISTORY_V3 | 0 | | SERVER TOP 3000 ROWS SORTED BY [PRODUCT_HISTORY_V3.ID, PRODUCT_HISTORY_V3.TS] | 0 | | CLIENT MERGE SORT | 0 | | CLIENT OFFSET 2970 | 0 | | CLIENT LIMIT 30 | 0 | | SKIP-SCAN-JOIN TABLE 0 | 0 | | CLIENT 20-CHUNK 0 ROWS 0 BYTES PARALLEL 20-WAY RANGE SCAN OVER PRODUCT_HISTORY_V3 [2,'1479114284851799852-2-11-118-1577502676',1498867200001] - [2,'1479114284851799852-2-11-118-1577502676',1499472000000] | 0 | | SERVER FILTER BY FIRST KEY ONLY | 0 | | SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ["ID", "TS"] | 0 | | CLIENT MERGE SORT | 0 | | CLIENT TOP 3000 ROWS SORTED BY [("ID" || '-'), "TS"] | 0 | | DYNAMIC SERVER FILTER BY (PRODUCT_HISTORY_V3.TS, PRODUCT_HISTORY_V3.ID) IN (($482.$485, $482.$484)) | 0 | +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+ 12 rows selected (0,021 seconds) {code} There, certainly, should be a lot of rows to trigger this behaviour. -- This message was sent by Atlassian JIRA (v6.4.14#64029)