Return-Path: X-Original-To: apmail-phoenix-dev-archive@minotaur.apache.org Delivered-To: apmail-phoenix-dev-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 1A4B617300 for ; Sun, 1 Mar 2015 01:43:30 +0000 (UTC) Received: (qmail 1195 invoked by uid 500); 1 Mar 2015 01:43:30 -0000 Delivered-To: apmail-phoenix-dev-archive@phoenix.apache.org Received: (qmail 1127 invoked by uid 500); 1 Mar 2015 01:43:30 -0000 Mailing-List: contact dev-help@phoenix.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@phoenix.apache.org Delivered-To: mailing list dev@phoenix.apache.org Received: (qmail 1115 invoked by uid 99); 1 Mar 2015 01:43:29 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Sun, 01 Mar 2015 01:43:29 +0000 X-ASF-Spam-Status: No, hits=-2000.0 required=5.0 tests=ALL_TRUSTED,T_RP_MATCHES_RCVD X-Spam-Check-By: apache.org Received: from [140.211.11.3] (HELO mail.apache.org) (140.211.11.3) by apache.org (qpsmtpd/0.29) with SMTP; Sun, 01 Mar 2015 01:43:07 +0000 Received: (qmail 1083 invoked by uid 99); 1 Mar 2015 01:43:04 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Sun, 01 Mar 2015 01:43:04 +0000 Date: Sun, 1 Mar 2015 01:43:04 +0000 (UTC) From: "James Taylor (JIRA)" To: dev@phoenix.incubator.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Updated] (PHOENIX-1690) IndexOutOfBoundsException during SkipScanFilter interesect MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 X-Virus-Checked: Checked by ClamAV on apache.org [ https://issues.apache.org/jira/browse/PHOENIX-1690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Taylor updated PHOENIX-1690: ---------------------------------- Attachment: PHOENIX-1690.patch Patch with tests that fixes issue > IndexOutOfBoundsException during SkipScanFilter interesect > ---------------------------------------------------------- > > Key: PHOENIX-1690 > URL: https://issues.apache.org/jira/browse/PHOENIX-1690 > Project: Phoenix > Issue Type: Bug > Reporter: James Taylor > Attachments: PHOENIX-1690-repro.patch, PHOENIX-1690-wip.patch, PHOENIX-1690.patch > > > The relevant portion of the olap doc schema is: > create table PERF.BIG_OLAP_DOC ( > client_id integer not null > ,customer_id integer > ,time_id integer not null > ,conversion_type_id integer not null > ,device_type varchar(16) > ,keyword_id bigint not null > ,creative_id bigint not null > ,placement_id bigint not null > ,product_target_id bigint not null > ,network varchar(7) > ,impressions decimal(18, 4) > ,publisher_clicks decimal(18, 4) > ,publisher_cost decimal(18, 4) > ,conversions decimal(18, 4) > ,revenue decimal(18, 4) > [ ...additional metric and dimensional colums ... ] > constraint perf_fact_pk primary key (client_id, time_id, conversion_type_id, device_type, keyword_id, creative_id, placement_id, product_target_id))SALT_BUCKETS=10; > I am evaluating a 'stitch' case where results from an external system are injected either via table or (as in this case) an in-list. An example of one of these test agg queries I am using is: > SELECT count(1) cnt, > coalesce(SUM(impressions), 0.0) AS "impressions", > coalesce(SUM(publisher_clicks), 0.0) AS "pub_clicks", > coalesce(SUM(publisher_cost), 0.0) AS "pub_cost", > coalesce(SUM(conversions), 0.0) AS "conversions", > coalesce(SUM(revenue), 0.0) AS "revenue" > FROM perf.big_olap_doc > WHERE time_id between 3000 and 3700 > AND network in ('SEARCH') > AND conversion_type_id = 1 > AND client_id = 10724 > -- AND device_type in ('MOBILE','DESKTOP','OTHER','TABLET') > AND keyword_id in ( > 613214369, 613217307, 613247509, 613248897, 613250382, 613250387, 613252322, 613260252, 613261753, 613261754, 613261759, > 613261770, 613261873, 613261884, 613261885, 613261888, 613261889, 613261892, 613261897, 613261913, 613261919, 613261927, > 614496021, 843606367, 843606967, 843607021, 843607033, 843607089, 1038731600, 1038731672, 1038731673, 1038731675, > 1038731684, 1038731693, 1046990487, 1046990488, 1046990499, 1046990505, 1046990506, 1049724722, 1051109548, 1051311275, > 1051311904, 1060574377, 1060574395, 1060574506, 1060574562, 1115915938, 1115915939, 1115915941, 1116310571, 1367495544, > 1367495545, 1367497297, 1367497298, 1367497299, 1367497300, 1367497303, 1367497313, 1367497813, 1367497816, 1367497818, > 1367497821, 1367497822, 1367497823, 1624976423, 1624976451, 1624976457, 3275636061, 3275640505, 3275645765, 3275645807, > 3275649138, 3275651456, 3275651460, 3275651478, 3275651479, 3275654566, 3275654568, 3275654570, 3275654575, 3275659612, > 3275659616, 3275659620, 3275668880, 3275669693, 3275675627, 3275675634, 3275677479, 3275677504, 3275678855, 3275679524, > 3275679532, 3275680014, 3275682307, 3275682308, 3275682309, 3275682310, 3275682420, 3275682423, 3275682436, 3275682448, > 3275682460, 3275682462, 3275682474, 3275684831, 3275688903, 3275694023, 3275694025, 3275694027, 3275695054, 3275695056, > 3275695062, 3275699512, 3275699514, 3275699518, 3275701682, 3275701683, 3275701685, 3275701688, 3275703633, 3275703634, > 3275703635, 3275703636, 3275703638, 3275703639, 3275704860, 3275704861, 3275764577, 3275797149, 3275798566, 3275798567, > 3275798568, 3275798592, 3275931147, 3275942728, 3275945337, 3275945338, 3275945339, 3275945340, 3275945342, 3275945344, > 3275946319, 3275946322, 3275946324, 3275946643, 3275949495, 3275949498, 3275949500, 3275950250, 3275955128, 3275955129, > 3275955130, 3427017435, 3427017450, 3438304254, 3438304257, 3447068169, 3505227849, 3505227890, 3505556908, 3506351285, > 3506351389, 3506351398, 3506351468, 3510037138, 3510038610, 3545590644, 3545594378, 3545595073, 3545595318, 3545595506, > 3545597841, 3545598818, 3545599658, 3545599663, 3545601215, 3556080898, 3556080980, 3556080999, 3556081323, 3565122663, > 3565122679, 3565122801, 3565122858, 3565122908, 3565122929, 3565122952, 3565122984, 3565123028, 3565123047, 3565123048, > 3565123203, 3565123230, 3949988054, 3949988056, 3949988070, 3972992248, 3972992252, 3972992254, 3972992257, 3972992263, > 3972992267, 3972992268, 3972992269, 3972992270, 3972992274, 3972992275, 3972992277, 3972992281, 3972992293, 3972992298, > 3972992299, 3972992305, 3972992307, 3972992313, 3972992316, 3972992322, 3972992338, 3978471261, 3978471272, 4266318185, > 4298107404, 4308853119, 4308853123, 4308853500, 4451174646, 4451174656, 4451174701, 4569827278, 4569827284, 4569827287, > 4569827379, 4569827523, 4569827524, 4896589676, 4979049725, 5054587609, 5136433884, 5362640372, 5393109964, 5393405364, > 5393405365, 5393405620, 5393405625, 5393405675, 5393405677, 5393405858, 5393405970) > Reading your interpretation of the skip scan, I see that the plan is indicating it is only using the salt and the first three columns of the index, client_id, and time_id and conversion_type. I hadn't considered the salt - that bit of detail in the plan makes more sense to me now. It looks now like the lackluster performance for higher cardinality aggregations is related to scanning a much larger portion of the key space. For aggregations where I am not relying on filtering, I am seeing much better performance. > So to tune this particular stitch case / skip scan, it looks like I need to get the 4th index column into the criteria. There are only four distinct values in the fourth index column (these can/should probably be something other than varchar, but this is what I have loaded currently). In order to use the keyword_id portion of the index I tried explicitly specifying all device_types via in-list (the commented portion of the query above), but I get a peculiar error: > java.lang.IndexOutOfBoundsException: end index (1) must not be less than start index (2) > at com.google.common.base.Preconditions.checkPositionIndexes(Preconditions.java:388) > at com.google.common.collect.ImmutableList.subList(ImmutableList.java:362) > at com.google.common.collect.ImmutableList.subList(ImmutableList.java:62) > at org.apache.phoenix.filter.SkipScanFilter.intersect(SkipScanFilter.java:291) > at org.apache.phoenix.filter.SkipScanFilter.intersect(SkipScanFilter.java:177) > at org.apache.phoenix.compile.ScanRanges.intersectScan(ScanRanges.java:316) > at org.apache.phoenix.iterate.BaseResultIterators.getParallelScans(BaseResultIterators.java:464) > at org.apache.phoenix.iterate.BaseResultIterators.getParallelScans(BaseResultIterators.java:394) > at org.apache.phoenix.iterate.BaseResultIterators.(BaseResultIterators.java:184) > at org.apache.phoenix.iterate.ParallelIterators.(ParallelIterators.java:54) > at org.apache.phoenix.execute.AggregatePlan.newIterator(AggregatePlan.java:173) > at org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:227) > at org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:154) > at org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:226) > at org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:217) > at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53) > at org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:216) > at org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:1057) > at sqlline.SqlLine$Commands.execute(SqlLine.java:3673) > at sqlline.SqlLine$Commands.sql(SqlLine.java:3584) > at sqlline.SqlLine.dispatch(SqlLine.java:821) > at sqlline.SqlLine.begin(SqlLine.java:699) > at sqlline.SqlLine.mainWithInputRedirection(SqlLine.java:441) > at sqlline.SqlLine.main(SqlLine.java:424) > I thought perhaps I was hitting an upper limit on the number of elements in an in-list for a skip scan, and so tried removing the 250 element keyword in-list entirely and leaving only the device_type in-list, but I still get the same error. It happens immediately, even for an explain, so I presume this is a query parsing problem. Is there a bug or limitation of skip scans and/or sub lists involving varchar? -- This message was sent by Atlassian JIRA (v6.3.4#6332)