From notifications-return-91891-archive-asf-public=cust-asf.ponee.io@asterixdb.apache.org Thu Mar 15 00:47:04 2018 Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by mx-eu-01.ponee.io (Postfix) with SMTP id DDD51180654 for ; Thu, 15 Mar 2018 00:47:03 +0100 (CET) Received: (qmail 39494 invoked by uid 500); 14 Mar 2018 23:47:03 -0000 Mailing-List: contact notifications-help@asterixdb.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@asterixdb.apache.org Delivered-To: mailing list notifications@asterixdb.apache.org Received: (qmail 39485 invoked by uid 99); 14 Mar 2018 23:47:02 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd4-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 14 Mar 2018 23:47:02 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd4-us-west.apache.org (ASF Mail Server at spamd4-us-west.apache.org) with ESMTP id 8CA10C0430 for ; Wed, 14 Mar 2018 23:47:02 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd4-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -109.511 X-Spam-Level: X-Spam-Status: No, score=-109.511 tagged_above=-999 required=6.31 tests=[ENV_AND_HDR_SPF_MATCH=-0.5, KAM_ASCII_DIVIDERS=0.8, RCVD_IN_DNSWL_MED=-2.3, SPF_PASS=-0.001, T_RP_MATCHES_RCVD=-0.01, USER_IN_DEF_SPF_WL=-7.5, USER_IN_WHITELIST=-100] autolearn=disabled Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd4-us-west.apache.org [10.40.0.11]) (amavisd-new, port 10024) with ESMTP id MhurnEOPimr5 for ; Wed, 14 Mar 2018 23:47:01 +0000 (UTC) Received: from mailrelay1-us-west.apache.org (mailrelay1-us-west.apache.org [209.188.14.139]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTP id 0FFC75F238 for ; Wed, 14 Mar 2018 23:47:01 +0000 (UTC) Received: from jira-lw-us.apache.org (unknown [207.244.88.139]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id 8E8ADE0114 for ; Wed, 14 Mar 2018 23:47:00 +0000 (UTC) Received: from jira-lw-us.apache.org (localhost [127.0.0.1]) by jira-lw-us.apache.org (ASF Mail Server at jira-lw-us.apache.org) with ESMTP id 4FE5D21490 for ; Wed, 14 Mar 2018 23:47:00 +0000 (UTC) Date: Wed, 14 Mar 2018 23:47:00 +0000 (UTC) From: "Till (JIRA)" To: notifications@asterixdb.incubator.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Updated] (ASTERIXDB-2215) Filter is not properly applied for a secondary inverted index search MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/ASTERIXDB-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till updated ASTERIXDB-2215: ---------------------------- Component/s: IDX - Indexes COMP - Compiler > Filter is not properly applied for a secondary inverted index search > -------------------------------------------------------------------- > > Key: ASTERIXDB-2215 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2215 > Project: Apache AsterixDB > Issue Type: Bug > Components: COMP - Compiler, IDX - Indexes > Reporter: Taewoo Kim > Priority: Major > Labels: triaged > > Based on the way of writing predicate conditions on a field with filter, the generated plan does not correctly show min and max value of a filter. > {code} > drop dataverse twitter if exists; > create dataverse twitter if not exists; > use dataverse twitter; > create type typeUser if not exists as open { > id: int64, > name: string, > screen_name : string, > profile_image_url : string, > lang : string, > location: string, > create_at: date, > description: string, > followers_count: int32, > friends_count: int32, > statues_count: int64 > }; > create type typePlace if not exists as open{ > country : string, > country_code : string, > full_name : string, > id : string, > name : string, > place_type : string, > bounding_box : rectangle > }; > create type typeGeoTag if not exists as open { > stateID: int32, > stateName: string, > countyID: int32, > countyName: string, > cityID: int32?, > cityName: string? > }; > create type typeTweet if not exists as open { > create_at : datetime, > id: int64, > "text": string, > in_reply_to_status : int64, > in_reply_to_user : int64, > favorite_count : int64, > coordinate: point?, > retweet_count : int64, > lang : string, > is_retweet: boolean, > hashtags : {{ string }} ?, > user_mentions : {{ int64 }} ? , > user : typeUser, > place : typePlace?, > geo_tag: typeGeoTag > }; > create dataset ds_tweet(typeTweet) if not exists primary key id with filter on create_at; > {code} > For the following query, the logical plan shows empty min[] and two variables in max[] when doing an inverted-index search. > {code} > USE twitter; > SELECT spatial_cell(get_points(place.bounding_box)[0], create_point(0.0,0.0),1.0,1.0) AS cell, count(*) AS cnt FROM ds_tweet > WHERE ftcontains(text, ['trump'], {'mode':'any'}) AND place.bounding_box IS NOT unknown > AND datetime('2017-02-25T00:00:00') <= create_at AND create_at < datetime('2017-02-26T00:00:00') > GROUP BY cell; > {code} > Exact predicates on the filter > {code} > datetime('2017-02-25T00:00:00') <= create_at AND create_at < datetime('2017-02-26T00:00:00') > {code} > {code} > unnest-map [$$64, $$69, $$70] <- index-search("text_idx", 2, "twitter", "ds_tweet", FALSE, FALSE, 5, null, 21, TRUE, 1, $$63) with filter on min:[] max:[$$67, $$68] > -- SINGLE_PARTITION_INVERTED_INDEX_SEARCH |PARTITIONED| > exchange > -- ONE_TO_ONE_EXCHANGE |PARTITIONED| > assign [$$67, $$68, $$63] <- [datetime: { 2017-02-26T00:00:00.000Z }, datetime: { 2017-02-25T00:00:00.000Z }, array: [ "trump" ]] > -- ASSIGN |PARTITIONED| > empty-tuple-source > -- EMPTY_TUPLE_SOURCE |PARTITIONED| > {code} > However, for the following query(just switched the location of datetime and create_at at the end of the predicates), it shows another incorrect plan. > {code} > SELECT spatial_cell(get_points(place.bounding_box)[0], create_point(0.0,0.0),1.0,1.0) AS cell, count(*) AS cnt FROM ds_tweet > WHERE ftcontains(text, ['trump'], {'mode':'any'}) AND place.bounding_box IS NOT unknown > AND datetime('2017-02-25T00:00:00') <= create_at AND datetime('2017-02-26T00:00:00') > create_at > GROUP BY cell; > {code} > Exact predicates on the filter: > {code} > datetime('2017-02-25T00:00:00') <= create_at AND datetime('2017-02-26T00:00:00') > create_at > {code} > {code} > unnest-map [$$64, $$69, $$70] <- index-search("text_idx", 2, "twitter", "ds_tweet", FALSE, FALSE, 5, null, 21, TRUE, 1, $$63) with filter on min:[$$67] max:[$$68] > -- SINGLE_PARTITION_INVERTED_INDEX_SEARCH |PARTITIONED| > exchange > -- ONE_TO_ONE_EXCHANGE |PARTITIONED| > assign [$$67, $$68, $$63] <- [datetime: { 2017-02-26T00:00:00.000Z }, datetime: { 2017-02-25T00:00:00.000Z }, array: [ "trump" ]] > -- ASSIGN |PARTITIONED| > empty-tuple-source > -- EMPTY_TUPLE_SOURCE |PARTITIONED| > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)