asterixdb-notifications mailing list archives

From "Glenn Justo Galvizo (Jira)" <j...@apache.org>
Subject [jira] [Assigned] (ASTERIXDB-2899) Accelerate Jaccard Similarity Queries w/ Array Indexes
Date Fri, 25 Jun 2021 22:18:00 GMT

     [ https://issues.apache.org/jira/browse/ASTERIXDB-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Glenn Justo Galvizo reassigned ASTERIXDB-2899:
----------------------------------------------

    Assignee: Glenn Justo Galvizo

> Accelerate Jaccard Similarity Queries w/ Array Indexes
> ------------------------------------------------------
>
>                 Key: ASTERIXDB-2899
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2899
>             Project: Apache AsterixDB
>          Issue Type: New Feature
>            Reporter: Glenn Justo Galvizo
>            Assignee: Glenn Justo Galvizo
>            Priority: Major
>
> Given the following:
>  
> {code:java}
> CREATE INDEX storesCatIdx ON Stores (UNNEST categories);
> FROM    Stores S
> WHERE   SIMILARITY_JACCARD_CHECK(S.categories, ["Fruits", "Bread"], 0.6)
> SELECT  *;
> FROM    Stores S
> WHERE   SIMILARITY_JACCARD(S.categories, ["Fruits", "Bread"]) > 0.6
> SELECT  *;{code}
> The index Stores.storesCatIdx can be used to accelerate both of the queries above. A rule can be introduced to transform each query into a join between the ["Fruits", "Bread"] array and the S.categories array. The rule that optimizes array index joins will then fire, using primary index validation to remove false positives.
> The resulting plan performs a multi-valued index search that also returns false positives (stores whose categories contain at least one of the items). These are filtered out by applying the actual Jaccard similarity function and threshold check before the results are returned to the user.
>  
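The filter-then-verify idea described above can be sketched outside of AsterixDB. The Python below is only an illustration and is not the optimizer rule or the engine's implementation: the sample stores, the in-memory index, and the helper names (build_array_index, candidate_keys, jaccard_search) are all hypothetical. It also assumes that arrays are compared as sets and that the threshold is greater than 0, so that every qualifying record must share at least one item with the probe array.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity of two collections, compared as sets (an assumption)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def build_array_index(records):
    """Map each array element to the primary keys of the records containing it."""
    index = defaultdict(set)
    for pk, categories in records.items():
        for item in categories:
            index[item].add(pk)
    return index


def candidate_keys(index, probe):
    """Index search: every record sharing at least one item with the probe."""
    keys = set()
    for item in set(probe):
        keys |= index.get(item, set())
    return keys


def jaccard_search(records, index, probe, threshold):
    """Fetch candidates via the index, then verify with the real Jaccard check."""
    return sorted(
        pk
        for pk in candidate_keys(index, probe)
        if jaccard(records[pk], probe) > threshold
    )


# Hypothetical sample data standing in for the Stores dataset.
stores = {
    1: ["Fruits", "Bread"],
    2: ["Fruits", "Bread", "Dairy"],
    3: ["Fruits", "Dairy", "Meat"],
    4: ["Electronics"],
}

index = build_array_index(stores)
probe = ["Fruits", "Bread"]
print(sorted(candidate_keys(index, probe)))       # candidates, including false positives
print(jaccard_search(stores, index, probe, 0.6))  # records that pass verification
```

In this sample, store 3 is returned by the index search because it contains "Fruits", but it is dropped by the verification step; store 4 is never fetched at all, which is where the index saves work.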



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
