asterixdb-notifications mailing list archives

From "Glenn Justo Galvizo (Jira)" <>
Subject [jira] [Assigned] (ASTERIXDB-2899) Accelerate Jaccard Similarity Queries w/ Array Indexes
Date Fri, 25 Jun 2021 22:18:00 GMT


Glenn Justo Galvizo reassigned ASTERIXDB-2899:

    Assignee: Glenn Justo Galvizo

> Accelerate Jaccard Similarity Queries w/ Array Indexes
> ------------------------------------------------------
>                 Key: ASTERIXDB-2899
>                 URL:
>             Project: Apache AsterixDB
>          Issue Type: New Feature
>            Reporter: Glenn Justo Galvizo
>            Assignee: Glenn Justo Galvizo
>            Priority: Major
> Given the following:
> {code:java}
> CREATE INDEX storesCatIdx ON Stores (UNNEST categories);
> FROM    Stores S
> WHERE   SIMILARITY_JACCARD_CHECK(S.categories, ["Fruits", "Bread"], 0.6)
> SELECT  *;
> FROM    Stores S
> WHERE   SIMILARITY_JACCARD(S.categories, ["Fruits", "Bread"]) > 0.6
> SELECT  *;{code}
> The index Stores.storesCatIdx can be used to accelerate both of the queries above. A rule can be introduced that transforms each query into a join between the ["Fruits", "Bread"] array and the S.categories array. The rule that optimizes array index joins will then fire, using primary index validation to remove false positives.
> The resulting plan produces false positives from the multi-valued index search (stores whose categories contain at least one of the items); these are filtered out by applying the actual Jaccard similarity function and check before the results are returned to the user.
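
The filter-then-verify idea described above can be illustrated with a small Python sketch. This is not AsterixDB code and not the proposed optimizer rule: the function names (`build_array_index`, `jaccard`, `jaccard_query`), the in-memory dictionaries, and the sample data are all hypothetical stand-ins. It assumes a positive threshold, so any qualifying store must share at least one category with the probe array and the index search therefore misses no true result.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| over the distinct items of a and b."""
    a, b = set(a), set(b)
    union = a | b
    # Assumption for this sketch only: two empty arrays score 0.0.
    return len(a & b) / len(union) if union else 0.0


def build_array_index(stores):
    """Stand-in for an index on (UNNEST categories): category -> store ids."""
    index = defaultdict(set)
    for store_id, record in stores.items():
        for category in record["categories"]:
            index[category].add(store_id)
    return index


def jaccard_query(stores, index, probe, threshold):
    """Return ids of stores whose categories have Jaccard(probe) > threshold."""
    # Step 1: index search. Every store sharing at least one item with the
    # probe becomes a candidate; this set may contain false positives.
    candidates = set()
    for item in probe:
        candidates |= index.get(item, set())
    # Step 2: verification. Fetch each candidate record and apply the actual
    # similarity check to discard the false positives.
    return sorted(
        store_id
        for store_id in candidates
        if jaccard(stores[store_id]["categories"], probe) > threshold
    )


# Hypothetical sample data.
stores = {
    1: {"categories": ["Fruits", "Bread"]},
    2: {"categories": ["Fruits", "Bread", "Dairy"]},
    3: {"categories": ["Fruits", "Meat", "Dairy"]},  # candidate, but a false positive
    4: {"categories": ["Meat"]},                      # never reached by the index search
}
index = build_array_index(stores)
result = jaccard_query(stores, index, ["Fruits", "Bread"], 0.6)
```

With this data, store 3 is returned by the index search because it contains "Fruits", but its similarity to the probe is 1/4, so the verification step removes it. Store 4 is never fetched at all, which is where the index saves work.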

This message was sent by Atlassian Jira
