Date: Wed, 12 Aug 2015 17:50:09 -0700
From: "Chen Li (Code Review)"
Reply-To: chenli@gmail.com
Subject: Change in asterixdb[master]: fixing minor issues in docs related to similarity queries Ch...
X-Gerrit-MessageType: newchange
X-Gerrit-Change-Id: Ide23cb7fb33a58bcb2eb4535cf89152518d35a86
X-Gerrit-Commit: 474e4bc520484df8d00070ef37c90af1eeb46310

Chen Li has uploaded a new change for review.

  https://asterix-gerrit.ics.uci.edu/351

Change subject: fixing minor issues in docs related to similarity queries

Change-Id: Ide23cb7fb33a58bcb2eb4535cf89152518d35a86
......................................................................

fixing minor issues in docs related to similarity queries

Change-Id: Ide23cb7fb33a58bcb2eb4535cf89152518d35a86
---
M asterix-doc/src/site/markdown/aql/functions.md
M asterix-doc/src/site/markdown/aql/similarity.md
2 files changed, 25 insertions(+), 21 deletions(-)

  git pull ssh://asterix-gerrit.ics.uci.edu:29418/asterixdb refs/changes/51/351/1

diff --git a/asterix-doc/src/site/markdown/aql/functions.md b/asterix-doc/src/site/markdown/aql/functions.md
index fd00d11..4c2e0c1 100644
--- a/asterix-doc/src/site/markdown/aql/functions.md
+++ b/asterix-doc/src/site/markdown/aql/functions.md
@@ -198,7 +198,7 @@
     * `substring_to_contain` : A target `string` that might be contained.
  * Return Value:
     * A `boolean` value, `true` if `string_expression` contains `substring_to_contain`, and `false` otherwise.
- * Note: An n-gram index can be utilized for this function.
+ * Note: An [n-gram index](similarity.html#UsingIndexesToSupportSimilarityQueries) can be utilized for this function.
  * Example:
 
         use dataverse TinySocial;
@@ -1109,20 +1109,21 @@
 
 ## Similarity Functions [Back to TOC] ##
 
-AsterixDB supports queries with different similarity functions, including edit distance and Jaccard.
+AsterixDB supports queries with different similarity functions,
+including [edit distance](http://en.wikipedia.org/wiki/Levenshtein_distance) and [Jaccard](https://en.wikipedia.org/wiki/Jaccard_index).
 
 ### edit-distance ###
  * Syntax:
 
         edit-distance(expression1, expression2)
 
- * Returns the [edit distance](http://en.wikipedia.org/wiki/Levenshtein_distance) of `expression1` and `expression2`.
+ * Returns the edit distance of `expression1` and `expression2`.
  * Arguments:
     * `expression1` : A `string` or a homogeneous `OrderedList` of a comparable item type.
     * `expression2` : The same type as `expression1`.
  * Return Value:
     * An `int64` that represents the edit distance between `expression1` and `expression2`.
- * Note: An n-gram index can be utilized for this function.
+ * Note: An [n-gram index](similarity.html#UsingIndexesToSupportSimilarityQueries) can be utilized for this function.
  * Example:
 
         use dataverse TinySocial;
@@ -1156,7 +1157,7 @@
     * An `OrderedList` with two items:
         * The first item contains a `boolean` value representing whether `expression1` and `expression2` are similar.
         * The second item contains an `int64` that represents the edit distance of `expression1` and `expression2` if it is within the threshold, or 0 otherwise.
- * Note: An n-gram index can be utilized for this function.
+ * Note: An [n-gram index](similarity.html#UsingIndexesToSupportSimilarityQueries) can be utilized for this function.
  * Example:
 
         use dataverse TinySocial;
@@ -1186,8 +1187,9 @@
     * An `OrderedList` with two items:
         * The first item contains a `boolean` value representing whether `expression1` can contain `expression2`.
         * The second item contains an `int32` that represents the required edit distance for `expression1` to contain `expression2` if the first item is true.
-* Note: An n-gram index can be utilized for this function.
+* Note: An [n-gram index](similarity.html#UsingIndexesToSupportSimilarityQueries) can be utilized for this function.
  * Example:
+
         let $i := edit-distance-contains("happy","hapr",2)
         return $i;
 
@@ -1209,13 +1211,13 @@
     * `list_expression2` : An `UnorderedList` or `OrderedList`.
  * Return Value:
     * A `float` that represents the Jaccard similarity of `list_expression1` and `list_expression2`.
- * Note: A keyword index can be utilized for this function.
+ * Note: A [keyword index](similarity.html#UsingIndexesToSupportSimilarityQueries) can be utilized for this function.
  * Example:
 
         use dataverse TinySocial;
 
         for $user in dataset('FacebookUsers')
-        let $sim := similarity-jaccard($user.friend-ids, [1,5,9])
+        let $sim := similarity-jaccard($user.friend-ids, [1,5,9,10])
         where $sim >= 0.6f
         return $user
 
@@ -1247,13 +1249,13 @@
     * An `OrderedList` with two items:
         * The first item contains a `boolean` value representing whether `list_expression1` and `list_expression2` are similar.
         * The second item contains a `float` that represents the Jaccard similarity of `list_expression1` and `list_expression2` if it is greater than or equal to the threshold, or 0 otherwise.
- * Note: A keyword index can be utilized for this function.
+ * Note: A [keyword index](similarity.html#UsingIndexesToSupportSimilarityQueries) can be utilized for this function.
  * Example:
 
         use dataverse TinySocial;
 
         for $user in dataset('FacebookUsers')
-        let $sim := similarity-jaccard-check($user.friend-ids, [1,5,9], 0.6f)
+        let $sim := similarity-jaccard-check($user.friend-ids, [1,5,9,10], 0.6f)
         where $sim[0]
         return $sim[1]
 
@@ -1264,7 +1266,7 @@
         1.0f
 
 
-### Similarity Operator ~# ###
+### Similarity Operator ~= ###
  * "`~=`" is syntactic sugar for expressing a similarity condition with a given similarity threshold.
  * The similarity function and threshold for "`~=`" are controlled via "set" directives.
  * The "`~=`" operator returns a `boolean` value that represents whether the operands are similar.
@@ -1277,7 +1279,7 @@
         set simthreshold "0.6f";
 
         for $user in dataset('FacebookUsers')
-        where $user.friend-ids ~= [1,5,9]
+        where $user.friend-ids ~= [1,5,9,10]
         return $user
 
 
@@ -1315,11 +1317,12 @@
 
 ## Tokenizing Functions [Back to TOC] ##
 ### word-tokens ###
+
  * Syntax:
 
         word-tokens(string_expression)
 
- * Returns a list of word tokens of `string_expression`.
+ * Returns a list of word tokens of `string_expression` using non-alphanumeric characters as delimiters.
  * Arguments:
     * `string_expression` : A `string` that will be tokenized.
  * Return Value:
diff --git a/asterix-doc/src/site/markdown/aql/similarity.md b/asterix-doc/src/site/markdown/aql/similarity.md
index 9e07ea1..e221bff 100644
--- a/asterix-doc/src/site/markdown/aql/similarity.md
+++ b/asterix-doc/src/site/markdown/aql/similarity.md
@@ -43,7 +43,7 @@
 
 ## Similarity Selection Queries [Back to TOC] ##
 
-The following [query](functions.html#edit-distance)
+The following query
 asks for all the Facebook users whose name is similar to `Suzanna Tilson`,
 i.e., their edit distance is at most 2.
 
@@ -55,14 +55,14 @@
         return $user
 
 
-The following [query](functions.html#similarity-jaccard)
+The following query
 asks for all the Facebook users whose set of friend ids is
-similar to `[1,5,9]`, i.e., their Jaccard similarity is at least 0.6.
+similar to `[1,5,9,10]`, i.e., their Jaccard similarity is at least 0.6.
 
         use dataverse TinySocial;
 
         for $user in dataset('FacebookUsers')
-        let $sim := similarity-jaccard($user.friend-ids, [1,5,9])
+        let $sim := similarity-jaccard($user.friend-ids, [1,5,9,10])
         where $sim >= 0.6f
         return $user
 
@@ -78,7 +78,7 @@
         set simthreshold "0.6f";
 
         for $user in dataset('FacebookUsers')
-        where $user.friend-ids ~= [1,5,9]
+        where $user.friend-ids ~= [1,5,9,10]
         return $user
 
 
@@ -170,7 +170,7 @@
         use dataverse TinySocial;
 
         for $user in dataset('FacebookUsers')
-        let $sim := similarity-jaccard($user.friend-ids, [1,5,9])
+        let $sim := similarity-jaccard($user.friend-ids, [1,5,9,10])
         where $sim >= 0.6f
         return $user
 
@@ -179,8 +179,8 @@
         use dataverse TinySocial;
 
         for $user in dataset('FacebookUsers')
-        let $sim := similarity-jaccard($user.friend-ids, [1,5,9])
-        where $sim >= 0.6f
+        let $sim := similarity-jaccard-check($user.friend-ids, [1,5,9,10], 0.6f)
+        where $sim[0]
         return $user
 
 #### NGram Index usage case - [contains()]((functions.html#contains)) ####
@@ -203,6 +203,7 @@
 
         use dataverse TinySocial;
 
+        drop index FacebookMessages.fbMessageIdx if exists;
         create index fbMessageIdx on FacebookMessages(message) type keyword;
 
         for $o in dataset('FacebookMessages')

-- 
To view, visit https://asterix-gerrit.ics.uci.edu/351
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ide23cb7fb33a58bcb2eb4535cf89152518d35a86
Gerrit-PatchSet: 1
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Chen Li