asterixdb-dev mailing list archives

From: "Chen Li (Code Review)" <do-not-re...@asterix-gerrit.ics.uci.edu>
Subject: Change in asterixdb[master]: fixing minor issues in docs related to similarity queries Ch...
Date: Thu, 13 Aug 2015 17:29:51 GMT
Chen Li has submitted this change and it was merged.

Change subject: fixing minor issues in docs related to similarity queries Change-Id: Ide23cb7fb33a58bcb2eb4535cf89152518d35a86
Reviewed-on: https://asterix-gerrit.ics.uci.edu/351 Reviewed-by: Taewoo Kim <wangsaeu@gmail.com>
Tested-by: Jenkins <jenkins@fulliautomatix.ics.
......................................................................


fixing minor issues in docs related to similarity queries
Change-Id: Ide23cb7fb33a58bcb2eb4535cf89152518d35a86
Reviewed-on: https://asterix-gerrit.ics.uci.edu/351
Reviewed-by: Taewoo Kim <wangsaeu@gmail.com>
Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>
---
M asterix-doc/src/site/markdown/aql/functions.md
M asterix-doc/src/site/markdown/aql/similarity.md
2 files changed, 25 insertions(+), 21 deletions(-)

Approvals:
  Taewoo Kim: Looks good to me, approved
  Jenkins: Verified



diff --git a/asterix-doc/src/site/markdown/aql/functions.md b/asterix-doc/src/site/markdown/aql/functions.md
index fd00d11..4c2e0c1 100644
--- a/asterix-doc/src/site/markdown/aql/functions.md
+++ b/asterix-doc/src/site/markdown/aql/functions.md
@@ -198,7 +198,7 @@
     * `substring_to_contain` : A target `string` that might be contained.
  * Return Value:
     * A `boolean` value, `true` if `string_expression` contains `substring_to_contain`, and `false` otherwise.
- * Note: An n-gram index can be utilized for this function.
+ * Note: An [n-gram index](similarity.html#UsingIndexesToSupportSimilarityQueries) can be utilized for this function.
  * Example:
 
         use dataverse TinySocial;
@@ -1109,20 +1109,21 @@
 
 ## <a id="SimilarityFunctions">Similarity Functions</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ##
 
-AsterixDB supports queries with different similarity functions, including edit distance and Jaccard.
+AsterixDB supports queries with different similarity functions,
+including [edit distance](http://en.wikipedia.org/wiki/Levenshtein_distance) and [Jaccard](https://en.wikipedia.org/wiki/Jaccard_index).
 
 ### edit-distance ###
  * Syntax:
 
         edit-distance(expression1, expression2)
 
- * Returns the [edit distance](http://en.wikipedia.org/wiki/Levenshtein_distance) of `expression1` and `expression2`.
+ * Returns the edit distance of `expression1` and `expression2`.
  * Arguments:
     * `expression1` : A `string` or a homogeneous `OrderedList` of a comparable item type.
     * `expression2` : The same type as `expression1`.
  * Return Value:
     * An `int64` that represents the edit distance between `expression1` and `expression2`.
- * Note: An n-gram index can be utilized for this function.
+ * Note: An [n-gram index](similarity.html#UsingIndexesToSupportSimilarityQueries) can be utilized for this function.
  * Example:
 
         use dataverse TinySocial;
@@ -1156,7 +1157,7 @@
     * An `OrderedList` with two items:
         * The first item contains a `boolean` value representing whether `expression1` and `expression2` are similar.
         * The second item contains an `int64` that represents the edit distance of `expression1` and `expression2` if it is within the threshold, or 0 otherwise.
- * Note: An n-gram index can be utilized for this function.
+ * Note: An [n-gram index](similarity.html#UsingIndexesToSupportSimilarityQueries) can be utilized for this function.
  * Example:
 
         use dataverse TinySocial;
@@ -1186,8 +1187,9 @@
     * An `OrderedList` with two items:
         * The first item contains a `boolean` value representing whether `expression1` can contain `expression2`.
         * The second item contains an `int32` that represents the required edit distance for `expression1` to contain `expression2` if the first item is true.
-* Note: An n-gram index can be utilized for this function.
+* Note: An [n-gram index](similarity.html#UsingIndexesToSupportSimilarityQueries) can be utilized for this function.
 * Example:
+
         let $i := edit-distance-contains("happy","hapr",2)
         return $i;
 
@@ -1209,13 +1211,13 @@
     * `list_expression2` : An `UnorderedList` or `OrderedList`.
  * Return Value:
     * A `float` that represents the Jaccard similarity of `list_expression1` and `list_expression2`.
- * Note: A keyword index can be utilized for this function.
+ * Note: A [keyword index](similarity.html#UsingIndexesToSupportSimilarityQueries) can be utilized for this function.
  * Example:
 
         use dataverse TinySocial;
 
         for $user in dataset('FacebookUsers')
-        let $sim := similarity-jaccard($user.friend-ids, [1,5,9])
+        let $sim := similarity-jaccard($user.friend-ids, [1,5,9,10])
         where $sim >= 0.6f
         return $user
 
@@ -1247,13 +1249,13 @@
     * An `OrderedList` with two items:
      * The first item contains a `boolean` value representing whether `list_expression1` and `list_expression2` are similar.
      * The second item contains a `float` that represents the Jaccard similarity of `list_expression1` and `list_expression2` if it is greater than or equal to the threshold, or 0 otherwise.
- * Note: A keyword index can be utilized for this function.
+ * Note: A [keyword index](similarity.html#UsingIndexesToSupportSimilarityQueries) can be utilized for this function.
  * Example:
 
         use dataverse TinySocial;
 
         for $user in dataset('FacebookUsers')
-        let $sim := similarity-jaccard-check($user.friend-ids, [1,5,9], 0.6f)
+        let $sim := similarity-jaccard-check($user.friend-ids, [1,5,9,10], 0.6f)
         where $sim[0]
         return $sim[1]
 
@@ -1264,7 +1266,7 @@
         1.0f
 
 
-### Similarity Operator ~# ###
+### Similarity Operator ~= ###
  * "`~=`" is syntactic sugar for expressing a similarity condition with a given similarity threshold.
  * The similarity function and threshold for "`~=`" are controlled via "set" directives.
  * The "`~=`" operator returns a `boolean` value that represents whether the operands are similar.
@@ -1277,7 +1279,7 @@
         set simthreshold "0.6f";
 
         for $user in dataset('FacebookUsers')
-        where $user.friend-ids ~= [1,5,9]
+        where $user.friend-ids ~= [1,5,9,10]
         return $user
 
 
@@ -1315,11 +1317,12 @@
 
 ## <a id="TokenizingFunctions">Tokenizing Functions</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ##
 ### word-tokens ###
+
  * Syntax:
 
         word-tokens(string_expression)
 
- * Returns a list of word tokens of `string_expression`.
+ * Returns a list of word tokens of `string_expression` using non-alphanumeric characters as delimiters.
  * Arguments:
     * `string_expression` : A `string` that will be tokenized.
  * Return Value:
diff --git a/asterix-doc/src/site/markdown/aql/similarity.md b/asterix-doc/src/site/markdown/aql/similarity.md
index 9e07ea1..e221bff 100644
--- a/asterix-doc/src/site/markdown/aql/similarity.md
+++ b/asterix-doc/src/site/markdown/aql/similarity.md
@@ -43,7 +43,7 @@
 
 ## <a id="SimilaritySelectionQueries">Similarity Selection Queries</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ##
 
-The following [query](functions.html#edit-distance)
+The following query
 asks for all the Facebook users whose name is similar to
 `Suzanna Tilson`, i.e., their edit distance is at most 2.
 
@@ -55,14 +55,14 @@
         return $user
 
 
-The following [query](functions.html#similarity-jaccard)
+The following query
 asks for all the Facebook users whose set of friend ids is
-similar to `[1,5,9]`, i.e., their Jaccard similarity is at least 0.6.
+similar to `[1,5,9,10]`, i.e., their Jaccard similarity is at least 0.6.
 
         use dataverse TinySocial;
 
         for $user in dataset('FacebookUsers')
-        let $sim := similarity-jaccard($user.friend-ids, [1,5,9])
+        let $sim := similarity-jaccard($user.friend-ids, [1,5,9,10])
         where $sim >= 0.6f
         return $user
 
@@ -78,7 +78,7 @@
         set simthreshold "0.6f";
 
         for $user in dataset('FacebookUsers')
-        where $user.friend-ids ~= [1,5,9]
+        where $user.friend-ids ~= [1,5,9,10]
         return $user
 
 
@@ -170,7 +170,7 @@
         use dataverse TinySocial;
 
         for $user in dataset('FacebookUsers')
-        let $sim := similarity-jaccard($user.friend-ids, [1,5,9])
+        let $sim := similarity-jaccard($user.friend-ids, [1,5,9,10])
         where $sim >= 0.6f
         return $user
 
@@ -179,8 +179,8 @@
         use dataverse TinySocial;
 
         for $user in dataset('FacebookUsers')
-        let $sim := similarity-jaccard($user.friend-ids, [1,5,9])
-        where $sim >= 0.6f
+        let $sim := similarity-jaccard-check($user.friend-ids, [1,5,9,10], 0.6f)
+        where $sim[0]
         return $user
 
 #### NGram Index usage case - [contains()]((functions.html#contains)) ####
@@ -203,6 +203,7 @@
 
         use dataverse TinySocial;
 
+        drop index FacebookMessages.fbMessageIdx if exists;
         create index fbMessageIdx on FacebookMessages(message) type keyword;
 
         for $o in dataset('FacebookMessages')

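For reviewers checking the updated examples by hand, here is a minimal Python sketch of the two measures the docs now link to (Levenshtein edit distance and the Jaccard index), plus the two-item `[boolean, value]` result shape the docs describe for `edit-distance-check` and `similarity-jaccard-check`. This is not AsterixDB code: the helper names `edit_distance`, `edit_distance_check`, `jaccard` and `jaccard_check` are hypothetical, Python tuples stand in for the two-item `OrderedList`, the Jaccard helpers assume set semantics (duplicate list items are ignored), and the result for two empty inputs is an arbitrary choice of this sketch.

```python
from typing import Hashable, Iterable, Sequence, Tuple


def edit_distance(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Levenshtein distance: the minimum number of single-item insertions,
    deletions and substitutions that turn `a` into `b`.
    Works for strings and for lists of comparable items."""
    prev = list(range(len(b) + 1))  # row for the empty prefix of `a`
    for i, x in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty prefix costs i deletions
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def edit_distance_check(a, b, threshold: int) -> Tuple[bool, int]:
    """(similar?, distance) if the distance is within the threshold, else (False, 0)."""
    d = edit_distance(a, b)
    return (True, d) if d <= threshold else (False, 0)


def jaccard(a: Iterable[Hashable], b: Iterable[Hashable]) -> float:
    """Jaccard index |A & B| / |A | B|, treating both inputs as sets (assumption)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    if not union:
        return 0.0  # sketch's own choice for two empty inputs
    return len(sa & sb) / len(union)


def jaccard_check(a, b, threshold: float) -> Tuple[bool, float]:
    """(similar?, similarity) if the similarity is >= threshold, else (False, 0.0)."""
    sim = jaccard(a, b)
    return (True, sim) if sim >= threshold else (False, 0.0)


if __name__ == "__main__":
    print(edit_distance("Suzanna Tilson", "Suzanna Tilsen"))  # 1
    print(jaccard([1, 5, 9], [1, 5, 9, 10]))                 # 0.75
    print(jaccard_check([1, 5], [1, 5, 9, 10], 0.6))         # (False, 0.0)
```

Under these assumptions, a user whose `friend-ids` were exactly `[1,5,9]` would score 0.75 against the updated query list `[1,5,9,10]` and pass the 0.6 threshold used in the examples, while `[1,5]` would score 0.5 and be filtered out.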
-- 
To view, visit https://asterix-gerrit.ics.uci.edu/351
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ide23cb7fb33a58bcb2eb4535cf89152518d35a86
Gerrit-PatchSet: 2
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Chen Li <chenli@gmail.com>
Gerrit-Reviewer: Chen Li <chenli@gmail.com>
Gerrit-Reviewer: Jenkins <jenkins@fulliautomatix.ics.uci.edu>
Gerrit-Reviewer: Taewoo Kim <wangsaeu@gmail.com>
