asterixdb-commits mailing list archives

From wangs...@apache.org
Subject asterixdb git commit: Add a corner case handling for NGramUTF8StringBinaryTokenizer
Date Tue, 17 Jan 2017 23:43:14 GMT
Repository: asterixdb
Updated Branches:
  refs/heads/master c8ea9d6bb -> 908ae63bb


Add a corner case handling for NGramUTF8StringBinaryTokenizer

 - For the corner case where the length of the given string is less
   than the gram length (and pre/post padding is not used), the
   tokenizer now returns 0 as the total number of grams.

Change-Id: I5965856b4da018276b37460bed7fb1fc60d8c2f3
Reviewed-on: https://asterix-gerrit.ics.uci.edu/1448
Reviewed-by: Ian Maxon <imaxon@apache.org>
Sonar-Qube: Jenkins <jenkins@fulliautomatix.ics.uci.edu>
Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>
BAD: Jenkins <jenkins@fulliautomatix.ics.uci.edu>
Integration-Tests: Jenkins <jenkins@fulliautomatix.ics.uci.edu>


Project: http://git-wip-us.apache.org/repos/asf/asterixdb/repo
Commit: http://git-wip-us.apache.org/repos/asf/asterixdb/commit/908ae63b
Tree: http://git-wip-us.apache.org/repos/asf/asterixdb/tree/908ae63b
Diff: http://git-wip-us.apache.org/repos/asf/asterixdb/diff/908ae63b

Branch: refs/heads/master
Commit: 908ae63bb7d99ae01f999e9c3b5290c697dce033
Parents: c8ea9d6
Author: Taewoo Kim <wangsaeu@yahoo.com>
Authored: Tue Jan 17 12:58:03 2017 -0800
Committer: Taewoo Kim <wangsaeu@yahoo.com>
Committed: Tue Jan 17 15:42:40 2017 -0800

----------------------------------------------------------------------
 .../tokenizers/NGramUTF8StringBinaryTokenizer.java             | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/asterixdb/blob/908ae63b/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/tokenizers/NGramUTF8StringBinaryTokenizer.java
----------------------------------------------------------------------
diff --git a/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/tokenizers/NGramUTF8StringBinaryTokenizer.java b/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/tokenizers/NGramUTF8StringBinaryTokenizer.java
index 4c486c5..8bd0c50 100644
--- a/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/tokenizers/NGramUTF8StringBinaryTokenizer.java
+++ b/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/tokenizers/NGramUTF8StringBinaryTokenizer.java
@@ -110,7 +110,11 @@ public class NGramUTF8StringBinaryTokenizer extends AbstractUTF8StringBinaryToke
         if (usePrePost) {
             totalGrams = numChars + gramLength - 1;
         } else {
-            totalGrams = numChars - gramLength + 1;
+            if (numChars >= gramLength) {
+                totalGrams = numChars - gramLength + 1;
+            } else {
+                totalGrams = 0;
+            }
         }
     }
 


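The arithmetic behind the patch can be sketched outside the tokenizer. The class `NGramCount` and its method `totalGrams` are hypothetical names, not AsterixDB APIs; they only mirror the two branches of the patched computation, with `numChars`, `gramLength` and `usePrePost` taken from the diff. The padded branch assumes (gramLength - 1) pre characters and (gramLength - 1) post characters around the string, which is what `numChars + gramLength - 1` implies; the `#` and `$` pad characters in the comments are illustrative only.

```java
/**
 * Hypothetical standalone helper (not part of AsterixDB) that mirrors the
 * total-gram computation in NGramUTF8StringBinaryTokenizer after this commit.
 */
public final class NGramCount {

    private NGramCount() {
    }

    /**
     * @param numChars   number of characters in the input string
     * @param gramLength length of each n-gram
     * @param usePrePost whether the string is treated as padded with
     *                   (gramLength - 1) pre and post characters (assumption)
     * @return total number of n-grams
     */
    public static int totalGrams(int numChars, int gramLength, boolean usePrePost) {
        if (usePrePost) {
            // Padded length is numChars + 2 * (gramLength - 1), so a window of
            // width gramLength fits numChars + gramLength - 1 times.
            return numChars + gramLength - 1;
        }
        // Without padding, the window fits numChars - gramLength + 1 times,
        // but only when the string is at least gramLength characters long;
        // otherwise there are no grams (the corner case fixed by the commit).
        return numChars >= gramLength ? numChars - gramLength + 1 : 0;
    }

    public static void main(String[] args) {
        // "abcde" with 3-grams, no padding: abc, bcd, cde
        System.out.println(totalGrams(5, 3, false)); // 3
        // "a" with 3-grams, no padding: the old formula gave 1 - 3 + 1 = -1
        System.out.println(totalGrams(1, 3, false)); // 0
        // "a" with 3-grams and illustrative padding "##a$$": ##a, #a$, a$$
        System.out.println(totalGrams(1, 3, true)); // 3
    }
}
```

Note the boundary numChars == gramLength - 1: the old formula already yields 0 there, so the guard only changes the result for strings at least two characters shorter than the gram length, where the unpatched expression went negative.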