asterixdb-commits mailing list archives

From wangs...@apache.org
Subject [1/4] asterixdb git commit: Full-text implementation step 3
Date Fri, 06 Jan 2017 18:06:14 GMT
Repository: asterixdb
Updated Branches:
  refs/heads/master d49bc6eb1 -> c49405aaf


http://git-wip-us.apache.org/repos/asf/asterixdb/blob/c49405aa/asterixdb/asterix-doc/src/site/markdown/aql/fulltext.md
----------------------------------------------------------------------
diff --git a/asterixdb/asterix-doc/src/site/markdown/aql/fulltext.md b/asterixdb/asterix-doc/src/site/markdown/aql/fulltext.md
index 921f0b3..4fe17ac 100644
--- a/asterixdb/asterix-doc/src/site/markdown/aql/fulltext.md
+++ b/asterixdb/asterix-doc/src/site/markdown/aql/fulltext.md
@@ -39,9 +39,10 @@ returned as well.
 ## <a id="Syntax">Syntax</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ##
 
 The syntax of AsterixDB FTS follows a portion of the XQuery FullText Search syntax.
-A basic form is as follows:
+Two basic forms are as follows:
 
         ftcontains(Expression1, Expression2, {FullTextOption})
+        ftcontains(Expression1, Expression2)
 
 For example, we can execute the following query to find tweet messages where the `message-text` field includes
 “voice” as a word. Please note that an FTS search is case-insensitive.
@@ -62,6 +63,7 @@ into one of the first two types, i.e., into a string value or an (un)ordered lis
 
 The following examples are all valid expressions.
 
+       ... where ftcontains($msg.message-text, "sound")
        ... where ftcontains($msg.message-text, "sound", {"mode":"any"})
        ... where ftcontains($msg.message-text, ["sound", "system"], {"mode":"any"})
        ... where ftcontains($msg.message-text, {{"speed", "stand", "customization"}}, {"mode":"all"})
@@ -70,30 +72,34 @@ The following examples are all valid expressions.
 
 In the last example above, `$keyword_list` should evaluate to a string or an (un)ordered list of string value(s).
 
-The last `FullTextOption` parameter clarifies the given FTS request. Currently, we only have one option named `mode`.
+The last `FullTextOption` parameter clarifies the given FTS request. If you omit the `FullTextOption` parameter,
+then the default value will be set for each possible option. Currently, we only have one option named `mode`.
 And as we extend the FTS feature, more options will be added. Please note that the format of `FullTextOption`
 is a record, thus you need to put the option(s) in a record `{}`.
 The `mode` option indicates whether the given FTS query is a conjunctive (AND) or disjunctive (OR) search request.
-This option can be either `“any”` or `“all”`. If one specifies `“any”`, a disjunctive search will be conducted.
-For example, the following query will find documents whose `message-text` field contains “sound” or “system”,
-so a document will be returned if it contains either “sound”, “system”, or both of the keywords.
+This option can be either `“any”` or `“all”`. The default value for `mode` is `“all”`. If one specifies `“any”`,
+a disjunctive search will be conducted. For example, the following query will find documents whose `message-text`
+field contains “sound” or “system”, so a document will be returned if it contains either “sound”, “system”,
+or both of the keywords.
 
        ... where ftcontains($msg.message-text, ["sound", "system"], {"mode":"any"})
 
-The other option parameter,`“all”`, specifies a conjunctive search. The following example will find the documents whose
+The other option parameter,`“all”`, specifies a conjunctive search. The following examples will find the documents whose
 `message-text` field contains both “sound” and “system”. If a document contains only “sound” or “system” but
 not both, it will not be returned.
 
        ... where ftcontains($msg.message-text, ["sound", "system"], {"mode":"all"})
+       ... where ftcontains($msg.message-text, ["sound", "system"])
 
 Currently AsterixDB doesn’t (yet) support phrase searches, so the following query will not work.
 
        ... where ftcontains($msg.message-text, "sound system", {"mode":"any"})
 
 As a workaround solution, the following query can be used to achieve a roughly similar goal. The difference is that
-the following query will find documents where `$msg.message-text` contains both “sound” and “system”, but the order
+the following queries will find documents where `$msg.message-text` contains both “sound” and “system”, but the order
 and adjacency of “sound” and “system” are not checked, unlike in a phrase search. As a result, the query below would
 also return documents with “sound system can be installed.”, “system sound is perfect.”,
 or “sound is not clear. You may need to install a new system.”
 
        ... where ftcontains($msg.message-text, ["sound", "system"], {"mode":"all"})
+       ... where ftcontains($msg.message-text, ["sound", "system"])
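
Note on the documentation change above: the `mode` semantics (with `"all"` as the default when the option record is omitted) can be illustrated with a small standalone sketch. This is not the AsterixDB implementation. The class name, method names, and the regex-based tokenizer are assumptions made for illustration only. The rule it uses, requiring one hit for a disjunctive search and one hit per unique keyword for a conjunctive search, follows the comment in FullTextContainsEvaluator.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative sketch only; names and tokenizer are hypothetical, not AsterixDB code.
final class FtModeSketch {

    // Lowercases the text and splits it on runs of non-letter, non-digit characters.
    static Set<String> tokenize(String text) {
        Set<String> tokens = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // mode "any": at least one keyword must occur; any other mode is treated as "all".
    static boolean ftcontains(String document, List<String> keywords, String mode) {
        Set<String> docTokens = tokenize(document);
        Set<String> uniqueKeywords = new HashSet<>();
        for (String k : keywords) {
            uniqueKeywords.add(k.toLowerCase(Locale.ROOT));
        }
        int found = 0;
        for (String k : uniqueKeywords) {
            if (docTokens.contains(k)) {
                found++;
            }
        }
        int required = "any".equals(mode) ? 1 : uniqueKeywords.size();
        return found >= required;
    }

    // Two-argument form: falls back to the documented default mode, "all".
    static boolean ftcontains(String document, List<String> keywords) {
        return ftcontains(document, keywords, "all");
    }

    public static void main(String[] args) {
        List<String> keywords = List.of("sound", "system");
        System.out.println(ftcontains("System sound is perfect.", keywords));
        System.out.println(ftcontains("The sound is not clear.", keywords));
        System.out.println(ftcontains("The sound is not clear.", keywords, "any"));
    }
}
```

Because only token membership is checked, word order and adjacency play no role, which is why the workaround above is not equivalent to a phrase search.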

http://git-wip-us.apache.org/repos/asf/asterixdb/blob/c49405aa/asterixdb/asterix-doc/src/site/markdown/aql/manual.md
----------------------------------------------------------------------
diff --git a/asterixdb/asterix-doc/src/site/markdown/aql/manual.md b/asterixdb/asterix-doc/src/site/markdown/aql/manual.md
index ecdc715..95c752f 100644
--- a/asterixdb/asterix-doc/src/site/markdown/aql/manual.md
+++ b/asterixdb/asterix-doc/src/site/markdown/aql/manual.md
@@ -700,10 +700,11 @@ the URL and path needed to locate the data in HDFS and a description of the data
                          | "rtree"
                          | "keyword"
                          | "ngram" "(" IntegerLiteral ")"
+                         | "fulltext"
 
 The create index statement creates a secondary index on one or more fields of a specified dataset.
 Supported index types include `btree` for totally ordered datatypes,
-`rtree` for spatial data, and `keyword` and `ngram` for textual (string) data.
+`rtree` for spatial data, and `keyword`, `ngram`, and `fulltext` for textual (string) data.
 An index can be created on a nested field (or fields) by providing a valid path expression as an index field identifier.
 An index field is not required to be part of the datatype associated with a dataset if that datatype is declared as
 open and the field's type is provided along with its type and the `enforced` keyword is specified in the end of index definition.
@@ -759,6 +760,15 @@ For details refer to the [document on similarity queries](similarity.html#Keywor
 
     create index fbMessageIdx on FacebookMessages(message) type keyword;
 
+The following example creates a full-text index called fbMessageIdx on the message field of the FacebookMessages dataset.
+This full-text index can be used to optimize queries with full-text search predicates on the message field.
+For details refer to the [document on full-text queries](fulltext.html#toc).
+
+##### Example
+
+    create index fbMessageIdx on FacebookMessages(message) type fulltext;
+
+
 #### Functions
 
 The create function statement creates a named function that can then be used and reused in AQL queries.

http://git-wip-us.apache.org/repos/asf/asterixdb/blob/c49405aa/asterixdb/asterix-doc/src/site/site.xml
----------------------------------------------------------------------
diff --git a/asterixdb/asterix-doc/src/site/site.xml b/asterixdb/asterix-doc/src/site/site.xml
index 4c0ba6e..0e2efc8 100644
--- a/asterixdb/asterix-doc/src/site/site.xml
+++ b/asterixdb/asterix-doc/src/site/site.xml
@@ -100,6 +100,7 @@
 
     <menu name="Advanced Features">
       <item name="Support of Similarity Queries" href="aql/similarity.html"/>
+      <item name="Support of Full-text Queries" href="aql/fulltext.html"/>
       <item name="Accessing External Data" href="aql/externaldata.html"/>
       <item name="Support for Data Ingestion" href="feeds/tutorial.html"/>
       <item name="User Defined Functions" href="udf.html"/>

http://git-wip-us.apache.org/repos/asf/asterixdb/blob/c49405aa/asterixdb/asterix-lang-aql/src/main/javacc/AQL.jj
----------------------------------------------------------------------
diff --git a/asterixdb/asterix-lang-aql/src/main/javacc/AQL.jj b/asterixdb/asterix-lang-aql/src/main/javacc/AQL.jj
index c43ff66..9be5f8a 100644
--- a/asterixdb/asterix-lang-aql/src/main/javacc/AQL.jj
+++ b/asterixdb/asterix-lang-aql/src/main/javacc/AQL.jj
@@ -631,6 +631,10 @@ IndexParams IndexType() throws ParseException:
     {
       type = IndexType.LENGTH_PARTITIONED_WORD_INVIX;
     }
+   |<FULLTEXT>
+    {
+      type = IndexType.SINGLE_PARTITION_WORD_INVIX;
+    }
    |<NGRAM> <LEFTPAREN> <INTEGER_LITERAL>
     {
       type = IndexType.LENGTH_PARTITIONED_NGRAM_INVIX;
@@ -2662,6 +2666,7 @@ TOKEN :
   | <FOR : "for">
   | <FORMAT : "format">
   | <FROM : "from">
+  | <FULLTEXT : "fulltext">
   | <FUNCTION : "function">
   | <GROUP : "group">
   | <HINTS : "hints">

http://git-wip-us.apache.org/repos/asf/asterixdb/blob/c49405aa/asterixdb/asterix-lang-common/src/main/java/org/apache/asterix/lang/common/visitor/FormatPrintVisitor.java
----------------------------------------------------------------------
diff --git a/asterixdb/asterix-lang-common/src/main/java/org/apache/asterix/lang/common/visitor/FormatPrintVisitor.java b/asterixdb/asterix-lang-common/src/main/java/org/apache/asterix/lang/common/visitor/FormatPrintVisitor.java
index fb2d3bd..7a96fac 100644
--- a/asterixdb/asterix-lang-common/src/main/java/org/apache/asterix/lang/common/visitor/FormatPrintVisitor.java
+++ b/asterixdb/asterix-lang-common/src/main/java/org/apache/asterix/lang/common/visitor/FormatPrintVisitor.java
@@ -989,6 +989,8 @@ public class FormatPrintVisitor implements ILangVisitor<Void, Integer> {
                 return "btree";
             case RTREE:
                 return "rtree";
+            case SINGLE_PARTITION_WORD_INVIX:
+                return "fulltext";
             case LENGTH_PARTITIONED_WORD_INVIX:
                 return "keyword";
             case LENGTH_PARTITIONED_NGRAM_INVIX:
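
Note on the grammar and printer changes: both parsers map the new `fulltext` keyword to `SINGLE_PARTITION_WORD_INVIX`, and FormatPrintVisitor maps it back. The round trip can be sketched as below. The enum here is a local stand-in whose constant names are copied from the diff; the class, the map, and the method names are hypothetical. The sketch matches keywords case-sensitively and ignores the integer argument that `ngram` takes in the grammar.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only; not the AsterixDB parser or printer.
final class IndexTypeKeywordSketch {

    // Local stand-in enum; constant names are taken from the diff.
    enum IndexType {
        RTREE,
        SINGLE_PARTITION_WORD_INVIX,
        LENGTH_PARTITIONED_WORD_INVIX,
        LENGTH_PARTITIONED_NGRAM_INVIX
    }

    private static final Map<String, IndexType> BY_KEYWORD = new LinkedHashMap<>();

    static {
        BY_KEYWORD.put("rtree", IndexType.RTREE);
        BY_KEYWORD.put("keyword", IndexType.LENGTH_PARTITIONED_WORD_INVIX);
        BY_KEYWORD.put("fulltext", IndexType.SINGLE_PARTITION_WORD_INVIX);
        BY_KEYWORD.put("ngram", IndexType.LENGTH_PARTITIONED_NGRAM_INVIX);
    }

    // Keyword to index type, as the IndexType() grammar production does.
    static IndexType parse(String keyword) {
        IndexType type = BY_KEYWORD.get(keyword);
        if (type == null) {
            throw new IllegalArgumentException("unknown index type: " + keyword);
        }
        return type;
    }

    // Index type back to keyword, as the print visitor does.
    static String print(IndexType type) {
        for (Map.Entry<String, IndexType> e : BY_KEYWORD.entrySet()) {
            if (e.getValue() == type) {
                return e.getKey();
            }
        }
        throw new IllegalArgumentException("no keyword for: " + type);
    }

    public static void main(String[] args) {
        System.out.println(parse("fulltext"));
        System.out.println(print(IndexType.SINGLE_PARTITION_WORD_INVIX));
    }
}
```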

http://git-wip-us.apache.org/repos/asf/asterixdb/blob/c49405aa/asterixdb/asterix-lang-sqlpp/src/main/javacc/SQLPP.jj
----------------------------------------------------------------------
diff --git a/asterixdb/asterix-lang-sqlpp/src/main/javacc/SQLPP.jj b/asterixdb/asterix-lang-sqlpp/src/main/javacc/SQLPP.jj
index b6334c8..e08f758 100644
--- a/asterixdb/asterix-lang-sqlpp/src/main/javacc/SQLPP.jj
+++ b/asterixdb/asterix-lang-sqlpp/src/main/javacc/SQLPP.jj
@@ -673,6 +673,10 @@ IndexParams IndexType() throws ParseException:
     {
       type = IndexType.LENGTH_PARTITIONED_WORD_INVIX;
     }
+  |<FULLTEXT>
+    {
+      type = IndexType.SINGLE_PARTITION_WORD_INVIX;
+    }
   | <NGRAM> <LEFTPAREN> <INTEGER_LITERAL>
     {
       type = IndexType.LENGTH_PARTITIONED_NGRAM_INVIX;
@@ -3107,6 +3111,7 @@ TOKEN [IGNORE_CASE]:
   | <FOR : "for">
   | <FROM : "from">
   | <FULL : "full">
+  | <FULLTEXT : "fulltext">
   | <FUNCTION : "function">
   | <GROUP : "group">
   | <HAVING : "having">

http://git-wip-us.apache.org/repos/asf/asterixdb/blob/c49405aa/asterixdb/asterix-om/src/main/java/org/apache/asterix/dataflow/data/common/AOrderedListBinaryTokenizer.java
----------------------------------------------------------------------
diff --git a/asterixdb/asterix-om/src/main/java/org/apache/asterix/dataflow/data/common/AOrderedListBinaryTokenizer.java b/asterixdb/asterix-om/src/main/java/org/apache/asterix/dataflow/data/common/AOrderedListBinaryTokenizer.java
index 32207d3..ace692f 100644
--- a/asterixdb/asterix-om/src/main/java/org/apache/asterix/dataflow/data/common/AOrderedListBinaryTokenizer.java
+++ b/asterixdb/asterix-om/src/main/java/org/apache/asterix/dataflow/data/common/AOrderedListBinaryTokenizer.java
@@ -26,6 +26,7 @@ import org.apache.asterix.om.util.NonTaggedFormatUtil;
 import org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.IBinaryTokenizer;
 import org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.IToken;
 import org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.ITokenFactory;
+import org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.TokenizerInfo.TokenizerType;
 
 public class AOrderedListBinaryTokenizer implements IBinaryTokenizer {
 
@@ -90,4 +91,9 @@ public class AOrderedListBinaryTokenizer implements IBinaryTokenizer {
     public short getTokensCount() {
         return (short) listLength;
     }
+
+    @Override
+    public TokenizerType getTokenizerType() {
+        return TokenizerType.LIST;
+    }
 }

http://git-wip-us.apache.org/repos/asf/asterixdb/blob/c49405aa/asterixdb/asterix-om/src/main/java/org/apache/asterix/om/functions/BuiltinFunctions.java
----------------------------------------------------------------------
diff --git a/asterixdb/asterix-om/src/main/java/org/apache/asterix/om/functions/BuiltinFunctions.java b/asterixdb/asterix-om/src/main/java/org/apache/asterix/om/functions/BuiltinFunctions.java
index 089f804..3f3a5bd 100644
--- a/asterixdb/asterix-om/src/main/java/org/apache/asterix/om/functions/BuiltinFunctions.java
+++ b/asterixdb/asterix-om/src/main/java/org/apache/asterix/om/functions/BuiltinFunctions.java
@@ -487,8 +487,11 @@ public class BuiltinFunctions {
             "edit-distance-contains", 3);
 
     // full-text
-    public static final FunctionIdentifier FULLTEXT_CONTAINS = new FunctionIdentifier(FunctionConstants.ASTERIX_NS,
-            "ftcontains", 3);
+    public static final FunctionIdentifier FULLTEXT_CONTAINS =
+            new FunctionIdentifier(FunctionConstants.ASTERIX_NS, "ftcontains", 3);
+    // full-text without any option provided
+    public static final FunctionIdentifier FULLTEXT_CONTAINS_WO_OPTION =
+            new FunctionIdentifier(FunctionConstants.ASTERIX_NS, "ftcontains", 2);
 
     // tokenizers:
     public static final FunctionIdentifier WORD_TOKENS = new FunctionIdentifier(FunctionConstants.ASTERIX_NS,
@@ -1033,6 +1036,7 @@ public class BuiltinFunctions {
 
         // Full-text function
         addFunction(FULLTEXT_CONTAINS, ABooleanTypeComputer.INSTANCE, true);
+        addFunction(FULLTEXT_CONTAINS_WO_OPTION, ABooleanTypeComputer.INSTANCE, true);
 
         // Spatial functions
         addFunction(SPATIAL_AREA, ADoubleTypeComputer.INSTANCE, true);

http://git-wip-us.apache.org/repos/asf/asterixdb/blob/c49405aa/asterixdb/asterix-om/src/main/java/org/apache/asterix/om/util/ConstantExpressionUtil.java
----------------------------------------------------------------------
diff --git a/asterixdb/asterix-om/src/main/java/org/apache/asterix/om/util/ConstantExpressionUtil.java b/asterixdb/asterix-om/src/main/java/org/apache/asterix/om/util/ConstantExpressionUtil.java
index e627d95..406f356 100644
--- a/asterixdb/asterix-om/src/main/java/org/apache/asterix/om/util/ConstantExpressionUtil.java
+++ b/asterixdb/asterix-om/src/main/java/org/apache/asterix/om/util/ConstantExpressionUtil.java
@@ -21,7 +21,9 @@ package org.apache.asterix.om.util;
 import org.apache.asterix.om.base.ABoolean;
 import org.apache.asterix.om.base.AInt32;
 import org.apache.asterix.om.base.AInt64;
+import org.apache.asterix.om.base.AOrderedList;
 import org.apache.asterix.om.base.AString;
+import org.apache.asterix.om.base.AUnorderedList;
 import org.apache.asterix.om.base.IAObject;
 import org.apache.asterix.om.constants.AsterixConstantValue;
 import org.apache.asterix.om.types.ATypeTag;
@@ -36,7 +38,7 @@ public class ConstantExpressionUtil {
     private ConstantExpressionUtil() {
     }
 
-    private static IAObject getConstantIaObject(ILogicalExpression expr, ATypeTag typeTag) {
+    public static IAObject getConstantIaObject(ILogicalExpression expr, ATypeTag typeTag) {
         if (expr.getExpressionTag() != LogicalExpressionTag.CONSTANT) {
             return null;
         }
@@ -72,6 +74,21 @@ public class ConstantExpressionUtil {
         return iaObject != null ? ((AString) iaObject).getStringValue() : null;
     }
 
+    public static String getStringConstant(IAObject iaObject) {
+        // Make sure to call this method after checking the type of the given object.
+        return iaObject != null ? ((AString) iaObject).getStringValue() : null;
+    }
+
+    public static AOrderedList getOrderedListConstant(IAObject iaObject) {
+        // Make sure to call this method after checking the type of the given object.
+        return iaObject != null ? (AOrderedList) iaObject : null;
+    }
+
+    public static AUnorderedList getUnorderedListConstant(IAObject iaObject) {
+        // Make sure to call this method after checking the type of the given object.
+        return iaObject != null ? (AUnorderedList) iaObject : null;
+    }
+
     public static Boolean getBooleanConstant(ILogicalExpression expr) {
         final IAObject iaObject = getConstantIaObject(expr, ATypeTag.BOOLEAN);
         return iaObject != null ? ((ABoolean) iaObject).getBoolean() : null;

http://git-wip-us.apache.org/repos/asf/asterixdb/blob/c49405aa/asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/FullTextContainsEvaluator.java
----------------------------------------------------------------------
diff --git a/asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/FullTextContainsEvaluator.java b/asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/FullTextContainsEvaluator.java
index 471b209..b94821f 100644
--- a/asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/FullTextContainsEvaluator.java
+++ b/asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/FullTextContainsEvaluator.java
@@ -19,7 +19,6 @@
 package org.apache.asterix.runtime.evaluators.common;
 
 import java.io.DataOutput;
-import java.util.Arrays;
 
 import org.apache.asterix.formats.nontagged.BinaryComparatorFactoryProvider;
 import org.apache.asterix.formats.nontagged.BinaryTokenizerFactoryProvider;
@@ -92,8 +91,10 @@ public class FullTextContainsEvaluator implements IScalarEvaluator {
     // array that contains the key
     private BinaryHashSet rightHashSet = null;
 
-    // Checks whether the query array has been changed
+    // Keeps the query array. This is used to check whether the query predicate has been changed (e.g., join case)
     private byte[] queryArray = null;
+    private int queryArrayStartOffset = -1;
+    private int queryArrayLength = -1;
 
     // If the following is 1, then we will do a disjunctive search.
     // Else if it is equal to the number of tokens, then we will do a conjunctive search.
@@ -172,11 +173,13 @@ public class FullTextContainsEvaluator implements IScalarEvaluator {
      */
     private boolean fullTextContainsWithArg(ATypeTag typeTag2, IPointable arg1, IPointable arg2)
             throws HyracksDataException {
-        // Since a fulltext search form is "X contains text Y",
+        // Since a fulltext search form is "ftcontains(X,Y,options)",
         // X (document) is the left side and Y (query predicate) is the right side.
 
         // Initialize variables that are required to conduct full-text search. (e.g., hash-set, tokenizer ...)
-        initializeFullTextContains(typeTag2);
+        if (rightHashSet == null) {
+            initializeFullTextContains();
+        }
 
         // Type tag checking is already done in the previous steps.
         // So we directly conduct the full-text search process.
@@ -185,7 +188,8 @@ public class FullTextContainsEvaluator implements IScalarEvaluator {
 
         // Checks whether a new query predicate is introduced.
         // If not, we can re-use the query predicate array we have already created.
-        if (!Arrays.equals(queryArray, arg2Array)) {
+        if (!partOfArrayEquals(queryArray, queryArrayStartOffset, queryArrayLength, arg2Array, arg2.getStartOffset(),
+                arg2.getLength())) {
             resetQueryArrayAndRight(arg2Array, typeTag2, arg2);
         } else {
             // The query predicate remains the same. However, the count of each token should be reset to zero.
@@ -196,23 +200,22 @@ public class FullTextContainsEvaluator implements IScalarEvaluator {
         return readLeftAndConductSearch(arg1);
     }
 
-    private void initializeFullTextContains(ATypeTag predicateTypeTag) {
+    private void initializeFullTextContains() {
         // We use a hash set to store tokens from the right side (query predicate).
         // Initialize necessary variables.
-        if (rightHashSet == null) {
-            hashFunc = new PointableBinaryHashFunctionFactory(UTF8StringLowercaseTokenPointable.FACTORY)
-                    .createBinaryHashFunction();
-            keyEntry = new BinaryEntry();
-            // Parameter: number of bucket, frame size, hashFunction, Comparator, byte
-            // array that contains the key (this array will be set later.)
-            rightHashSet = new BinaryHashSet(HASH_SET_SLOT_SIZE, HASH_SET_FRAME_SIZE, hashFunc, strLowerCaseTokenCmp,
-                    null);
-            tokenizerForLeftArray = BinaryTokenizerFactoryProvider.INSTANCE
-                    .getWordTokenizerFactory(ATypeTag.STRING, false, true).createTokenizer();
-        }
+        hashFunc = new PointableBinaryHashFunctionFactory(UTF8StringLowercaseTokenPointable.FACTORY)
+                .createBinaryHashFunction();
+        keyEntry = new BinaryEntry();
+        // Parameter: number of bucket, frame size, hashFunction, Comparator, byte array
+        // that contains the key (this array will be set later.)
+        rightHashSet = new BinaryHashSet(HASH_SET_SLOT_SIZE, HASH_SET_FRAME_SIZE, hashFunc, strLowerCaseTokenCmp, null);
+        tokenizerForLeftArray = BinaryTokenizerFactoryProvider.INSTANCE
+                .getWordTokenizerFactory(ATypeTag.STRING, false, true).createTokenizer();
+    }
 
+    void resetQueryArrayAndRight(byte[] arg2Array, ATypeTag typeTag2, IPointable arg2) throws HyracksDataException {
         // If the right side is an (un)ordered list, we need to apply the (un)ordered list tokenizer.
-        switch (predicateTypeTag) {
+        switch (typeTag2) {
             case ORDEREDLIST:
                 tokenizerForRightArray = BinaryTokenizerFactoryProvider.INSTANCE
                         .getWordTokenizerFactory(ATypeTag.ORDEREDLIST, false, true).createTokenizer();
@@ -228,11 +231,10 @@ public class FullTextContainsEvaluator implements IScalarEvaluator {
             default:
                 break;
         }
-    }
 
-    void resetQueryArrayAndRight(byte[] arg2Array, ATypeTag typeTag2, IPointable arg2) throws HyracksDataException {
-        queryArray = new byte[arg2Array.length];
-        System.arraycopy(arg2Array, 0, queryArray, 0, arg2Array.length);
+        queryArray = arg2Array;
+        queryArrayStartOffset = arg2.getStartOffset();
+        queryArrayLength = arg2.getLength();
 
         // Clear hash set for the search predicates.
         rightHashSet.clear();
@@ -242,11 +244,8 @@ public class FullTextContainsEvaluator implements IScalarEvaluator {
         int queryTokenCount = 0;
         int uniqueQueryTokenCount = 0;
 
-        int startOffset = arg2.getStartOffset();
-        int length = arg2.getLength();
-
         // Reset the tokenizer for the given keywords in the given query
-        tokenizerForRightArray.reset(queryArray, startOffset, length);
+        tokenizerForRightArray.reset(queryArray, queryArrayStartOffset, queryArrayLength);
 
         // Create tokens from the given query predicate
         while (tokenizerForRightArray.hasNext()) {
@@ -324,7 +323,8 @@ public class FullTextContainsEvaluator implements IScalarEvaluator {
     }
 
     /**
-     * Set full-text options. The odd element is an option name and the even element is the argument for that option.
+     * Sets the full-text options. The odd element is an option name and the even element is the argument
+     * for that option. (e.g., argOptions[0] = "mode", argOptions[1] = "all")
      */
     private void setFullTextOption(IPointable[] argOptions, int uniqueQueryTokenCount) throws HyracksDataException {
         for (int i = 0; i < optionArgsLength; i = i + 2) {
@@ -351,14 +351,14 @@ public class FullTextContainsEvaluator implements IScalarEvaluator {
         int foundCount = 0;
 
         // The left side: field (document)
-        // Reset the tokenizer for the given keywords in a document.
+        // Resets the tokenizer for the given keywords in a document.
         tokenizerForLeftArray.reset(arg1.getByteArray(), arg1.getStartOffset(), arg1.getLength());
 
-        // Create tokens from a field in the left side (document)
+        // Creates tokens from a field in the left side (document)
         while (tokenizerForLeftArray.hasNext()) {
             tokenizerForLeftArray.next();
 
-            // Record the starting position and the length of the current token.
+            // Records the starting position and the length of the current token.
             keyEntry.set(tokenizerForLeftArray.getToken().getStartOffset(),
                     tokenizerForLeftArray.getToken().getTokenLength());
 
@@ -386,7 +386,8 @@ public class FullTextContainsEvaluator implements IScalarEvaluator {
     }
 
     /**
-     * Check the argument types. The argument1 should be a string. The argument2 should be a string or (un)ordered list.
+     * Checks the argument types. The argument1 should be a string.
+     * The argument2 should be a string or an (un)ordered list.
      */
     protected boolean checkArgTypes(ATypeTag typeTag1, ATypeTag typeTag2) throws HyracksDataException {
         if ((typeTag1 != ATypeTag.STRING) || (typeTag2 != ATypeTag.ORDEREDLIST && typeTag2 != ATypeTag.UNORDEREDLIST
@@ -396,4 +397,31 @@ public class FullTextContainsEvaluator implements IScalarEvaluator {
         return true;
     }
 
+    /**
+     * Checks whether the content of the given two arrays are equal.
+     * The code is utilizing the Arrays.equals() code. The difference is that
+     * this method only compares the certain portion of each array.
+     */
+    private static boolean partOfArrayEquals(byte[] array1, int start1, int length1, byte[] array2, int start2,
+            int length2) {
+        // Sanity check
+        if (length1 != length2 || array1 == null || array2 == null) {
+            return false;
+        }
+
+        if (array1 == array2 && start1 == start2 && length1 == length2) {
+            return true;
+        }
+
+        int offset = 0;
+        while (offset < length1) {
+            if (array1[start1 + offset] != array2[start2 + offset]) {
+                return false;
+            }
+            offset++;
+        }
+
+        return true;
+    }
+
 }
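
Note on `partOfArrayEquals`: the evaluator now keeps a reference to the query bytes plus an offset and length instead of copying them, so the comparison has to be limited to that range. On Java 9 or later, the same range comparison is available from the JDK as `Arrays.equals(byte[], int, int, byte[], int, int)`, which takes from/to indexes rather than start/length. The sketch below wraps it with the same null and length checks as the hand-written loop. The class name is hypothetical, and whether the project's target Java version allows this call is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative sketch only; assumes Java 9+ for the ranged Arrays.equals overload.
final class RangeEqualsSketch {

    // Compares array1[start1, start1 + length1) with array2[start2, start2 + length2).
    static boolean partOfArrayEquals(byte[] array1, int start1, int length1,
            byte[] array2, int start2, int length2) {
        // Same sanity check as the original: nulls or differing lengths never match.
        if (array1 == null || array2 == null || length1 != length2) {
            return false;
        }
        // The JDK overload takes exclusive end indexes, so convert start/length pairs.
        return Arrays.equals(array1, start1, start1 + length1, array2, start2, start2 + length2);
    }

    public static void main(String[] args) {
        byte[] frame = "xxsoundyy".getBytes(StandardCharsets.UTF_8);
        byte[] query = "sound".getBytes(StandardCharsets.UTF_8);
        System.out.println(partOfArrayEquals(frame, 2, 5, query, 0, 5));
        System.out.println(partOfArrayEquals(frame, 0, 5, query, 0, 5));
    }
}
```

Unlike the hand-written loop, the JDK method throws on an out-of-range index instead of failing inside the loop, so callers still need to pass offsets that lie within both arrays.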

http://git-wip-us.apache.org/repos/asf/asterixdb/blob/c49405aa/asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/functions/FullTextContainsWithoutOptionDescriptor.java
----------------------------------------------------------------------
diff --git a/asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/functions/FullTextContainsWithoutOptionDescriptor.java b/asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/functions/FullTextContainsWithoutOptionDescriptor.java
new file mode 100644
index 0000000..7cfaa62
--- /dev/null
+++ b/asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/functions/FullTextContainsWithoutOptionDescriptor.java
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.asterix.runtime.evaluators.functions;
+
+import org.apache.asterix.om.functions.BuiltinFunctions;
+import org.apache.asterix.om.functions.IFunctionDescriptor;
+import org.apache.asterix.om.functions.IFunctionDescriptorFactory;
+import org.apache.asterix.runtime.evaluators.base.AbstractScalarFunctionDynamicDescriptor;
+import org.apache.asterix.runtime.evaluators.common.FullTextContainsEvaluator;
+import org.apache.hyracks.algebricks.common.exceptions.AlgebricksException;
+import org.apache.hyracks.algebricks.core.algebra.functions.FunctionIdentifier;
+import org.apache.hyracks.algebricks.runtime.base.IScalarEvaluator;
+import org.apache.hyracks.algebricks.runtime.base.IScalarEvaluatorFactory;
+import org.apache.hyracks.api.context.IHyracksTaskContext;
+import org.apache.hyracks.api.exceptions.HyracksDataException;
+
+public class FullTextContainsWithoutOptionDescriptor extends AbstractScalarFunctionDynamicDescriptor {
+    private static final long serialVersionUID = 1L;
+
+    public static final IFunctionDescriptorFactory FACTORY = new IFunctionDescriptorFactory() {
+        @Override
+        public IFunctionDescriptor createFunctionDescriptor() {
+            return new FullTextContainsWithoutOptionDescriptor();
+        }
+    };
+
+    /**
+     * Creates full-text search evaluator. There are two arguments:
+     * arg0: Expression1 - search field
+     * arg1: Expression2 - search predicate
+     */
+    @Override
+    public IScalarEvaluatorFactory createEvaluatorFactory(final IScalarEvaluatorFactory[] args)
+            throws AlgebricksException {
+        return new IScalarEvaluatorFactory() {
+            private static final long serialVersionUID = 1L;
+
+            @Override
+            public IScalarEvaluator createScalarEvaluator(IHyracksTaskContext ctx) throws HyracksDataException {
+                return new FullTextContainsEvaluator(args, ctx);
+            }
+        };
+    }
+
+    @Override
+    public FunctionIdentifier getIdentifier() {
+        return BuiltinFunctions.FULLTEXT_CONTAINS_WO_OPTION;
+    }
+
+
+}

http://git-wip-us.apache.org/repos/asf/asterixdb/blob/c49405aa/hyracks-fullstack/hyracks/hyracks-control/hyracks-control-nc/src/main/java/org/apache/hyracks/control/nc/io/IOManager.java
----------------------------------------------------------------------
diff --git a/hyracks-fullstack/hyracks/hyracks-control/hyracks-control-nc/src/main/java/org/apache/hyracks/control/nc/io/IOManager.java b/hyracks-fullstack/hyracks/hyracks-control/hyracks-control-nc/src/main/java/org/apache/hyracks/control/nc/io/IOManager.java
index 80bb662..352f912 100644
--- a/hyracks-fullstack/hyracks/hyracks-control/hyracks-control-nc/src/main/java/org/apache/hyracks/control/nc/io/IOManager.java
+++ b/hyracks-fullstack/hyracks/hyracks-control/hyracks-control-nc/src/main/java/org/apache/hyracks/control/nc/io/IOManager.java
@@ -338,11 +338,17 @@ public class IOManager implements IIOManager {
         return new FileReference(deviceComputer.compute(path), path);
     }
 
+    // Temp:
     @Override
     public FileReference resolveAbsolutePath(String path) throws HyracksDataException {
         IODeviceHandle devHandle = getDevice(path);
         if (devHandle == null) {
-            throw new HyracksDataException("The file with absolute path: " + path + " is outside all io devices");
+            String errorMessage = "The file with absolute path: " + path
+                    + " is outside all IO devices. IO devices in this node are \n";
+            for (IODeviceHandle d : ioDevices) {
+                errorMessage = errorMessage.concat(d.toString() + '\n');
+            }
+            throw new HyracksDataException(errorMessage);
         }
         String relativePath = devHandle.getRelativePath(path);
         return new FileReference(devHandle, relativePath);
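
Note on the hunk above: the new error message lists every IO device on the node, one per line. A minimal sketch of the same message built with a StringBuilder follows. The class and method names are hypothetical, and devices are modeled as plain strings rather than IODeviceHandle objects.

```java
import java.util.List;

// Hypothetical helper, not part of the commit.
final class IoDeviceMessageSketch {
    private IoDeviceMessageSketch() {
    }

    // Produces the same text as the patched resolveAbsolutePath(): a header line
    // followed by one line per device description.
    static String describeOutsideDevices(String path, List<String> deviceDescriptions) {
        StringBuilder sb = new StringBuilder("The file with absolute path: ").append(path)
                .append(" is outside all IO devices. IO devices in this node are \n");
        for (String device : deviceDescriptions) {
            sb.append(device).append('\n');
        }
        return sb.toString();
    }
}
```

Appending to one StringBuilder avoids allocating a new String on every loop iteration, which is the only difference from the concat-based loop in the patch.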

http://git-wip-us.apache.org/repos/asf/asterixdb/blob/c49405aa/hyracks-fullstack/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/util/BinaryHashSet.java
----------------------------------------------------------------------
diff --git a/hyracks-fullstack/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/util/BinaryHashSet.java b/hyracks-fullstack/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/util/BinaryHashSet.java
index c3e36da..1996b4e 100644
--- a/hyracks-fullstack/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/util/BinaryHashSet.java
+++ b/hyracks-fullstack/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/util/BinaryHashSet.java
@@ -258,20 +258,19 @@ public class BinaryHashSet {
     }
 
     /**
-     * Iterate all key entries and reset the foundCount of each key to zero.
+     * Iterates all key entries and resets the foundCount of each key to zero.
      */
     public void clearFoundCount() {
-        int currentListHeadIndex = 0;
         ByteBuffer frame;
         int frameNum;
         int frameOff;
         int headPtr;
-        int checkedListHeadIndex = -1;
+        final int resetCount = 0;
 
-        while (true) {
+        for (int currentListHeadIndex = 0; currentListHeadIndex < listHeads.length; currentListHeadIndex++) {
             // Position to first non-null list-head pointer.
-            while (currentListHeadIndex < listHeads.length && listHeads[currentListHeadIndex] == NULL_PTR) {
-                currentListHeadIndex++;
+            if (listHeads[currentListHeadIndex] == NULL_PTR) {
+                continue;
             }
             headPtr = listHeads[currentListHeadIndex];
             do {
@@ -281,18 +280,11 @@ public class BinaryHashSet {
                 frame = frames.get(frameNum);
 
                 // Set the count as zero
-                frame.put(frameOff + 2 * SLOT_SIZE, (byte) 0);
+                frame.put(frameOff + 2 * SLOT_SIZE, (byte) resetCount);
 
                 // Get next key position
                 headPtr = frame.getInt(frameOff + 2 * SLOT_SIZE + COUNT_SIZE);
             } while (headPtr != NULL_PTR);
-
-            if (checkedListHeadIndex == currentListHeadIndex) {
-                // no more slots to read - we stop here.
-                break;
-            }
-
-            checkedListHeadIndex = currentListHeadIndex;
         }
     }
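
Note on the hunks above: the rewritten clearFoundCount() visits each list head once, skips empty buckets, and walks every chain to its end. The sketch below shows that traversal shape on plain arrays. It is an illustration under simplifying assumptions: the arrays `next` and `counts` stand in for the frame-based layout of BinaryHashSet, and all names other than NULL_PTR and listHeads are hypothetical.

```java
// Hypothetical model of a chained hash set; not the BinaryHashSet frame layout.
final class FoundCountResetSketch {
    static final int NULL_PTR = -1;

    private FoundCountResetSketch() {
    }

    // listHeads[b]: index of the first entry in bucket b, or NULL_PTR if the bucket is empty.
    // next[e]: index of the entry that follows e in its chain, or NULL_PTR at the end.
    // counts[e]: found count of entry e, reset to zero here.
    static void clearFoundCount(int[] listHeads, int[] next, byte[] counts) {
        for (int bucket = 0; bucket < listHeads.length; bucket++) {
            int ptr = listHeads[bucket];
            // An empty bucket has ptr == NULL_PTR, so the loop body is skipped.
            while (ptr != NULL_PTR) {
                counts[ptr] = 0;
                ptr = next[ptr];
            }
        }
    }
}
```

A single bounded for loop makes termination obvious, so no separate bookkeeping index is needed to detect the last bucket.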
 

http://git-wip-us.apache.org/repos/asf/asterixdb/blob/c49405aa/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/dataflow/LSMInvertedIndexDataflowHelper.java
----------------------------------------------------------------------
diff --git a/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/dataflow/LSMInvertedIndexDataflowHelper.java b/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/dataflow/LSMInvertedIndexDataflowHelper.java
index fd57414..237b567 100644
--- a/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/dataflow/LSMInvertedIndexDataflowHelper.java
+++ b/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/dataflow/LSMInvertedIndexDataflowHelper.java
@@ -88,7 +88,7 @@ public final class LSMInvertedIndexDataflowHelper extends AbstractLSMIndexDatafl
                     diskFileMapProvider, invIndexOpDesc.getInvListsTypeTraits(),
                     invIndexOpDesc.getInvListsComparatorFactories(), invIndexOpDesc.getTokenTypeTraits(),
                     invIndexOpDesc.getTokenComparatorFactories(), invIndexOpDesc.getTokenizerFactory(),
-                    diskBufferCache, fileRef.getFile().getPath(), bloomFilterFalsePositiveRate, mergePolicy,
+                    diskBufferCache, fileRef.getFile().getAbsolutePath(), bloomFilterFalsePositiveRate, mergePolicy,
                     opTrackerFactory.getOperationTracker(ctx), ioScheduler,
                     ioOpCallbackFactory.createIOOperationCallback(), invertedIndexFields, filterTypeTraits,
                     filterCmpFactories, filterFields, filterFieldsForNonBulkLoadOps,

http://git-wip-us.apache.org/repos/asf/asterixdb/blob/c49405aa/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/dataflow/LSMInvertedIndexSearchOperatorDescriptor.java
----------------------------------------------------------------------
diff --git a/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/dataflow/LSMInvertedIndexSearchOperatorDescriptor.java b/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/dataflow/LSMInvertedIndexSearchOperatorDescriptor.java
index 82a8dc4..7c21c38 100644
--- a/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/dataflow/LSMInvertedIndexSearchOperatorDescriptor.java
+++ b/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/dataflow/LSMInvertedIndexSearchOperatorDescriptor.java
@@ -47,6 +47,7 @@ public class LSMInvertedIndexSearchOperatorDescriptor extends AbstractLSMInverte
     private final IInvertedIndexSearchModifierFactory searchModifierFactory;
     private final int[] minFilterFieldIndexes;
     private final int[] maxFilterFieldIndexes;
+    private final boolean isFullTextSearchQuery;
 
     public LSMInvertedIndexSearchOperatorDescriptor(IOperatorDescriptorRegistry spec, int queryField,
             IStorageManagerInterface storageManager, IFileSplitProvider fileSplitProvider,
@@ -57,7 +58,7 @@ public class LSMInvertedIndexSearchOperatorDescriptor extends AbstractLSMInverte
             IInvertedIndexSearchModifierFactory searchModifierFactory, RecordDescriptor recDesc, boolean retainInput,
             boolean retainNull, IMissingWriterFactory nullWriterFactory,
             ISearchOperationCallbackFactory searchOpCallbackProvider, int[] minFilterFieldIndexes,
-            int[] maxFilterFieldIndexes, IPageManagerFactory pageManagerFactory) {
+            int[] maxFilterFieldIndexes, IPageManagerFactory pageManagerFactory, boolean isFullTextSearchQuery) {
         super(spec, 1, 1, recDesc, storageManager, fileSplitProvider, lifecycleManagerProvider, tokenTypeTraits,
                 tokenComparatorFactories, invListsTypeTraits, invListComparatorFactories, queryTokenizerFactory,
                 btreeDataflowHelperFactory, null, retainInput, retainNull, nullWriterFactory,
@@ -67,6 +68,7 @@ public class LSMInvertedIndexSearchOperatorDescriptor extends AbstractLSMInverte
         this.searchModifierFactory = searchModifierFactory;
         this.minFilterFieldIndexes = minFilterFieldIndexes;
         this.maxFilterFieldIndexes = maxFilterFieldIndexes;
+        this.isFullTextSearchQuery = isFullTextSearchQuery;
     }
 
     @Override
@@ -74,6 +76,6 @@ public class LSMInvertedIndexSearchOperatorDescriptor extends AbstractLSMInverte
             IRecordDescriptorProvider recordDescProvider, int partition, int nPartitions) throws HyracksDataException {
         IInvertedIndexSearchModifier searchModifier = searchModifierFactory.createSearchModifier();
         return new LSMInvertedIndexSearchOperatorNodePushable(this, ctx, partition, recordDescProvider, queryField,
-                searchModifier, minFilterFieldIndexes, maxFilterFieldIndexes);
+                searchModifier, minFilterFieldIndexes, maxFilterFieldIndexes, isFullTextSearchQuery);
     }
 }

http://git-wip-us.apache.org/repos/asf/asterixdb/blob/c49405aa/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/dataflow/LSMInvertedIndexSearchOperatorNodePushable.java
----------------------------------------------------------------------
diff --git a/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/dataflow/LSMInvertedIndexSearchOperatorNodePushable.java b/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/dataflow/LSMInvertedIndexSearchOperatorNodePushable.java
index 09893fb..4634c7f 100644
--- a/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/dataflow/LSMInvertedIndexSearchOperatorNodePushable.java
+++ b/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/dataflow/LSMInvertedIndexSearchOperatorNodePushable.java
@@ -34,14 +34,19 @@ public class LSMInvertedIndexSearchOperatorNodePushable extends IndexSearchOpera
     protected final IInvertedIndexSearchModifier searchModifier;
     protected final int queryFieldIndex;
     protected final int invListFields;
+    // Indicates whether the given query is a full-text search.
+    // The searcher needs this flag to reject phrase searches, which are not supported yet.
+    protected final boolean isFullTextSearchQuery;
 
     public LSMInvertedIndexSearchOperatorNodePushable(IIndexOperatorDescriptor opDesc, IHyracksTaskContext ctx,
             int partition, IRecordDescriptorProvider recordDescProvider, int queryFieldIndex,
-            IInvertedIndexSearchModifier searchModifier, int[] minFilterFieldIndexes, int[] maxFilterFieldIndexes)
+            IInvertedIndexSearchModifier searchModifier, int[] minFilterFieldIndexes, int[] maxFilterFieldIndexes,
+            boolean isFullTextSearchQuery)
             throws HyracksDataException {
         super(opDesc, ctx, partition, recordDescProvider, minFilterFieldIndexes, maxFilterFieldIndexes);
         this.searchModifier = searchModifier;
         this.queryFieldIndex = queryFieldIndex;
+        this.isFullTextSearchQuery = isFullTextSearchQuery;
         // If retainInput is true, the frameTuple is created in IndexSearchOperatorNodePushable.open().
         if (!opDesc.getRetainInput()) {
             this.frameTuple = new FrameTupleReference();
@@ -54,7 +59,7 @@ public class LSMInvertedIndexSearchOperatorNodePushable extends IndexSearchOpera
     protected ISearchPredicate createSearchPredicate() {
         AbstractLSMInvertedIndexOperatorDescriptor invIndexOpDesc = (AbstractLSMInvertedIndexOperatorDescriptor) opDesc;
         return new InvertedIndexSearchPredicate(invIndexOpDesc.getTokenizerFactory().createTokenizer(), searchModifier,
-                minFilterKey, maxFilterKey);
+                minFilterKey, maxFilterKey, isFullTextSearchQuery);
     }
 
     @Override
@@ -63,6 +68,7 @@ public class LSMInvertedIndexSearchOperatorNodePushable extends IndexSearchOpera
         InvertedIndexSearchPredicate invIndexSearchPred = (InvertedIndexSearchPredicate) searchPred;
         invIndexSearchPred.setQueryTuple(frameTuple);
         invIndexSearchPred.setQueryFieldIndex(queryFieldIndex);
+        invIndexSearchPred.setIsFullTextSearchQuery(isFullTextSearchQuery);
         if (minFilterKey != null) {
             minFilterKey.reset(accessor, tupleIndex);
         }

http://git-wip-us.apache.org/repos/asf/asterixdb/blob/c49405aa/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/ondisk/OnDiskInvertedIndexFactory.java
----------------------------------------------------------------------
diff --git a/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/ondisk/OnDiskInvertedIndexFactory.java b/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/ondisk/OnDiskInvertedIndexFactory.java
index 7111097..14ceee8 100644
--- a/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/ondisk/OnDiskInvertedIndexFactory.java
+++ b/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/ondisk/OnDiskInvertedIndexFactory.java
@@ -58,7 +58,7 @@ public class OnDiskInvertedIndexFactory extends IndexFactory<IInvertedIndex> {
 
     @Override
     public IInvertedIndex createIndexInstance(FileReference dictBTreeFile) throws IndexException, HyracksDataException {
-        String invListsFilePath = fileNameMapper.getInvListsFilePath(dictBTreeFile.getFile().getPath());
+        String invListsFilePath = fileNameMapper.getInvListsFilePath(dictBTreeFile.getFile().getAbsolutePath());
         FileReference invListsFile = ioManager.resolveAbsolutePath(invListsFilePath);
         IInvertedListBuilder invListBuilder = invListBuilderFactory.create();
         return new OnDiskInvertedIndex(bufferCache, fileMapProvider, invListBuilder, invListTypeTraits,

http://git-wip-us.apache.org/repos/asf/asterixdb/blob/c49405aa/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/search/AbstractTOccurrenceSearcher.java
----------------------------------------------------------------------
diff --git a/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/search/AbstractTOccurrenceSearcher.java b/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/search/AbstractTOccurrenceSearcher.java
index 7d34198..cfc9fc6 100644
--- a/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/search/AbstractTOccurrenceSearcher.java
+++ b/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/search/AbstractTOccurrenceSearcher.java
@@ -44,8 +44,10 @@ import org.apache.hyracks.storage.am.lsm.invertedindex.api.IObjectFactory;
 import org.apache.hyracks.storage.am.lsm.invertedindex.exceptions.OccurrenceThresholdPanicException;
 import org.apache.hyracks.storage.am.lsm.invertedindex.ondisk.FixedSizeFrameTupleAccessor;
 import org.apache.hyracks.storage.am.lsm.invertedindex.ondisk.FixedSizeTupleReference;
+import org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.DelimitedUTF8StringBinaryTokenizer;
 import org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.IBinaryTokenizer;
 import org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.IToken;
+import org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.TokenizerInfo.TokenizerType;
 import org.apache.hyracks.storage.am.lsm.invertedindex.util.ObjectCache;
 
 public abstract class AbstractTOccurrenceSearcher implements IInvertedIndexSearcher {
@@ -96,6 +98,13 @@ public abstract class AbstractTOccurrenceSearcher implements IInvertedIndexSearc
         ITupleReference queryTuple = searchPred.getQueryTuple();
         int queryFieldIndex = searchPred.getQueryFieldIndex();
         IBinaryTokenizer queryTokenizer = searchPred.getQueryTokenizer();
+        // Is this a full-text query?
+        // Then, the last argument is a conjunctive or disjunctive search option, not a query text.
+        // Thus, we need to remove the last argument.
+        boolean isFullTextSearchQuery = searchPred.getIsFullTextSearchQuery();
+        // Get the type of query tokenizer.
+        TokenizerType queryTokenizerType = queryTokenizer.getTokenizerType();
+        int tokenCountInOneField = 0;
 
         queryTokenAppender.reset(queryTokenFrame, true);
         queryTokenizer.reset(queryTuple.getFieldData(queryFieldIndex), queryTuple.getFieldStart(queryFieldIndex),
@@ -104,8 +113,29 @@ public abstract class AbstractTOccurrenceSearcher implements IInvertedIndexSearc
         while (queryTokenizer.hasNext()) {
             queryTokenizer.next();
             queryTokenBuilder.reset();
+            tokenCountInOneField++;
             try {
                 IToken token = queryTokenizer.getToken();
+                // For the full-text search, we don't support a phrase search yet.
+                // So, each field should have only one token.
+                // If it's a list, it can have multiple keywords in it. But, each keyword should not be a phrase.
+                if (isFullTextSearchQuery) {
+                    if (queryTokenizerType == TokenizerType.STRING && tokenCountInOneField > 1) {
+                        throw new HyracksDataException(
+                                "Phrase search in Full-text is not supported. "
+                                        + "An expression should include only one word.");
+                    } else if (queryTokenizerType == TokenizerType.LIST) {
+                        for (int j = 1; j < token.getTokenLength(); j++) {
+                            if (DelimitedUTF8StringBinaryTokenizer
+                                    .isSeparator((char) token.getData()[token.getStartOffset() + j])) {
+                                throw new HyracksDataException(
+                                        "Phrase search in Full-text is not supported. "
+                                                + "An expression should include only one word.");
+                            }
+                        }
+                    }
+                }
+
                 token.serializeToken(queryTokenBuilder.getFieldData());
                 queryTokenBuilder.addFieldEndOffset();
                 // WARNING: assuming one frame is big enough to hold all tokens
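
Note on the hunk above: for a full-text query, a string expression must tokenize to a single word, and each element of a keyword list must contain no separator. The sketch below illustrates the list case on decoded Java strings. It rests on assumptions: `isSeparator` here is a stand-in that treats any character that is not a letter or digit as a separator, which may differ from DelimitedUTF8StringBinaryTokenizer.isSeparator, and the class and method names are hypothetical. The patch scans serialized token bytes starting at offset 1; this sketch scans a String from its first character.

```java
import java.util.List;

// Hypothetical illustration of the phrase check; not the Hyracks implementation.
final class PhraseCheckSketch {
    private PhraseCheckSketch() {
    }

    // Stand-in separator rule (assumption): anything that is not a letter or digit.
    static boolean isSeparator(char c) {
        return !Character.isLetterOrDigit(c);
    }

    // Returns true if the keyword contains a separator, i.e. it is not a single word.
    static boolean looksLikePhrase(String keyword) {
        for (int j = 0; j < keyword.length(); j++) {
            if (isSeparator(keyword.charAt(j))) {
                return true;
            }
        }
        return false;
    }

    // Rejects the whole keyword list as soon as one element is a phrase.
    static void validateKeywords(List<String> keywords) {
        for (String keyword : keywords) {
            if (looksLikePhrase(keyword)) {
                throw new IllegalArgumentException("Phrase search in Full-text is not supported. "
                        + "An expression should include only one word.");
            }
        }
    }
}
```

With this rule, a list such as ["sound", "system"] passes, while ["sound system"] is rejected.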

http://git-wip-us.apache.org/repos/asf/asterixdb/blob/c49405aa/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/search/DisjunctiveSearchModifier.java
----------------------------------------------------------------------
diff --git a/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/search/DisjunctiveSearchModifier.java b/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/search/DisjunctiveSearchModifier.java
new file mode 100644
index 0000000..b498411
--- /dev/null
+++ b/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/search/DisjunctiveSearchModifier.java
@@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hyracks.storage.am.lsm.invertedindex.search;
+
+import org.apache.hyracks.storage.am.lsm.invertedindex.api.IInvertedIndexSearchModifier;
+
+/**
+ * Search modifier that supports disjunctive conditions.
+ */
+public class DisjunctiveSearchModifier implements IInvertedIndexSearchModifier {
+
+    @Override
+    public int getOccurrenceThreshold(int numQueryTokens) {
+        return 1;
+    }
+
+    @Override
+    public int getNumPrefixLists(int occurrenceThreshold, int numInvLists) {
+        return numInvLists;
+    }
+
+    @Override
+    public String toString() {
+        return "Disjunctive Search Modifier";
+    }
+
+    @Override
+    public short getNumTokensLowerBound(short numQueryTokens) {
+        return -1;
+    }
+
+    @Override
+    public short getNumTokensUpperBound(short numQueryTokens) {
+        return -1;
+    }
+}
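
Note on the file above: DisjunctiveSearchModifier returns an occurrence threshold of 1, so a record qualifies as soon as it appears in the inverted list of any one query token. The sketch below shows how such a threshold could select results, assuming each inverted list is a set of record ids. The class, the method, and the in-memory representation are hypothetical and do not reflect how Hyracks merges inverted lists.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical T-occurrence illustration; not the Hyracks searcher.
final class OccurrenceThresholdSketch {
    private OccurrenceThresholdSketch() {
    }

    // Returns the ids that occur in at least occurrenceThreshold of the given lists.
    static SortedSet<Integer> search(List<Set<Integer>> invertedLists, int occurrenceThreshold) {
        Map<Integer, Integer> occurrences = new HashMap<>();
        for (Set<Integer> invertedList : invertedLists) {
            for (Integer recordId : invertedList) {
                occurrences.merge(recordId, 1, Integer::sum);
            }
        }
        SortedSet<Integer> result = new TreeSet<>();
        for (Map.Entry<Integer, Integer> entry : occurrences.entrySet()) {
            if (entry.getValue() >= occurrenceThreshold) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```

A threshold of 1 yields the union of the lists (disjunctive, "any" mode). A threshold equal to the number of query tokens yields their intersection (conjunctive, "all" mode).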

http://git-wip-us.apache.org/repos/asf/asterixdb/blob/c49405aa/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/search/DisjunctiveSearchModifierFactory.java
----------------------------------------------------------------------
diff --git a/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/search/DisjunctiveSearchModifierFactory.java b/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/search/DisjunctiveSearchModifierFactory.java
new file mode 100644
index 0000000..79976f4
--- /dev/null
+++ b/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/search/DisjunctiveSearchModifierFactory.java
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hyracks.storage.am.lsm.invertedindex.search;
+
+import org.apache.hyracks.storage.am.lsm.invertedindex.api.IInvertedIndexSearchModifier;
+import org.apache.hyracks.storage.am.lsm.invertedindex.api.IInvertedIndexSearchModifierFactory;
+
+public class DisjunctiveSearchModifierFactory implements IInvertedIndexSearchModifierFactory {
+    private static final long serialVersionUID = 1L;
+
+    @Override
+    public IInvertedIndexSearchModifier createSearchModifier() {
+        return new DisjunctiveSearchModifier();
+    }
+}

http://git-wip-us.apache.org/repos/asf/asterixdb/blob/c49405aa/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/search/InvertedIndexSearchPredicate.java
----------------------------------------------------------------------
diff --git a/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/search/InvertedIndexSearchPredicate.java b/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/search/InvertedIndexSearchPredicate.java
index e37f007..fe1a6d7 100644
--- a/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/search/InvertedIndexSearchPredicate.java
+++ b/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/search/InvertedIndexSearchPredicate.java
@@ -32,17 +32,22 @@ public class InvertedIndexSearchPredicate extends AbstractSearchPredicate {
     private int queryFieldIndex;
     private final IBinaryTokenizer queryTokenizer;
     private final IInvertedIndexSearchModifier searchModifier;
+    // Indicates whether the given query is a full-text search.
+    // The searcher needs this flag to reject phrase searches, which are not supported yet.
+    private boolean isFullTextSearchQuery;
 
     public InvertedIndexSearchPredicate(IBinaryTokenizer queryTokenizer, IInvertedIndexSearchModifier searchModifier) {
         this.queryTokenizer = queryTokenizer;
         this.searchModifier = searchModifier;
+        this.isFullTextSearchQuery = false;
     }
 
     public InvertedIndexSearchPredicate(IBinaryTokenizer queryTokenizer, IInvertedIndexSearchModifier searchModifier,
-            ITupleReference minFilterTuple, ITupleReference maxFilterTuple) {
+            ITupleReference minFilterTuple, ITupleReference maxFilterTuple, boolean isFullTextSearchQuery) {
         super(minFilterTuple, maxFilterTuple);
         this.queryTokenizer = queryTokenizer;
         this.searchModifier = searchModifier;
+        this.isFullTextSearchQuery = isFullTextSearchQuery;
     }
 
     public void setQueryTuple(ITupleReference queryTuple) {
@@ -53,6 +58,14 @@ public class InvertedIndexSearchPredicate extends AbstractSearchPredicate {
         return queryTuple;
     }
 
+    public void setIsFullTextSearchQuery(boolean isFullTextSearchQuery) {
+        this.isFullTextSearchQuery = isFullTextSearchQuery;
+    }
+
+    public boolean getIsFullTextSearchQuery() {
+        return isFullTextSearchQuery;
+    }
+
     public void setQueryFieldIndex(int queryFieldIndex) {
         this.queryFieldIndex = queryFieldIndex;
     }

http://git-wip-us.apache.org/repos/asf/asterixdb/blob/c49405aa/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/tokenizers/DelimitedUTF8StringBinaryTokenizer.java
----------------------------------------------------------------------
diff --git a/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/tokenizers/DelimitedUTF8StringBinaryTokenizer.java b/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/tokenizers/DelimitedUTF8StringBinaryTokenizer.java
index 32e930d..cd37ffa 100644
--- a/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/tokenizers/DelimitedUTF8StringBinaryTokenizer.java
+++ b/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/tokenizers/DelimitedUTF8StringBinaryTokenizer.java
@@ -19,6 +19,7 @@
 
 package org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers;
 
+import org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.TokenizerInfo.TokenizerType;
 import org.apache.hyracks.util.string.UTF8StringUtil;
 
 public class DelimitedUTF8StringBinaryTokenizer extends AbstractUTF8StringBinaryTokenizer {
@@ -113,4 +114,9 @@ public class DelimitedUTF8StringBinaryTokenizer extends AbstractUTF8StringBinary
         }
         return tokenCount;
     }
+
+    @Override
+    public TokenizerType getTokenizerType() {
+        return TokenizerType.STRING;
+    }
 }

http://git-wip-us.apache.org/repos/asf/asterixdb/blob/c49405aa/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/tokenizers/IBinaryTokenizer.java
----------------------------------------------------------------------
diff --git a/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/tokenizers/IBinaryTokenizer.java b/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/tokenizers/IBinaryTokenizer.java
index ba384c0..6a7da02 100644
--- a/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/tokenizers/IBinaryTokenizer.java
+++ b/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/tokenizers/IBinaryTokenizer.java
@@ -19,6 +19,8 @@
 
 package org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers;
 
+import org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.TokenizerInfo.TokenizerType;
+
 public interface IBinaryTokenizer {
     public IToken getToken();
 
@@ -30,4 +32,7 @@ public interface IBinaryTokenizer {
 
     // Get the total number of tokens
     public short getTokensCount();
+
+    // Get the tokenizer type
+    public TokenizerType getTokenizerType();
 }

http://git-wip-us.apache.org/repos/asf/asterixdb/blob/c49405aa/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/tokenizers/NGramUTF8StringBinaryTokenizer.java
----------------------------------------------------------------------
diff --git a/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/tokenizers/NGramUTF8StringBinaryTokenizer.java b/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/tokenizers/NGramUTF8StringBinaryTokenizer.java
index 9161a54..4c486c5 100644
--- a/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/tokenizers/NGramUTF8StringBinaryTokenizer.java
+++ b/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/tokenizers/NGramUTF8StringBinaryTokenizer.java
@@ -19,6 +19,7 @@
 
 package org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers;
 
+import org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.TokenizerInfo.TokenizerType;
 import org.apache.hyracks.util.string.UTF8StringUtil;
 
 public class NGramUTF8StringBinaryTokenizer extends AbstractUTF8StringBinaryTokenizer {
@@ -125,4 +126,9 @@ public class NGramUTF8StringBinaryTokenizer extends AbstractUTF8StringBinaryToke
     public short getTokensCount() {
         return (short) totalGrams;
     }
+
+    @Override
+    public TokenizerType getTokenizerType() {
+        return TokenizerType.STRING;
+    }
 }

http://git-wip-us.apache.org/repos/asf/asterixdb/blob/c49405aa/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/tokenizers/TokenizerInfo.java
----------------------------------------------------------------------
diff --git a/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/tokenizers/TokenizerInfo.java b/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/tokenizers/TokenizerInfo.java
new file mode 100644
index 0000000..c980f1a
--- /dev/null
+++ b/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-invertedindex/src/main/java/org/apache/hyracks/storage/am/lsm/invertedindex/tokenizers/TokenizerInfo.java
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers;
+
+public class TokenizerInfo {
+
+    // Defines the type of a tokenizer.
+    // STRING: tokenizer deals with a string - extracts a partial string when next() is called.
+    // LIST: tokenizer deals with a list - extracts an element when next() is called.
+    public enum TokenizerType {
+        STRING,
+        LIST
+    }
+
+    private TokenizerInfo() {
+        // No method yet
+    }
+
+}

