asterixdb-notifications mailing list archives

From "Yingyi Bu (JIRA)" <j...@apache.org>
Subject [jira] [Created] (ASTERIXDB-1365) word-tokens function gets malformed strings from the inverted index
Date Wed, 23 Mar 2016 20:42:25 GMT
Yingyi Bu created ASTERIXDB-1365:
------------------------------------

             Summary: word-tokens function gets malformed strings from the inverted index
                 Key: ASTERIXDB-1365
                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1365
             Project: Apache AsterixDB
          Issue Type: Bug
          Components: Functions - AQL
            Reporter: Yingyi Bu
            Assignee: Taewoo Kim
            Priority: Critical


[~wangsaeu],[~javierjia],

There appear to be two possible causes for this bug:
1. The inverted index generates malformed UTF-8 strings.
2. UTF8StringUtil itself has a bug.

However, since UTF8StringUtil is widely used elsewhere, it is quite possible that the issue
is in the inverted index. I am therefore assigning this to Taewoo. Please reassign it if you
think this assignment is not right.


This is the query:
{noformat}
use SocialNetworkData;

select distinct element message.message
from   GleambookMessages as message,
       "word-tokens"(message) as token,
       (
        select distinct element emp.organization
        from GleambookUsers as user,
             user.employment emp
       ) as org
where  org=token
       and message.send_time >= datetime('2000-06-07T12:05:32')
       and message.send_time < datetime('2000-06-08T12:05:32');
{noformat}


This is the stack trace:
{noformat}
Caused by: java.lang.IllegalArgumentException
	at org.apache.hyracks.util.string.UTF8StringUtil.charAt(UTF8StringUtil.java:60)
	at org.apache.hyracks.storage.am.lsm.invertedindex.tokenizers.DelimitedUTF8StringBinaryTokenizer.hasNext(DelimitedUTF8StringBinaryTokenizer.java:47)
	at org.apache.asterix.runtime.evaluators.common.WordTokensEvaluator.evaluate(WordTokensEvaluator.java:61)
	at org.apache.asterix.runtime.unnestingfunctions.std.ScanCollectionDescriptor$ScanCollectionUnnestingFunctionFactory$1.init(ScanCollectionDescriptor.java:88)
	at org.apache.hyracks.algebricks.runtime.operators.std.UnnestRuntimeFactory$1.nextFrame(UnnestRuntimeFactory.java:121)
	at org.apache.hyracks.dataflow.common.comm.io.AbstractFrameAppender.write(AbstractFrameAppender.java:93)
	at org.apache.hyracks.algebricks.runtime.operators.base.AbstractOneInputOneOutputOneFramePushRuntime.flushAndReset(AbstractOneInputOneOutputOneFramePushRuntime.java:63)
	at org.apache.hyracks.algebricks.runtime.operators.base.AbstractOneInputOneOutputOneFramePushRuntime.flushIfNotFailed(AbstractOneInputOneOutputOneFramePushRuntime.java:69)
	at org.apache.hyracks.algebricks.runtime.operators.base.AbstractOneInputOneOutputOneFramePushRuntime.close(AbstractOneInputOneOutputOneFramePushRuntime.java:55)
	at org.apache.hyracks.algebricks.runtime.operators.std.StreamSelectRuntimeFactory$1.close(StreamSelectRuntimeFactory.java:125)
	at org.apache.hyracks.algebricks.runtime.operators.base.AbstractOneInputOneOutputOneFramePushRuntime.close(AbstractOneInputOneOutputOneFramePushRuntime.java:57)
	at org.apache.hyracks.algebricks.runtime.operators.std.AssignRuntimeFactory$1.close(AssignRuntimeFactory.java:122)
	at org.apache.hyracks.algebricks.runtime.operators.base.AbstractOneInputOneOutputOneFramePushRuntime.close(AbstractOneInputOneOutputOneFramePushRuntime.java:57)
	at org.apache.hyracks.algebricks.runtime.operators.meta.AlgebricksMetaOperatorDescriptor$2.close(AlgebricksMetaOperatorDescriptor.java:153)
	at org.apache.hyracks.storage.am.common.dataflow.IndexSearchOperatorNodePushable.close(IndexSearchOperatorNodePushable.java:227)
	... 9 more
{noformat}
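
The top frame of the trace is an IllegalArgumentException thrown from UTF8StringUtil.charAt. As a rough illustration of how a decoder of this kind can fail, here is a minimal sketch that dispatches on the lead byte and rejects anything that cannot start a 1- to 3-byte sequence. This is not the Hyracks source: the class name Utf8LeadByteSketch, the helper rejects, and the decoding rules are all assumptions made for illustration. It shows that the same exception appears both when the bytes are malformed and when well-formed bytes are read from a misaligned offset, so the exception alone may not separate the two suspected causes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the actual UTF8StringUtil implementation.
final class Utf8LeadByteSketch {

    // Decodes the character that starts at offset s, accepting 1- to 3-byte sequences only.
    static char charAt(byte[] b, int s) {
        int c = b[s] & 0xff;
        switch (c >> 4) {
            case 0: case 1: case 2: case 3: case 4: case 5: case 6: case 7:
                // 0xxxxxxx: single-byte (ASCII) character
                return (char) c;
            case 12: case 13:
                // 110xxxxx 10xxxxxx: two-byte sequence
                return (char) (((c & 0x1F) << 6) | (b[s + 1] & 0x3F));
            case 14:
                // 1110xxxx 10xxxxxx 10xxxxxx: three-byte sequence
                return (char) (((c & 0x0F) << 12) | ((b[s + 1] & 0x3F) << 6) | (b[s + 2] & 0x3F));
            default:
                // 10xxxxxx (continuation byte) or 1111xxxx: not a lead byte this sketch accepts
                throw new IllegalArgumentException(
                        "invalid lead byte 0x" + Integer.toHexString(c) + " at offset " + s);
        }
    }

    // Returns true if decoding at offset s throws IllegalArgumentException.
    static boolean rejects(byte[] b, int s) {
        try {
            charAt(b, s);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        byte[] bytes = "\u00e9".getBytes(StandardCharsets.UTF_8); // 0xC3 0xA9

        // Reading from the correct start offset decodes the character.
        System.out.println("offset 0 decodes: " + (charAt(bytes, 0) == '\u00e9'));

        // Reading the same, well-formed bytes from offset 1 lands on a continuation byte.
        System.out.println("offset 1 rejected: " + rejects(bytes, 1));
    }
}
```

If the real decoder behaves similarly, logging the raw bytes, start offset, and length handed to the tokenizer for the failing record could help tell malformed index data apart from an offset or length error.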



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)