asterixdb-notifications mailing list archives

From "Ian Maxon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ASTERIXDB-1534) NPE when restart the server
Date Wed, 03 Aug 2016 00:21:20 GMT

    [ https://issues.apache.org/jira/browse/ASTERIXDB-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15405076#comment-15405076 ]

Ian Maxon commented on ASTERIXDB-1534:
--------------------------------------

It's the on-disk one. Interrupts in flushes should be OK, because if they were to happen,
the component is simply discarded and rebuilt from logs. The metadata page is the last page
to be written, and it (contains information that) determines the validity of the component.
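
As a rough illustration of the point above (data pages first, metadata page last, and a component without a valid metadata page is discarded and rebuilt from logs), here is a minimal in-memory sketch. Every name in it (FlushValiditySketch, DiskComponent, VALID_MARKER, flush, isValid, recover) is hypothetical and chosen for illustration; it is not the AsterixDB or Hyracks implementation, and the real on-disk format and recovery path are not shown in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these names are AsterixDB or Hyracks APIs.
final class FlushValiditySketch {

    // Assumed marker that a finished flush stores in the metadata page.
    static final String VALID_MARKER = "VALID";

    // Toy disk component: data pages, plus a metadata page once the flush completes.
    static final class DiskComponent {
        final List<String> dataPages = new ArrayList<>();
        String metadataPage; // remains null if the flush was interrupted
    }

    // Writes data pages first and the metadata page last.
    // interruptAfter < 0 runs to completion; otherwise the flush stops after
    // at most that many data pages and never writes the metadata page.
    static DiskComponent flush(List<String> records, int interruptAfter) {
        DiskComponent component = new DiskComponent();
        for (String record : records) {
            if (interruptAfter >= 0 && component.dataPages.size() >= interruptAfter) {
                return component;
            }
            component.dataPages.add(record);
        }
        if (interruptAfter < 0) {
            component.metadataPage = VALID_MARKER; // last write marks the component valid
        }
        return component;
    }

    // A component counts as valid only if its metadata page carries the marker.
    static boolean isValid(DiskComponent component) {
        return VALID_MARKER.equals(component.metadataPage);
    }

    // Keeps a valid component; otherwise discards it and rebuilds from the log.
    static DiskComponent recover(DiskComponent found, List<String> log) {
        return isValid(found) ? found : flush(log, -1);
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList("r1", "r2", "r3");
        DiskComponent torn = flush(log, 2); // interrupted after two data pages
        System.out.println("valid after interrupt: " + isValid(torn));
        DiskComponent rebuilt = recover(torn, log);
        System.out.println("valid after recovery: " + isValid(rebuilt));
    }
}
```

Under this model, an interrupted flush can never leave behind a component that looks valid, which is why the interrupt alone should not explain the NPE.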


I'm going to reload it to see what the dataset looks like on-disk before restart. The instance
I was debugging had it happening only on the very last disk component of one partition. The
issue should hopefully be somewhere in either flush or recovery. 

> NPE when restart the server
> ---------------------------
>
>                 Key: ASTERIXDB-1534
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1534
>             Project: Apache AsterixDB
>          Issue Type: Bug
>          Components: Storage
>         Environment: master 
> commit a89fae64ac21fb8eefde79f79d2dbe1a0e54c364
> Date:   Wed Jul 6 07:58:55 2016 -0700
>            Reporter: Jianfeng Jia
>            Assignee: Ian Maxon
>         Attachments: asterix-configuration.xml, ingest.sh
>
>
> When I stop and start the cluster with managix, I hit the following error:
> {code}
> ERROR: /rhome/jianfeng/managix/home/asterix/cloudberry/.nfs00000000021805340000118e (No such file or directory)
> j
> {code}
> Neither the NC nor the CC started.
> After a while, I ran managix start again, and the cluster restarted successfully.
> But one of the datasets can't answer any queries. The simplest select query
> {code}
> for $t in dataset twitter.ds_tweet limit 5 return $t
> {code}
> gives me the following error:
> {code}
> Caused by: org.apache.hyracks.api.exceptions.HyracksDataException: org.apache.hyracks.api.exceptions.HyracksDataException: java.util.concurrent.ExecutionException: org.apache.hyracks.api.exceptions.HyracksDataException: java.lang.NullPointerException
>     at org.apache.hyracks.control.common.utils.ExceptionUtils.setNodeIds(ExceptionUtils.java:45)
>     at org.apache.hyracks.control.nc.Task.run(Task.java:319)
>     ... 3 more
> Caused by: org.apache.hyracks.api.exceptions.HyracksDataException: java.util.concurrent.ExecutionException: org.apache.hyracks.api.exceptions.HyracksDataException: java.lang.NullPointerException
>     at org.apache.hyracks.api.rewriter.runtime.SuperActivityOperatorNodePushable.runInParallel(SuperActivityOperatorNodePushable.java:218)
>     at org.apache.hyracks.api.rewriter.runtime.SuperActivityOperatorNodePushable.initialize(SuperActivityOperatorNodePushable.java:83)
>     at org.apache.hyracks.control.nc.Task.run(Task.java:263)
>     ... 3 more
> Caused by: java.util.concurrent.ExecutionException: org.apache.hyracks.api.exceptions.HyracksDataException: java.lang.NullPointerException
>     at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>     at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>     at org.apache.hyracks.api.rewriter.runtime.SuperActivityOperatorNodePushable.runInParallel(SuperActivityOperatorNodePushable.java:212)
>     ... 5 more
> Caused by: org.apache.hyracks.api.exceptions.HyracksDataException: java.lang.NullPointerException
>     at org.apache.hyracks.storage.am.common.dataflow.IndexSearchOperatorNodePushable.nextFrame(IndexSearchOperatorNodePushable.java:187)
>     at org.apache.hyracks.dataflow.common.comm.io.AbstractFrameAppender.write(AbstractFrameAppender.java:93)
>     at org.apache.hyracks.algebricks.runtime.operators.base.AbstractOneInputOneOutputOneFramePushRuntime.flushAndReset(AbstractOneInputOneOutputOneFramePushRuntime.java:63)
>     at org.apache.hyracks.algebricks.runtime.operators.base.AbstractOneInputOneOutputOneFramePushRuntime.flushIfNotFailed(AbstractOneInputOneOutputOneFramePushRuntime.java:69)
>     at org.apache.hyracks.algebricks.runtime.operators.base.AbstractOneInputOneOutputOneFramePushRuntime.close(AbstractOneInputOneOutputOneFramePushRuntime.java:55)
>     at org.apache.hyracks.algebricks.runtime.operators.std.AssignRuntimeFactory$1.close(AssignRuntimeFactory.java:122)
>     at org.apache.hyracks.algebricks.runtime.operators.std.EmptyTupleSourceRuntimeFactory$1.close(EmptyTupleSourceRuntimeFactory.java:60)
>     at org.apache.hyracks.algebricks.runtime.operators.meta.AlgebricksMetaOperatorDescriptor$1.initialize(AlgebricksMetaOperatorDescriptor.java:116)
>     at org.apache.hyracks.api.rewriter.runtime.SuperActivityOperatorNodePushable.lambda$initialize$0(SuperActivityOperatorNodePushable.java:83)
>     at org.apache.hyracks.api.rewriter.runtime.SuperActivityOperatorNodePushable$$Lambda$1/350086994.runAction(Unknown Source)
>     at org.apache.hyracks.api.rewriter.runtime.SuperActivityOperatorNodePushable$1.call(SuperActivityOperatorNodePushable.java:205)
>     at org.apache.hyracks.api.rewriter.runtime.SuperActivityOperatorNodePushable$1.call(SuperActivityOperatorNodePushable.java:202)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     ... 3 more
> Caused by: java.lang.NullPointerException
>     at org.apache.hyracks.storage.am.common.frames.TreeIndexNSMFrame.getTupleCount(TreeIndexNSMFrame.java:287)
>     at org.apache.hyracks.storage.am.btree.impls.BTreeRangeSearchCursor.hasNext(BTreeRangeSearchCursor.java:141)
>     at org.apache.hyracks.storage.am.lsm.invertedindex.ondisk.PartitionedOnDiskInvertedIndex.openInvertedListPartitionCursors(PartitionedOnDiskInvertedIndex.java:98)
>     at org.apache.hyracks.storage.am.lsm.invertedindex.search.PartitionedTOccurrenceSearcher.search(PartitionedTOccurrenceSearcher.java:116)
>     at org.apache.hyracks.storage.am.lsm.invertedindex.ondisk.OnDiskInvertedIndex$OnDiskInvertedIndexAccessor.search(OnDiskInvertedIndex.java:519)
>     at org.apache.hyracks.storage.am.lsm.invertedindex.impls.LSMInvertedIndexSearchCursor.hasNext(LSMInvertedIndexSearchCursor.java:143)
>     at org.apache.hyracks.storage.am.common.dataflow.IndexSearchOperatorNodePushable.writeSearchResults(IndexSearchOperatorNodePushable.java:149)
>     at org.apache.hyracks.storage.am.common.dataflow.IndexSearchOperatorNodePushable.nextFrame(IndexSearchOperatorNodePushable.java:184)
>     ... 15 more
>                     
> {code}
> One hint is that this dataset was connected to a feed before the restart. The other dataset, which had no feed connection, seems to be working fine.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
