asterixdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jianfeng Jia <jianfeng....@gmail.com>
Subject Re: Help! Any idea to stop AsterixDB from recovering?
Date Tue, 17 May 2016 05:56:15 GMT
I revisited the logs, and luckily it hasn’t been cleared. Here is part of the nc1’s log:

May 15, 2016 1:04:10 PM org.apache.hyracks.storage.common.buffercache.BufferCache openFile
INFO: Opening file: 14 in cache: org.apache.hyracks.storage.common.buffercache.BufferCache@2a7f1f10
May 15, 2016 1:04:10 PM org.apache.hyracks.storage.common.buffercache.BufferCache openFile
INFO: Opening file: 13 in cache: org.apache.hyracks.storage.common.buffercache.BufferCache@2a7f1f10
May 15, 2016 1:04:10 PM org.apache.hyracks.storage.common.buffercache.BufferCache createFile
INFO: Creating file: /nc1/iodevice1/storage/partition_0/hackathon/log_device_idx_log_device/2016-05-15-12-56-48-712_2016-05-15-12-23-31-225_f
in cache: org.apache.hyracks.storage.common.buffercache.BufferCache@2a7f1f10
May 15, 2016 1:04:10 PM org.apache.hyracks.storage.common.buffercache.BufferCache openFile
INFO: Opening file: 15 in cache: org.apache.hyracks.storage.common.buffercache.BufferCache@2a7f1f10
——————————————————————————————————————
/// I shut down the cluster from here and start the server right away.
——————————————————————————————————————
May 15, 2016 1:43:12 PM org.apache.asterix.transaction.management.service.recovery.RecoveryManager
startRecoveryRedoPhase
INFO: Logs REDO phase completed. Redo logs count: 1197
May 15, 2016 1:43:12 PM org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness flush
INFO: Started a flush operation for index: LSMBTree [/nc1/iodevice1/storage/partition_0/Metadata/Dataset_idx_Dataset/]
...
May 15, 2016 1:43:12 PM org.apache.hyracks.storage.common.buffercache.BufferCache createFile
INFO: Creating file: /nc1/iodevice1/storage/partition_0/Metadata/Dataset_idx_Dataset/2016-05-15-13-43-12-680_2016-05-15-13-43-12-680_f
in cache: org.apache.hyracks.storage.common.buffercache.BufferCache@2a7f1f10

No logs generated in that 43mins. During that time one CPU was exhaustive and I remember no
file was touched or generated in the asterix foler. Then it may not be the problem of the
buffercache in the recovery phase? 





> On May 16, 2016, at 9:28 PM, Mike Carey <dtabass@gmail.com> wrote:
> 
> Agreed and agreed.  But is the spinning on recovery?
> 
> (What's the role of the buffer cache during recovery?)
> 
> 
> On 5/17/16 2:10 AM, Jianfeng Jia wrote:
>> I think the BuffeCache is the core issue, the recovery process may just run into
the same spin trap where it was stopped.
>> And I create another issue that we should be able to Abort the task so that we don’t
need to restart the server.
>> 
>>> On May 16, 2016, at 7:24 AM, Michael Blow <mblow.apache@gmail.com> wrote:
>>> 
>>> This might be related: (ASTERIXDB-1438) BufferCache spins indefinitely when
>>> cache is exceeded.
>>> 
>>> https://issues.apache.org/jira/browse/ASTERIXDB-1438
>>> 
>>> Thanks,
>>> 
>>> -MDB
>>> 
>>> On Mon, May 16, 2016 at 1:52 AM Mike Carey <dtabass@gmail.com> wrote:
>>> 
>>>> Glad it worked out - can someone also capture the core issue in JIRA?  Thx!
>>>> On May 15, 2016 11:40 PM, "Jianfeng Jia" <jianfeng.jia@gmail.com> wrote:
>>>> 
>>>>> Great! The server is back now. Thanks a lot!
>>>>>> On May 15, 2016, at 2:26 PM, Murtadha Hubail <hubailmor@gmail.com>
>>>>> wrote:
>>>>>> You can delete the existing log files and create new empty ones with
>>>>> incremented  log file number, but it is very important that you don't
>>>>> delete the checkpoint file.
>>>>>> Of course any data in the old log files will be lost, but the data
>>>>> already on disk will be available.
>>>>>>> On May 15, 2016, at 1:23 PM, Jianfeng Jia <jianfeng.jia@gmail.com>
>>>>> wrote:
>>>>>>> Hi,
>>>>>>> We submitted a long running join+insert query and stop the cluster
to
>>>>> stop running it. However, when it restarted it ran the recovery forever,
>>>>>>> the logs shows that it is creating a lot of buffer cache.
>>>>>>> 
>>>>>>> In order to bring the cluster back to answer the query, is there
any
>>>>> hacking solutions? such as remove the recovery txnlogs? I’m worried
that
>>>> it
>>>>> will ruin the cluster somehow.
>>>>>>> We are in a contest so any early helps are really appreciated!
Thanks!
>>>>>>> 
>>>>>>> 
>>>>>>> Best,
>>>>>>> 
>>>>>>> Jianfeng Jia
>>>>>>> PhD Candidate of Computer Science
>>>>>>> University of California, Irvine
>>>>>>> 
>>>>> 
>>>>> 
>>>>> Best,
>>>>> 
>>>>> Jianfeng Jia
>>>>> PhD Candidate of Computer Science
>>>>> University of California, Irvine
>>>>> 
>>>>> 
>> 
>> 
>> Best,
>> 
>> Jianfeng Jia
>> PhD Candidate of Computer Science
>> University of California, Irvine
>> 
>> 
> 



Best,

Jianfeng Jia
PhD Candidate of Computer Science
University of California, Irvine


Mime
View raw message