asterixdb-dev mailing list archives

From: Murtadha Hubail <hubail...@gmail.com>
Subject: Re: Help! Any idea to stop AsterixDB from recovering?
Date: Sat, 21 May 2016 22:14:23 GMT
2MB is too small for the memory components (it will result in many merges unless you are using the
no-merge policy). You also need to set your checkpoint threshold (txn.log.checkpoint.lsnthreshold)
to a small size if you haven't; its default value is 64MB. In your case, leaving it at the default
will result in a large number of log records being unnecessarily spilled to disk during the recovery
analysis phase, which will also slow down the REDO phase.
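
For reference, here is a minimal sketch of how these knobs might look in asterix-configuration.xml
(assuming a managix-style install; the property names follow the AsterixDB configuration docs, and
the values below are purely illustrative, not recommendations):

<asterixConfiguration xmlns="asterixconf">
  <!-- In-memory component budget per partition: numpages * pagesize.
       16 pages * 128KB = 2MB, as in this thread; raising numpages makes
       flushes (and therefore merges) less frequent. -->
  <property>
    <name>storage.memorycomponent.pagesize</name>
    <value>131072</value>
  </property>
  <property>
    <name>storage.memorycomponent.numpages</name>
    <value>256</value>
  </property>

  <!-- Take a checkpoint once this many bytes of log have accumulated since
       the last one (default 64MB). A smaller value shortens the log tail
       that the analysis/REDO phases have to scan after a crash. -->
  <property>
    <name>txn.log.checkpoint.lsnthreshold</name>
    <value>8388608</value>
  </property>

  <!-- How often (in seconds) the checkpoint thread checks the threshold. -->
  <property>
    <name>txn.log.checkpoint.pollfrequency</name>
    <value>120</value>
  </property>
</asterixConfiguration>

The exact values should be tuned to your workload; the point is that the checkpoint threshold should
shrink along with the memory component size.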

I will also investigate the memory swapping to make sure that there isn't any unexpected behavior.

-Murtadha

> On May 20, 2016, at 6:02 PM, Jianfeng Jia <jianfeng.jia@gmail.com> wrote:
> 
> The memory component size is very small: 16 pages. 16 * 128K = 2M
> The checkpoint poll frequency is 120s
> 
> Are these the files you are talking about?
> jianfeng~/.asterix$ l nc1/txnLogs/recovery_temp/33/
> total 57M
> -rw-r--r-- 1 root 4.1M May 20 16:38 33_49447149112
> -rw-r--r-- 1 root 4.1M May 20 16:38 33_49502047857
> -rw-r--r-- 1 root 4.1M May 20 16:38 33_49556805925
> -rw-r--r-- 1 root 4.1M May 20 16:38 33_49611565445
> -rw-r--r-- 1 root 4.1M May 20 16:38 33_49666469761
> -rw-r--r-- 1 root 4.1M May 20 16:38 33_49721493802
> -rw-r--r-- 1 root 4.1M May 20 16:38 33_49776490438
> -rw-r--r-- 1 root 4.1M May 20 16:38 33_49831651500
> -rw-r--r-- 1 root 4.1M May 20 16:38 33_49886809324
> -rw-r--r-- 1 root 4.1M May 20 16:38 33_49941694010
> -rw-r--r-- 1 root 4.1M May 20 16:38 33_49996847434
> -rw-r--r-- 1 root 4.1M May 20 16:38 33_50051741552
> -rw-r--r-- 1 root 4.1M May 20 16:38 33_50106633253
> -rw-r--r-- 1 root 4.1M May 20 16:38 33_50161512493
> -rw-r--r-- 1 root 721K May 20 16:38 33_50171087795
> 
> The memory usage is between 800M and 1.3G, while the JVM is configured with 3G.
> 
> I will try the new master later.
> My instance is based on 
> commit 1defc92ae43cb36e1c06eb884c5359d5c1d6b0e5
> Date:   Thu May 12 18:31:47 2016 -0700
> 
> 
>> On May 20, 2016, at 5:22 PM, Murtadha Hubail <hubailmor@gmail.com> wrote:
>> 
>> How large are the memory component size and the checkpoint interval? If you have a large amount
>> of data to be recovered, there will be many small recovery partition files written to disk under
>> the logs directory. You might want to check how many files/directories are there. Also, if the
>> JVM memory is almost fully consumed, then every time before a recovery partition is loaded into
>> memory, any other data partition already in memory will be evicted and will have to be reloaded
>> when it is needed again. This might cause the busy behavior.
>> 
>> Also, if you are on the current master, I merged a patch earlier this week to make the spilled
>> partition size configurable. You might want to increase it and see if the recovery time improves.
>> 
>> Cheers,
>> Murtadha
>> 
>>> On May 20, 2016, at 5:00 PM, Jianfeng Jia <jianfeng.jia@gmail.com> wrote:
>>> 
>>> I am able to reproduce the issue. The thread dump is pasted here: http://pastebin.com/KKvLVVg9
>>> It looks pretty busy with loadPartitionToMemory.
>>> 
>>> "main" #1 prio=5 os_prio=0 tid=0x00007f33f400d800 nid=0x14 runnable [0x00007f33fc27a000]
>>> java.lang.Thread.State: RUNNABLE
>>>  at java.io.FileInputStream.read(Native Method)
>>>  at org.apache.asterix.transaction.management.service.recovery.RecoveryManager$JobEntityCommits.loadPartitionToMemory(RecoveryManager.java:1033)
>>>  at org.apache.asterix.transaction.management.service.recovery.RecoveryManager$JobEntityCommits.serachPartition(RecoveryManager.java:983)
>>>  at org.apache.asterix.transaction.management.service.recovery.RecoveryManager$JobEntityCommits.containsEntityCommitForTxnId(RecoveryManager.java:947)
>>>  at org.apache.asterix.transaction.management.service.recovery.RecoveryManager.startRecoveryRedoPhase(RecoveryManager.java:334)
>>>  - locked <0x00000006c03dc1c0> (a org.apache.asterix.transaction.management.service.recovery.RecoveryManager)
>>>  at org.apache.asterix.transaction.management.service.recovery.RecoveryManager.replayPartitionsLogs(RecoveryManager.java:203)
>>>  - locked <0x00000006c03dc1c0> (a org.apache.asterix.transaction.management.service.recovery.RecoveryManager)
>>>  at org.apache.asterix.transaction.management.service.recovery.RecoveryManager.startRecovery(RecoveryManager.java:195)
>>>  at org.apache.asterix.hyracks.bootstrap.NCApplicationEntryPoint.start(NCApplicationEntryPoint.java:142)
>>>  at org.apache.hyracks.control.nc.NodeControllerService.startApplication(NodeControllerService.java:324)
>>>  at org.apache.hyracks.control.nc.NodeControllerService.start(NodeControllerService.java:259)
>>>  at org.apache.hyracks.control.nc.NCDriver.main(NCDriver.java:48)
>>> 
>>> 
>>> 
>>>> On May 17, 2016, at 12:37 AM, Jianfeng Jia <jianfeng.jia@gmail.com> wrote:
>>>> 
>>>> I'm trying to reproduce the issue and will take a thread dump when it happens.
>>>> 
>>>>> On May 16, 2016, at 10:57 PM, Michael Blow <mblow.apache@gmail.com> wrote:
>>>>> 
>>>>> It would be good to get thread dumps if this happens again.
>>>>> On Mon, May 16, 2016 at 10:56 PM Jianfeng Jia <jianfeng.jia@gmail.com> wrote:
>>>>> 
>>>>>> I revisited the logs, and luckily they haven't been cleared. Here is part of the nc1's log:
>>>>>> 
>>>>>> May 15, 2016 1:04:10 PM org.apache.hyracks.storage.common.buffercache.BufferCache openFile
>>>>>> INFO: Opening file: 14 in cache: org.apache.hyracks.storage.common.buffercache.BufferCache@2a7f1f10
>>>>>> May 15, 2016 1:04:10 PM org.apache.hyracks.storage.common.buffercache.BufferCache openFile
>>>>>> INFO: Opening file: 13 in cache: org.apache.hyracks.storage.common.buffercache.BufferCache@2a7f1f10
>>>>>> May 15, 2016 1:04:10 PM org.apache.hyracks.storage.common.buffercache.BufferCache createFile
>>>>>> INFO: Creating file: /nc1/iodevice1/storage/partition_0/hackathon/log_device_idx_log_device/2016-05-15-12-56-48-712_2016-05-15-12-23-31-225_f in cache: org.apache.hyracks.storage.common.buffercache.BufferCache@2a7f1f10
>>>>>> May 15, 2016 1:04:10 PM org.apache.hyracks.storage.common.buffercache.BufferCache openFile
>>>>>> INFO: Opening file: 15 in cache: org.apache.hyracks.storage.common.buffercache.BufferCache@2a7f1f10
>>>>>> ——————————————————————————————————————
>>>>>> /// I shut down the cluster at this point and started the server right away.
>>>>>> ——————————————————————————————————————
>>>>>> May 15, 2016 1:43:12 PM org.apache.asterix.transaction.management.service.recovery.RecoveryManager startRecoveryRedoPhase
>>>>>> INFO: Logs REDO phase completed. Redo logs count: 1197
>>>>>> May 15, 2016 1:43:12 PM org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness flush
>>>>>> INFO: Started a flush operation for index: LSMBTree [/nc1/iodevice1/storage/partition_0/Metadata/Dataset_idx_Dataset/]
>>>>>> ...
>>>>>> May 15, 2016 1:43:12 PM org.apache.hyracks.storage.common.buffercache.BufferCache createFile
>>>>>> INFO: Creating file: /nc1/iodevice1/storage/partition_0/Metadata/Dataset_idx_Dataset/2016-05-15-13-43-12-680_2016-05-15-13-43-12-680_f in cache: org.apache.hyracks.storage.common.buffercache.BufferCache@2a7f1f10
>>>>>> 
>>>>>> No logs were generated in those 43 minutes. During that time one CPU was fully utilized, and I
>>>>>> remember that no file was touched or generated in the asterix folder. So it may not be a problem
>>>>>> with the buffer cache in the recovery phase?
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On May 16, 2016, at 9:28 PM, Mike Carey <dtabass@gmail.com> wrote:
>>>>>>> 
>>>>>>> Agreed and agreed.  But is the spinning on recovery?
>>>>>>> 
>>>>>>> (What's the role of the buffer cache during recovery?)
>>>>>>> 
>>>>>>> 
>>>>>>> On 5/17/16 2:10 AM, Jianfeng Jia wrote:
>>>>>>>> I think the BufferCache is the core issue; the recovery process may just run into the same
>>>>>>>> spin trap where it was stopped.
>>>>>>>> And I created another issue that we should be able to abort the task so that we don't need
>>>>>>>> to restart the server.
>>>>>>>> 
>>>>>>>>> On May 16, 2016, at 7:24 AM, Michael Blow <mblow.apache@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> This might be related: (ASTERIXDB-1438) BufferCache spins indefinitely when cache is exceeded.
>>>>>>>>> 
>>>>>>>>> https://issues.apache.org/jira/browse/ASTERIXDB-1438
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> 
>>>>>>>>> -MDB
>>>>>>>>> 
>>>>>>>>> On Mon, May 16, 2016 at 1:52 AM Mike Carey <dtabass@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>>> Glad it worked out - can someone also capture the core issue in JIRA? Thx!
>>>>>>>>>> On May 15, 2016 11:40 PM, "Jianfeng Jia" <jianfeng.jia@gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Great! The server is back now. Thanks a lot!
>>>>>>>>>>>> On May 15, 2016, at 2:26 PM, Murtadha Hubail <hubailmor@gmail.com> wrote:
>>>>>>>>>>>> You can delete the existing log files and create new empty ones with an incremented log
>>>>>>>>>>>> file number, but it is very important that you don't delete the checkpoint file.
>>>>>>>>>>>> Of course any data in the old log files will be lost, but the data already on disk will
>>>>>>>>>>>> be available.
>>>>>>>>>>>>> On May 15, 2016, at 1:23 PM, Jianfeng Jia <jianfeng.jia@gmail.com> wrote:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>> We submitted a long-running join+insert query and stopped the cluster to stop running it.
>>>>>>>>>>>>> However, when it restarted it ran the recovery forever, and the logs show that it is
>>>>>>>>>>>>> creating a lot of files in the buffer cache.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> In order to bring the cluster back to answer the query, are there any hacking solutions,
>>>>>>>>>>>>> such as removing the recovery txn logs? I'm worried that it will ruin the cluster somehow.
>>>>>>>>>>>>> We are in a contest, so any early help is really appreciated! Thanks!
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Jianfeng Jia
>>>>>>>>>>>>> PhD Candidate of Computer Science
>>>>>>>>>>>>> University of California, Irvine
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Best,
>>>>>>>>>>> 
>>>>>>>>>>> Jianfeng Jia
>>>>>>>>>>> PhD Candidate of Computer Science
>>>>>>>>>>> University of California, Irvine
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> 
>>>>>>>> Jianfeng Jia
>>>>>>>> PhD Candidate of Computer Science
>>>>>>>> University of California, Irvine
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Best,
>>>>>> 
>>>>>> Jianfeng Jia
>>>>>> PhD Candidate of Computer Science
>>>>>> University of California, Irvine
>>>>>> 
>>>>>> 
>>>> 
>>>> 
>>>> 
>>>> Best,
>>>> 
>>>> Jianfeng Jia
>>>> PhD Candidate of Computer Science
>>>> University of California, Irvine
>>>> 
>>> 
>>> 
>>> 
>>> Best,
>>> 
>>> Jianfeng Jia
>>> PhD Candidate of Computer Science
>>> University of California, Irvine
>> 
> 
> 
> 
> Best,
> 
> Jianfeng Jia
> PhD Candidate of Computer Science
> University of California, Irvine

