Mailing-List: contact dev-help@asterixdb.incubator.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@asterixdb.incubator.apache.org
MIME-Version: 1.0
References: <4CB6C1F8-D0E8-4C4C-9B8B-42AE6B987AF5@gmail.com>
 <010902BB-7436-49BD-86D7-92C6384EF1B8@gmail.com> <9BE2BEEF-5FAD-4794-9B7A-2400213F855C@gmail.com>
 <CAEEsEsSonhod5SKg5BfYrDCdvZz_+qTWMJfs+v2J4XBvaJ6LbA@mail.gmail.com>
 <CAFUTFeaT0DxeuTiU0bZDEnMtpqVRGMaxkZF4osOWAwkEmqV9Lg@mail.gmail.com>
 <F4A150FF-30FF-47FC-9994-C93427CC136F@gmail.com> <771e4991-d9a5-a4ef-2506-a4d30c79ad9f@gmail.com>
 <C82644EF-4A7E-430E-A1A4-314E5942B8F8@gmail.com>
In-Reply-To: <C82644EF-4A7E-430E-A1A4-314E5942B8F8@gmail.com>
From: Michael Blow <mblow.apache@gmail.com>
Date: Tue, 17 May 2016 05:57:36 +0000
Message-ID: <CAFUTFeaqHdUd+=671hiVC4ZDxrZ=DnTK4PCWt5frq4dmT7ju-w@mail.gmail.com>
Subject: Re: Help! Any idea to stop AsterixDB from recovering?
To: dev@asterixdb.incubator.apache.org
Content-Type: multipart/alternative; boundary=001a114448fc8aa2290533036bbe
archived-at: Tue, 17 May 2016 05:57:51 -0000

--001a114448fc8aa2290533036bbe
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

It would be good to get thread dumps if this happens again.
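
From outside the process, the JDK's `jstack <pid>` is the usual way to capture one. If it is easier to trigger from inside the JVM, the following is a minimal sketch using the standard `java.lang.management` API. The class name `ThreadDumpSketch` is made up for illustration and is not part of AsterixDB; the code would have to run inside the stuck NC process to be useful.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not AsterixDB code: dumps all live threads of the current JVM.
public class ThreadDumpSketch {

    /** Returns the name, state and full stack trace of every live thread in this JVM. */
    public static String dumpAllThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Only request lock details when the JVM supports them; otherwise the call throws.
        ThreadInfo[] infos = mx.dumpAllThreads(
                mx.isObjectMonitorUsageSupported(),
                mx.isSynchronizerUsageSupported());
        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : infos) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            // ThreadInfo.toString() truncates the stack, so print every frame explicitly.
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```

Taking two or three dumps a few seconds apart makes it easier to see which thread stays RUNNABLE in the same frames, which is what a spinning thread would look like.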
On Mon, May 16, 2016 at 10:56 PM Jianfeng Jia <jianfeng.jia@gmail.com>
wrote:

> I revisited the logs, and luckily they haven't been cleared. Here is part of
> nc1's log:
>
> May 15, 2016 1:04:10 PM
> org.apache.hyracks.storage.common.buffercache.BufferCache openFile
> INFO: Opening file: 14 in cache:
> org.apache.hyracks.storage.common.buffercache.BufferCache@2a7f1f10
> May 15, 2016 1:04:10 PM
> org.apache.hyracks.storage.common.buffercache.BufferCache openFile
> INFO: Opening file: 13 in cache:
> org.apache.hyracks.storage.common.buffercache.BufferCache@2a7f1f10
> May 15, 2016 1:04:10 PM
> org.apache.hyracks.storage.common.buffercache.BufferCache createFile
> INFO: Creating file:
> /nc1/iodevice1/storage/partition_0/hackathon/log_device_idx_log_device/2016-05-15-12-56-48-712_2016-05-15-12-23-31-225_f
> in cache: org.apache.hyracks.storage.common.buffercache.BufferCache@2a7f1f10
> May 15, 2016 1:04:10 PM
> org.apache.hyracks.storage.common.buffercache.BufferCache openFile
> INFO: Opening file: 15 in cache:
> org.apache.hyracks.storage.common.buffercache.BufferCache@2a7f1f10
> ----------------------------------------
> /// I shut down the cluster at this point and started the server again right away.
> ----------------------------------------
> May 15, 2016 1:43:12 PM
> org.apache.asterix.transaction.management.service.recovery.RecoveryManager startRecoveryRedoPhase
> INFO: Logs REDO phase completed. Redo logs count: 1197
> May 15, 2016 1:43:12 PM
> org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness flush
> INFO: Started a flush operation for index: LSMBTree
> [/nc1/iodevice1/storage/partition_0/Metadata/Dataset_idx_Dataset/] ...
> May 15, 2016 1:43:12 PM
> org.apache.hyracks.storage.common.buffercache.BufferCache createFile
> INFO: Creating file:
> /nc1/iodevice1/storage/partition_0/Metadata/Dataset_idx_Dataset/2016-05-15-13-43-12-680_2016-05-15-13-43-12-680_f
> in cache: org.apache.hyracks.storage.common.buffercache.BufferCache@2a7f1f10
>
> No logs were generated in those 43 minutes. During that time one CPU was
> fully utilized, and as far as I remember no file was touched or created in
> the asterix folder. So the problem may not be the buffer cache in the
> recovery phase?
>
>
>
>
>
> > On May 16, 2016, at 9:28 PM, Mike Carey <dtabass@gmail.com> wrote:
> >
> > Agreed and agreed.  But is the spinning on recovery?
> >
> > (What's the role of the buffer cache during recovery?)
> >
> >
> > On 5/17/16 2:10 AM, Jianfeng Jia wrote:
> >> I think the BufferCache is the core issue; the recovery process may just
> >> run into the same spin trap where it was stopped.
> >> I also created another issue: we should be able to abort the task so
> >> that we don't need to restart the server.
> >>
> >>> On May 16, 2016, at 7:24 AM, Michael Blow <mblow.apache@gmail.com> wrote:
> >>>
> >>> This might be related: (ASTERIXDB-1438) BufferCache spins indefinitely
> >>> when cache is exceeded.
> >>>
> >>> https://issues.apache.org/jira/browse/ASTERIXDB-1438
> >>>
> >>> Thanks,
> >>>
> >>> -MDB
> >>>
> >>> On Mon, May 16, 2016 at 1:52 AM Mike Carey <dtabass@gmail.com> wrote:
> >>>
> >>>> Glad it worked out - can someone also capture the core issue in JIRA?  Thx!
> >>>> On May 15, 2016 11:40 PM, "Jianfeng Jia" <jianfeng.jia@gmail.com> wrote:
> >>>>
> >>>>> Great! The server is back now. Thanks a lot!
> >>>>>> On May 15, 2016, at 2:26 PM, Murtadha Hubail <hubailmor@gmail.com> wrote:
> >>>>>> You can delete the existing log files and create new empty ones with an
> >>>>>> incremented log file number, but it is very important that you don't
> >>>>>> delete the checkpoint file.
> >>>>>> Of course any data in the old log files will be lost, but the data
> >>>>>> already on disk will be available.
> >>>>>>> On May 15, 2016, at 1:23 PM, Jianfeng Jia <jianfeng.jia@gmail.com> wrote:
> >>>>>>> Hi,
> >>>>>>> We submitted a long-running join+insert query and stopped the cluster
> >>>>>>> to stop it. However, when the cluster restarted, recovery ran forever;
> >>>>>>> the logs show that it is creating a lot of buffer cache.
> >>>>>>>
> >>>>>>> To bring the cluster back so it can answer queries, is there any
> >>>>>>> hack, such as removing the recovery txnlogs? I'm worried that
> >>>>>>> it will ruin the cluster somehow.
> >>>>>>> We are in a contest, so any early help is really appreciated! Thanks!
> >>>>>>>
> >>>>>>>
> >>>>>>> Best,
> >>>>>>>
> >>>>>>> Jianfeng Jia
> >>>>>>> PhD Candidate of Computer Science
> >>>>>>> University of California, Irvine
> >>>>>>>
> >>>>>
> >>>>>
> >>>>> Best,
> >>>>>
> >>>>> Jianfeng Jia
> >>>>> PhD Candidate of Computer Science
> >>>>> University of California, Irvine
> >>>>>
> >>>>>
> >>
> >>
> >> Best,
> >>
> >> Jianfeng Jia
> >> PhD Candidate of Computer Science
> >> University of California, Irvine
> >>
> >>
> >
>
>
>
> Best,
>
> Jianfeng Jia
> PhD Candidate of Computer Science
> University of California, Irvine
>
>
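
For the archive, the workaround quoted above (delete the existing log files, create a new empty one with an incremented number, leave the checkpoint file alone) could be scripted roughly as follows. This is only a sketch under stated assumptions: it assumes the log files are named with a common prefix followed by a number, and the prefix `transaction_log_` in the usage comment is a guess, not something confirmed in this thread. `TxnLogResetSketch` and `resetLogs` are hypothetical names. Check the real file names in the txn log directory, stop the cluster, and take a backup before trying anything like this.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not AsterixDB code.
public class TxnLogResetSketch {

    /**
     * Deletes every file named <prefix><number> in logDir and creates one empty
     * file named <prefix><max+1>. Files that do not match (for example the
     * checkpoint file) are never touched. ASSUMPTION: the naming scheme.
     */
    public static Path resetLogs(Path logDir, String prefix) throws IOException {
        List<Path> oldLogs = new ArrayList<>();
        long maxId = -1;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(logDir, prefix + "*")) {
            for (Path f : files) {
                String suffix = f.getFileName().toString().substring(prefix.length());
                if (!suffix.matches("\\d+")) {
                    continue; // not a numbered log file, leave it alone
                }
                oldLogs.add(f);
                maxId = Math.max(maxId, Long.parseLong(suffix));
            }
        }
        if (oldLogs.isEmpty()) {
            throw new IOException("no files named " + prefix + "<number> in " + logDir);
        }
        for (Path f : oldLogs) {
            Files.delete(f); // any data only present in these logs is lost
        }
        return Files.createFile(logDir.resolve(prefix + (maxId + 1)));
    }

    // Usage (prefix is an assumption): java TxnLogResetSketch <logDir> transaction_log_
    public static void main(String[] args) throws IOException {
        System.out.println("created " + resetLogs(Paths.get(args[0]), args[1]));
    }
}
```

As noted in the quoted advice, any data that exists only in the old log files is lost; data already on disk remains available.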

--001a114448fc8aa2290533036bbe--