From: Michael Blow
Date: Tue, 17 May 2016 05:57:36 +0000
Subject: Re: Help! Any idea to stop AsterixDB from recovering?
To: dev@asterixdb.incubator.apache.org

It would be good to get thread dumps if this happens again.

On Mon, May 16, 2016 at 10:56 PM Jianfeng Jia wrote:

> I revisited the logs, and luckily they haven't been cleared. Here is part of
> nc1's log:
>
> May 15, 2016 1:04:10 PM org.apache.hyracks.storage.common.buffercache.BufferCache openFile
> INFO: Opening file: 14 in cache: org.apache.hyracks.storage.common.buffercache.BufferCache@2a7f1f10
> May 15, 2016 1:04:10 PM org.apache.hyracks.storage.common.buffercache.BufferCache openFile
> INFO: Opening file: 13 in cache: org.apache.hyracks.storage.common.buffercache.BufferCache@2a7f1f10
> May 15, 2016 1:04:10 PM org.apache.hyracks.storage.common.buffercache.BufferCache createFile
> INFO: Creating file: /nc1/iodevice1/storage/partition_0/hackathon/log_device_idx_log_device/2016-05-15-12-56-48-712_2016-05-15-12-23-31-225_f in cache: org.apache.hyracks.storage.common.buffercache.BufferCache@2a7f1f10
> May 15, 2016 1:04:10 PM org.apache.hyracks.storage.common.buffercache.BufferCache openFile
> INFO: Opening file: 15 in cache: org.apache.hyracks.storage.common.buffercache.BufferCache@2a7f1f10
> --------------------------------------
> /// I shut down the cluster here and started the server right away.
> --------------------------------------
> May 15, 2016 1:43:12 PM org.apache.asterix.transaction.management.service.recovery.RecoveryManager startRecoveryRedoPhase
> INFO: Logs REDO phase completed. Redo logs count: 1197
> May 15, 2016 1:43:12 PM org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness flush
> INFO: Started a flush operation for index: LSMBTree [/nc1/iodevice1/storage/partition_0/Metadata/Dataset_idx_Dataset/] ...
> May 15, 2016 1:43:12 PM org.apache.hyracks.storage.common.buffercache.BufferCache createFile
> INFO: Creating file: /nc1/iodevice1/storage/partition_0/Metadata/Dataset_idx_Dataset/2016-05-15-13-43-12-680_2016-05-15-13-43-12-680_f in cache: org.apache.hyracks.storage.common.buffercache.BufferCache@2a7f1f10
>
> No logs were generated in those 43 mins. During that time one CPU was fully
> busy, and as far as I remember no file was touched or created in the asterix
> folder. So maybe the buffer cache is not the problem in the recovery phase?
>
> > On May 16, 2016, at 9:28 PM, Mike Carey wrote:
> >
> > Agreed and agreed. But is the spinning on recovery?
> >
> > (What's the role of the buffer cache during recovery?)
> >
> >
> > On 5/17/16 2:10 AM, Jianfeng Jia wrote:
> >> I think the BufferCache is the core issue; the recovery process may just
> >> run into the same spin trap where it was stopped.
> >> I also created another issue: we should be able to abort the task so
> >> that we don't need to restart the server.
> >>
> >>> On May 16, 2016, at 7:24 AM, Michael Blow wrote:
> >>>
> >>> This might be related: (ASTERIXDB-1438) BufferCache spins indefinitely
> >>> when cache is exceeded.
> >>>
> >>> https://issues.apache.org/jira/browse/ASTERIXDB-1438
> >>>
> >>> Thanks,
> >>>
> >>> -MDB
> >>>
> >>> On Mon, May 16, 2016 at 1:52 AM Mike Carey wrote:
> >>>
> >>>> Glad it worked out - can someone also capture the core issue in
> >>>> JIRA? Thx!
> >>>> On May 15, 2016 11:40 PM, "Jianfeng Jia" wrote:
> >>>>
> >>>>> Great! The server is back now. Thanks a lot!
> >>>>>> On May 15, 2016, at 2:26 PM, Murtadha Hubail wrote:
> >>>>>> You can delete the existing log files and create new empty ones with
> >>>>>> an incremented log file number, but it is very important that you
> >>>>>> don't delete the checkpoint file.
> >>>>>> Of course any data in the old log files will be lost, but the data
> >>>>>> already on disk will be available.
> >>>>>>> On May 15, 2016, at 1:23 PM, Jianfeng Jia wrote:
> >>>>>>> Hi,
> >>>>>>> We submitted a long-running join+insert query and stopped the
> >>>>>>> cluster to stop it from running. However, when the cluster
> >>>>>>> restarted it ran the recovery forever; the logs show that it is
> >>>>>>> creating a lot of files in the buffer cache.
> >>>>>>>
> >>>>>>> To bring the cluster back so it can answer queries, are there any
> >>>>>>> hacky workarounds, such as removing the recovery txnlogs? I'm
> >>>>>>> worried that it will ruin the cluster somehow.
> >>>>>>> We are in a contest, so any quick help is really appreciated!
> >>>>>>> Thanks!
> >>>>>>>
> >>>>>>> Best,
> >>>>>>>
> >>>>>>> Jianfeng Jia
> >>>>>>> PhD Candidate of Computer Science
> >>>>>>> University of California, Irvine
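
On the request for thread dumps at the top of this message: for a node controller that is already spinning, the usual route is probably the JDK's `jstack <pid>` run against the NC process. If a dump has to be taken from inside the JVM instead, the standard `java.lang.management` API can do it. The following is only a minimal sketch under that assumption; the class `ThreadDumpSketch` and its method `capture()` are hypothetical names for illustration and are not part of AsterixDB or Hyracks.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper (not AsterixDB code): dumps the threads of the JVM it runs in. */
final class ThreadDumpSketch {

    /** Returns a textual dump of every live thread in this JVM. */
    static String capture() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // Ask for lock details only when this JVM supports reporting them.
        boolean monitors = threads.isObjectMonitorUsageSupported();
        boolean synchronizers = threads.isSynchronizerUsageSupported();

        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : threads.dumpAllThreads(monitors, synchronizers)) {
            // Header line: thread name, id, and current state.
            out.append('"').append(info.getThreadName()).append("\" id=")
               .append(info.getThreadId()).append(" state=")
               .append(info.getThreadState()).append('\n');
            // One line per stack frame, innermost frame first.
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(capture());
    }
}
```

Taking two or three dumps a few seconds apart and comparing them is what would show whether a thread stays RUNNABLE in the same frames, which is the pattern one would expect from a spin like the one described in ASTERIXDB-1438.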