From: Arwin Tio
To: "user@hbase.apache.org"
Subject: TimeoutException on Snapshots
Date: Tue, 23 Jul 2019 06:56:31 +0000

Hi all,

I've been running into these issues after restoring from snapshots:

https://issues.apache.org/jira/browse/HBASE-16464
https://issues.apache.org/jira/browse/HBASE-17992

Essentially, HRegion#addRegionToSnapshot has been timing out in TakeSnapshotHandler, resulting in some leftover tmp files. The leftover tmp files cause the archivedHFileCleaner to fail, which manifests as an extremely large archive folder that doesn't get cleaned up.

HBASE-16464 solves the bloating archive folder by preventing the SnapshotRegionManifest from being written if the operation has timed out (see: https://github.com/apache/hbase/commit/ab011391ab392f1a62b6ea9bdca87fc950af42a9#diff-4ec74c1b12f2be4f52c33260fd8b73efR86).

My question is: is it safe to ignore these TimeoutExceptions? If the SnapshotRegionManifests are not being written due to a timeout, does that mean we are losing data or getting inconsistencies?

If so, what are some potential remedies for this? I'm thinking we can just increase the timeout 'hbase.snapshot.master.timeout.millis', but is there a better way?

Thanks
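
To make the guard described for HBASE-16464 concrete, here is a minimal Java sketch of the general pattern: a coordinator waits for a per-region task with a timeout, and the task skips its manifest write if the coordinator has already given up. This is not HBase code. The class name TimeoutGuardedWrite, the method snapshotRegion, and the in-memory manifest list are all hypothetical stand-ins, and the real change in the linked commit may be structured differently.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration only; none of these names exist in HBase.
final class TimeoutGuardedWrite {
    // Set by the coordinator once the snapshot attempt is considered failed.
    private final AtomicBoolean aborted = new AtomicBoolean(false);
    // Stand-in for the region manifests written to the snapshot tmp directory.
    private final List<String> manifests = new CopyOnWriteArrayList<>();

    /** Runs one simulated region task; returns true if it finished within timeoutMillis. */
    boolean snapshotRegion(String region, long workMillis, long timeoutMillis)
            throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<?> task = pool.submit(() -> {
            try {
                Thread.sleep(workMillis); // stand-in for the slow per-region work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            // The guard: do not write a manifest for an attempt that already timed out.
            if (!aborted.get()) {
                manifests.add(region);
            }
        });
        boolean completed = false;
        try {
            task.get(timeoutMillis, TimeUnit.MILLISECONDS);
            completed = true;
        } catch (TimeoutException | ExecutionException e) {
            aborted.set(true); // mark the whole attempt as failed
        } finally {
            pool.shutdown();
            // Let the straggler finish so the effect of the guard is observable.
            pool.awaitTermination(workMillis + 1000, TimeUnit.MILLISECONDS);
        }
        return completed;
    }

    List<String> manifests() {
        return manifests;
    }

    public static void main(String[] args) throws InterruptedException {
        TimeoutGuardedWrite attempt = new TimeoutGuardedWrite();
        boolean ok = attempt.snapshotRegion("region-1", 300, 50);
        System.out.println("completed=" + ok + ", manifests=" + attempt.manifests());
    }
}
```

In this sketch a timed-out attempt leaves no manifest behind, so nothing partial is left for a cleaner to trip over; the attempt simply fails and would have to be retried, possibly with a larger timeout.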