Date: Tue, 22 Sep 2015 04:31:04 +0000 (UTC)
From: "Shiwei Guo (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Created] (YARN-4199) Minimize lock time in LeveldbTimelineStore.discardOldEntities

Shiwei Guo created YARN-4199:
--------------------------------

             Summary: Minimize lock time in LeveldbTimelineStore.discardOldEntities
                 Key: YARN-4199
                 URL: https://issues.apache.org/jira/browse/YARN-4199
             Project: Hadoop YARN
          Issue Type: Improvement
          Components: timelineserver, yarn
            Reporter: Shiwei Guo


In the current implementation, LeveldbTimelineStore.discardOldEntities holds the write lock of deleteLock while it scans the store. This blocks all other put operations, which in turn blocks the execution of YARN jobs (e.g. Tez). When the timeline store contains many historical jobs, the lock can be held for a very long time.
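To make the contention concrete, here is a minimal Java sketch of the locking pattern described above. It is a simplified, hypothetical model, not the actual LeveldbTimelineStore code: a ConcurrentSkipListMap stands in for LevelDB, the class name and method bodies are illustrative, and it assumes (as the description implies) that put operations take the shared read side of deleteLock.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical model of the current pattern. Keys are entity ids, values are
// entity start times in milliseconds. ConcurrentSkipListMap stands in for LevelDB.
class FullScanDiscardSketch {
  final ConcurrentSkipListMap<String, Long> store = new ConcurrentSkipListMap<>();
  final ReentrantReadWriteLock deleteLock = new ReentrantReadWriteLock();

  // Assumed: puts share the read side of deleteLock, so they can run concurrently
  // with each other but not while the write side is held.
  void put(String entityId, long startTime) {
    deleteLock.readLock().lock();
    try {
      store.put(entityId, startTime);
    } finally {
      deleteLock.readLock().unlock();
    }
  }

  // The exclusive write side is held for the entire scan, so every put()
  // waits until all entities have been examined.
  int discardOldEntities(long cutoff) {
    deleteLock.writeLock().lock();
    try {
      int removed = 0;
      Iterator<Map.Entry<String, Long>> it = store.entrySet().iterator();
      while (it.hasNext()) {
        if (it.next().getValue() < cutoff) {
          it.remove();
          removed++;
        }
      }
      return removed;
    } finally {
      deleteLock.writeLock().unlock();
    }
  }
}
```

The lock hold time here grows linearly with the number of stored entities, which matches the reported symptom of puts stalling when many historical jobs are present.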
In our observation, this lock blocked all Tez jobs for several hours or longer.

Possible solutions:
- Tune the LevelDB configuration so that a full scan does not take as long.
- Take a LevelDB snapshot and scan the snapshot instead, so the lock only needs to be held while calling getSnapshot.

One open question is whether taking a snapshot itself takes a long time; I have no experience with LevelDB.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
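The second proposed solution (hold the lock only while taking a snapshot, then scan the snapshot) can be sketched as follows. This is a hypothetical, simplified model rather than a patch: a ConcurrentSkipListMap again stands in for LevelDB, the "snapshot" is simulated by copying the map into a TreeMap while the write lock is held, and puts are assumed to take the read side of deleteLock. Note that the copy here costs O(n); LevelDB's own snapshots record a sequence number instead of copying data, which is why the proposal expects acquiring one to be cheap.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical model of the snapshot-based discard. Keys are entity ids,
// values are entity start times in milliseconds.
class SnapshotDiscardSketch {
  final ConcurrentSkipListMap<String, Long> store = new ConcurrentSkipListMap<>();
  final ReentrantReadWriteLock deleteLock = new ReentrantReadWriteLock();

  // Assumed: puts share the read side of deleteLock.
  void put(String entityId, long startTime) {
    deleteLock.readLock().lock();
    try {
      store.put(entityId, startTime);
    } finally {
      deleteLock.readLock().unlock();
    }
  }

  int discardOldEntities(long cutoff) {
    NavigableMap<String, Long> snapshot;
    // Exclusive section: held only long enough to capture a consistent view.
    // No put() can be in flight here, so the copy is not torn.
    deleteLock.writeLock().lock();
    try {
      snapshot = new TreeMap<>(store); // stand-in for a LevelDB snapshot
    } finally {
      deleteLock.writeLock().unlock();
    }
    // The long scan runs on the snapshot while puts proceed concurrently.
    int removed = 0;
    for (Map.Entry<String, Long> e : snapshot.entrySet()) {
      // Conditional remove: skip entities rewritten after the snapshot was taken.
      if (e.getValue() < cutoff && store.remove(e.getKey(), e.getValue())) {
        removed++;
      }
    }
    return removed;
  }
}
```

With this structure, a put() waits at most for the snapshot to be captured rather than for the entire scan. Because deletes now run outside the exclusive section, the sketch uses the conditional `remove(key, value)` so that an entity updated after the snapshot is not deleted based on stale data.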