Date: Wed, 18 Nov 2015 00:54:11 +0000 (UTC)
From: "Elliott Clark (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-14708) Use copy on write Map for region location cache


    [ https://issues.apache.org/jira/browse/HBASE-14708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009960#comment-15009960 ]

Elliott Clark commented on HBASE-14708:
---------------------------------------

If we need to, we can always go with a sharded set of arrays. Each write would then copy only 1/N of the entries, at the cost of a slightly more complex array holder. However, I don't expect this to be pressing, as even people with very large tables aren't talking to 100k regions from one client.

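To make the sharded idea concrete, here is a rough sketch of what a sharded set of copy-on-write arrays might look like. It is not taken from any of the attached patches. The class name, the String keys (HBase row keys are byte[]), the first-character range sharding, and the single write lock are all assumptions made for illustration. The shard function is monotone in key order, so a floor lookup that finds nothing in its own shard can fall back to the last entry of the nearest non-empty lower shard.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical sketch: N shards, each an immutable sorted array that is copied on write.
public class ShardedCopyOnWriteArrays {
    /** Immutable snapshot of one shard: keys sorted ascending, values in parallel. */
    private static final class Shard {
        final String[] keys;
        final String[] values;
        Shard(String[] keys, String[] values) { this.keys = keys; this.values = values; }
    }

    private final AtomicReferenceArray<Shard> shards;

    public ShardedCopyOnWriteArrays(int shardCount) {
        shards = new AtomicReferenceArray<>(shardCount);
        for (int i = 0; i < shardCount; i++) {
            shards.set(i, new Shard(new String[0], new String[0]));
        }
    }

    // Assumed sharding rule: bucket by first character. It is monotone in key
    // order, so every key in a lower shard sorts before every key in a higher one.
    private int shardFor(String key) {
        int first = key.isEmpty() ? 0 : Math.min(key.charAt(0), 255);
        return first * shards.length() / 256;
    }

    /** Copies only the one shard that owns the key, then publishes the new snapshot. */
    public synchronized void put(String key, String value) {
        int s = shardFor(key);
        Shard old = shards.get(s);
        int pos = Arrays.binarySearch(old.keys, key);
        if (pos >= 0) {
            // Key already present: copy the values array and overwrite one slot.
            String[] values = old.values.clone();
            values[pos] = value;
            shards.set(s, new Shard(old.keys, values));
            return;
        }
        int ins = -pos - 1; // insertion point reported by binarySearch
        int n = old.keys.length;
        String[] keys = new String[n + 1];
        String[] values = new String[n + 1];
        System.arraycopy(old.keys, 0, keys, 0, ins);
        System.arraycopy(old.values, 0, values, 0, ins);
        keys[ins] = key;
        values[ins] = value;
        System.arraycopy(old.keys, ins, keys, ins + 1, n - ins);
        System.arraycopy(old.values, ins, values, ins + 1, n - ins);
        shards.set(s, new Shard(keys, values));
    }

    /** Returns the value for the greatest key <= the given key, or null if none. */
    public String floor(String key) {
        int s = shardFor(key);
        Shard shard = shards.get(s);
        int pos = Arrays.binarySearch(shard.keys, key);
        int idx = pos >= 0 ? pos : -pos - 2; // index of the floor within this shard
        if (idx >= 0) {
            return shard.values[idx];
        }
        // Nothing small enough here: the floor is the last key of a lower shard.
        for (int i = s - 1; i >= 0; i--) {
            Shard lower = shards.get(i);
            if (lower.keys.length > 0) {
                return lower.values[lower.keys.length - 1];
            }
        }
        return null;
    }
}
```

Removal is left out to keep the sketch short; it would copy the owning shard in the same way. A floor lookup that crosses shards reads each shard's snapshot separately, so it is not a single atomic view of the whole cache.
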
> Use copy on write Map for region location cache
> -----------------------------------------------
>
>                 Key: HBASE-14708
>                 URL: https://issues.apache.org/jira/browse/HBASE-14708
>             Project: HBase
>          Issue Type: Improvement
>          Components: Client
>    Affects Versions: 1.1.2
>            Reporter: Elliott Clark
>            Assignee: Elliott Clark
>            Priority: Critical
>             Fix For: 2.0.0, 1.2.0, 1.3.0
>
>         Attachments: HBASE-14708-v10.patch, HBASE-14708-v11.patch, HBASE-14708-v12.patch, HBASE-14708-v13.patch, HBASE-14708-v15.patch, HBASE-14708-v16.patch, HBASE-14708-v17.patch, HBASE-14708-v2.patch, HBASE-14708-v3.patch, HBASE-14708-v4.patch, HBASE-14708-v5.patch, HBASE-14708-v6.patch, HBASE-14708-v7.patch, HBASE-14708-v8.patch, HBASE-14708-v9.patch, HBASE-14708.patch, anotherbench.zip, anotherbench2.zip, location_cache_times.pdf, result.csv
>
>
> Internally, a co-worker profiled their application that was talking to HBase.
> 60% of the time was spent locating a region. This was while the cluster was stable and no regions were moving.
> To figure out whether there was a faster way to cache region locations, I wrote a benchmark here: https://github.com/elliottneilclark/benchmark-hbase-cache
> This tries to simulate a heavy load on the location cache.
> * 24 different threads.
> * 2 Deleting location data
> * 2 Adding location data
> * Using floor to get the result.
> To repeat my work, just run ./run.sh and it should produce a result.csv
> Results:
> ConcurrentSkipListMap is a good middle ground. It has equal speed for reading and writing.
> However, most operations will not need to remove or add a region location. There will potentially be several orders of magnitude more reads of cached locations than cache clears.
> So I propose a copy on write tree map.
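
For readers following the proposal, a minimal sketch of the copy-on-write idea with floor lookups is below. It is not the code from the attached patches. The class name and the String keys and values (standing in for byte[] start keys and region locations) are assumptions for illustration. Reads go straight to an immutable snapshot with no locking, while the rare writes copy the whole map and publish the copy through a volatile field.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a copy-on-write sorted map tuned for read-heavy floor lookups.
public class CopyOnWriteFloorMap {
    // Readers only ever see a fully built snapshot that is never mutated afterwards.
    private volatile TreeMap<String, String> snapshot = new TreeMap<>();

    /** Lock-free read: value for the greatest start key <= rowKey, or null if none. */
    public String floor(String rowKey) {
        Map.Entry<String, String> e = snapshot.floorEntry(rowKey);
        return e == null ? null : e.getValue();
    }

    /** Rare write: copy the current snapshot, modify the copy, publish it. */
    public synchronized void put(String startKey, String location) {
        TreeMap<String, String> copy = new TreeMap<>(snapshot);
        copy.put(startKey, location);
        snapshot = copy;
    }

    /** Rare write: same copy-then-publish pattern for removals. */
    public synchronized void remove(String startKey) {
        TreeMap<String, String> copy = new TreeMap<>(snapshot);
        copy.remove(startKey);
        snapshot = copy;
    }

    public static void main(String[] args) {
        CopyOnWriteFloorMap cache = new CopyOnWriteFloorMap();
        cache.put("", "server-a");      // region starting at the empty key
        cache.put("row500", "server-b");
        System.out.println(cache.floor("row123")); // falls in the first region
        System.out.println(cache.floor("row777")); // falls in the second region
    }
}
```

Each write costs a full copy, which is the trade-off the description accepts because reads are expected to outnumber writes by several orders of magnitude.
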