Date: Sat, 24 Jan 2015 04:26:37 +0000 (UTC)
From: "Enis Soztutar (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Updated] (HBASE-12728) buffered writes substantially less useful after removal of HTablePool

     [ https://issues.apache.org/jira/browse/HBASE-12728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Enis Soztutar updated HBASE-12728:
----------------------------------
    Attachment: hbase-12728-1.0-addendum-3.patch

I think the problem is that in branch-1.0 the region lookup is retried 3 times, and each lookup does 3 retries of its own in DFSInputStream. Together these outlast the original scan RPC, so the test sees a SocketTimeout (60 sec) instead of the IOException it expects.
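To make the retry arithmetic concrete, here is a minimal sketch in plain Java with no HBase classes. It only multiplies out the retry layers described above (3 region-lookup attempts, each making up to 3 attempts in DFSInputStream) and compares the total against the 60-second scan RPC timeout. The per-attempt duration is an assumed figure chosen purely for illustration, not a measured value, and RetryBudget and its methods are hypothetical names, not HBase APIs.

```java
// Hypothetical helper, not part of HBase. It only models the retry arithmetic.
final class RetryBudget {

    /** Total low-level attempts when every outer attempt exhausts its inner retries. */
    static int worstCaseAttempts(int outerRetries, int innerRetries) {
        return outerRetries * innerRetries;
    }

    /** Worst-case time spent retrying, given an assumed cost per failed attempt. */
    static long worstCaseMillis(int outerRetries, int innerRetries, long assumedMillisPerAttempt) {
        return (long) worstCaseAttempts(outerRetries, innerRetries) * assumedMillisPerAttempt;
    }

    /** True when the nested retries outlast the RPC timeout, so the caller sees the timeout first. */
    static boolean rpcTimesOutFirst(long worstCaseMillis, long rpcTimeoutMillis) {
        return worstCaseMillis > rpcTimeoutMillis;
    }

    public static void main(String[] args) {
        long rpcTimeoutMillis = 60_000L;        // the 60 sec SocketTimeout mentioned above
        long assumedMillisPerAttempt = 10_000L; // ASSUMPTION: illustrative value only
        int dfsRetries = 3;                     // inner retries per lookup, as described above

        // Compare a client retry count of 3 (what the Connection saw) with 1 (what the test intended).
        for (int clientRetries : new int[] {3, 1}) {
            long total = worstCaseMillis(clientRetries, dfsRetries, assumedMillisPerAttempt);
            System.out.printf("client retries=%d, worst case=%d ms, RPC times out first=%b%n",
                    clientRetries, total, rpcTimesOutFirst(total, rpcTimeoutMillis));
        }
    }
}
```

Under that assumed per-attempt cost, nine nested attempts exceed the RPC timeout while three do not. This matches the shape of the failure, though the real durations depend on the cluster and the DFS client settings.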
Notice that we set the retries number in 2 places in the test:
{code}
    // We set it not to run or it will trigger server shutdown while sync'ing
    // because all the datanodes are bad
    util.getConfiguration().setInt(HConstants.HBASE_CLIENT_RETRIES_NUMBER, 3);
...
    util.getConfiguration().setInt(HConstants.HBASE_CLIENT_RETRIES_NUMBER, 1);
{code}

Note that the Connection in util is created before the second call, so it picks up the retries number from the first call, which is 3. That is why the test fails in branch-1.0. In branch-1 and master, the connection still has 3 retries, but due to HBASE-12761 the scanner initialization has changed, so the first try throws the exception.

The attached patch fixes the problem. We can commit it to all 3 branches (it does not affect branch-1 and master, but it is good to have).

> buffered writes substantially less useful after removal of HTablePool
> ---------------------------------------------------------------------
>
>                 Key: HBASE-12728
>                 URL: https://issues.apache.org/jira/browse/HBASE-12728
>             Project: HBase
>          Issue Type: Bug
>          Components: hbase
>    Affects Versions: 0.98.0
>            Reporter: Aaron Beppu
>            Assignee: Nick Dimiduk
>            Priority: Blocker
>             Fix For: 1.0.0, 2.0.0, 1.1.0
>
>         Attachments: 12728-1.0-addendum-2.txt, 12728.connection-owns-buffers.example.branch-1.0.patch, HBASE-12728-2.patch, HBASE-12728-3.patch, HBASE-12728-4.patch, HBASE-12728-5.patch, HBASE-12728-6.patch, HBASE-12728-6.patch, HBASE-12728.05-branch-1.0.patch, HBASE-12728.05-branch-1.patch, HBASE-12728.06-branch-1.0.patch, HBASE-12728.06-branch-1.patch, HBASE-12728.addendum.patch, HBASE-12728.patch, bulk-mutator.patch, hbase-12728-1.0-addendum-3.patch
>
>
> In previous versions of HBase, when use of HTablePool was encouraged, HTable instances were long-lived in that pool, and for that reason, if autoFlush was set to false, the table instance could accumulate a full buffer of writes before a flush was triggered.
> Writes from the client to the cluster could then be substantially larger and less frequent than without buffering.
> However, when HTablePool was deprecated, the primary justification seems to have been that creating HTable instances is cheap, so long as the connection and executor service being passed to it are pre-provided. A use pattern was encouraged where users should create a new HTable instance for every operation, using an existing connection and executor service, and then close the table. In this pattern, buffered writes are substantially less useful; writes are as small and as frequent as they would have been with autoflush=true, except the synchronous write is moved from the operation itself to the table close call which immediately follows.
> More concretely:
> ```
> // Given these two helpers ...
> private HTableInterface getAutoFlushTable(String tableName) throws IOException {
>   // (autoflush is true by default)
>   return storedConnection.getTable(tableName, executorService);
> }
> private HTableInterface getBufferedTable(String tableName) throws IOException {
>   HTableInterface table = getAutoFlushTable(tableName);
>   table.setAutoFlush(false);
>   return table;
> }
> // it's my contention that these two methods would behave almost identically,
> // except the first will hit a synchronous flush during the put call,
> // and the second will flush during the (hidden) close call on table.
> private void writeAutoFlushed(Put somePut) throws IOException {
>   try (HTableInterface table = getAutoFlushTable(tableName)) {
>     table.put(somePut); // will do synchronous flush
>   }
> }
> private void writeBuffered(Put somePut) throws IOException {
>   try (HTableInterface table = getBufferedTable(tableName)) {
>     table.put(somePut);
>   } // auto-close will trigger synchronous flush
> }
> ```
> For buffered writes to actually provide a performance benefit to users, one of two things must happen:
> - The writeBuffer itself shouldn't live, flush and die with the lifecycle of its HTable instance. If the writeBuffer were managed elsewhere and had a long lifespan, this could cease to be an issue. However, if the same writeBuffer is appended to by multiple tables, then some additional concurrency control will be needed around it.
> - Alternatively, there should be some pattern for having long-lived HTable instances. However, since HTable is not thread-safe, we'd need multiple instances, and a mechanism for leasing them out safely -- which sure sounds a lot like the old HTablePool to me.
> See discussion on mailing list here: http://mail-archives.apache.org/mod_mbox/hbase-user/201412.mbox/%3CCAPdJLkEzmUQZ_kvD%3D8mrxi4V%3DhCmUp3g9MUZsddD%2Bmon%2BAvNtg%40mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
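
As a rough illustration of the first option in the quoted description (a write buffer that outlives any single table handle and is guarded by its own concurrency control), here is a minimal sketch in plain Java. It deliberately uses no HBase classes. SharedWriteBuffer, add, flush, and simulate are hypothetical names, the Consumer stands in for whatever RPC would ship a batch, and the threshold counts writes rather than bytes for simplicity. It is not the approach taken by any of the attached patches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch; none of these names are HBase APIs.
final class SharedWriteBuffer<T> {
    private final int flushThreshold;
    private final Consumer<List<T>> sink;          // stand-in for the call that ships a batch
    private final List<T> pending = new ArrayList<>();

    SharedWriteBuffer(int flushThreshold, Consumer<List<T>> sink) {
        if (flushThreshold < 1) {
            throw new IllegalArgumentException("flushThreshold must be >= 1");
        }
        this.flushThreshold = flushThreshold;
        this.sink = sink;
    }

    /** Called by any short-lived handle; ships a batch only once the buffer is full. */
    void add(T write) {
        List<T> batch = null;
        synchronized (this) {
            pending.add(write);
            if (pending.size() >= flushThreshold) {
                batch = drain();
            }
        }
        if (batch != null) {
            sink.accept(batch); // ship outside the lock so other writers are not blocked
        }
    }

    /** Ships whatever is pending, e.g. when the owning connection shuts down. */
    void flush() {
        List<T> batch;
        synchronized (this) {
            batch = drain();
        }
        if (!batch.isEmpty()) {
            sink.accept(batch);
        }
    }

    // Caller must hold the lock on this buffer.
    private List<T> drain() {
        List<T> batch = new ArrayList<>(pending);
        pending.clear();
        return batch;
    }

    /** Simulates many open/put/close cycles against one buffer and returns the batch sizes shipped. */
    static List<Integer> simulate(int writes, int threshold) {
        List<Integer> batchSizes = new ArrayList<>();
        SharedWriteBuffer<String> buffer =
                new SharedWriteBuffer<>(threshold, batch -> batchSizes.add(batch.size()));
        for (int i = 0; i < writes; i++) {
            buffer.add("put-" + i); // each iteration stands in for one short-lived table handle
        }
        buffer.flush();             // final flush when the long-lived owner closes
        return batchSizes;
    }

    public static void main(String[] args) {
        System.out.println(simulate(10, 4)); // prints [4, 4, 2]
    }
}
```

Because the buffer is owned by something long-lived, closing a per-operation handle no longer forces a flush. Ten single-put operations with a threshold of four go out as three batches instead of ten. Shipping outside the lock keeps writers from blocking each other, at the cost of not guaranteeing the order in which concurrent batches arrive.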