From: "ramkrishna.s.vasudevan (Updated) (JIRA)"
To: issues@hbase.apache.org
Date: Wed, 19 Oct 2011 17:23:10 +0000 (UTC)
Subject: [jira] [Updated] (HBASE-4462) Properly treating SocketTimeoutException

    [ https://issues.apache.org/jira/browse/HBASE-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-4462:
------------------------------------------

    Status: Patch Available  (was: Open)

> Properly treating SocketTimeoutException
> ----------------------------------------
>
>                 Key: HBASE-4462
>                 URL: https://issues.apache.org/jira/browse/HBASE-4462
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: ramkrishna.s.vasudevan
>             Fix For: 0.90.5
>
>         Attachments: HBASE-4462_0.90.x.patch
>
>
> SocketTimeoutException (STE) is currently treated like any other IOException (IOE) inside HCM.getRegionServerWithRetries (HCM = HConnectionManager), and I think this is a problem. This method should only retry in cases where we are fairly sure the operation will complete, but with an STE we have already waited (by default) 60 seconds and nothing happened.
>
> I found this while debugging Douglas Campbell's problem on the mailing list, where it seemed like he was using the same scanner from multiple threads. Actually, it was just the same client retrying while the first call hadn't even finished yet (that's another problem). You could see the first scanner call, then up to two other handlers waiting for it to finish before they could run (because of the synchronization on RegionScanner).
>
> So what should we do? We could treat STE as a DoNotRetryIOException and let the client deal with it, or we could retry only once.
>
> There's also the option of having different behavior for get/put/icv (incrementColumnValue)/scan. The issue with operations that modify a cell is that you don't know whether the operation completed or not (the same happens when an RS dies hard after completing, say, a Put but just before returning to the client).
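The options raised in the description (not retrying an STE at all, retrying it only once, or deciding per operation type) can be sketched as a client-side retry loop that catches SocketTimeoutException separately from other IOExceptions. This is a minimal illustration under stated assumptions, not the attached patch and not HBase's HConnectionManager code: `TimeoutAwareRetrier`, `IOCallable`, `callWithRetries`, the `idempotent` flag and the three limits are all hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.net.SocketTimeoutException;

/**
 * Hypothetical retry helper illustrating the options discussed in HBASE-4462.
 * NOT HBase's HConnectionManager code; every name here is illustrative.
 */
public class TimeoutAwareRetrier {

    /** An RPC-like operation that may fail with an IOException (hypothetical interface). */
    @FunctionalInterface
    public interface IOCallable<T> {
        T call() throws IOException;
    }

    private final int maxAttempts;        // total attempts allowed for ordinary IOExceptions
    private final int maxTimeoutRetries;  // extra attempts allowed after a SocketTimeoutException
    private final long pauseMillis;       // fixed back-off between attempts

    public TimeoutAwareRetrier(int maxAttempts, int maxTimeoutRetries, long pauseMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.maxAttempts = maxAttempts;
        this.maxTimeoutRetries = maxTimeoutRetries;
        this.pauseMillis = pauseMillis;
    }

    /**
     * Runs op, retrying ordinary IOExceptions up to maxAttempts total attempts.
     * A SocketTimeoutException means the full RPC timeout already elapsed and the
     * server may still be processing the first request, so:
     *  - non-idempotent ops (e.g. put, increment) rethrow it immediately, because
     *    the caller cannot tell whether the mutation was applied;
     *  - idempotent ops (e.g. get) retry it at most maxTimeoutRetries times.
     */
    public <T> T callWithRetries(IOCallable<T> op, boolean idempotent) throws IOException {
        int attempt = 0;
        int timeoutRetries = 0;
        while (true) {
            attempt++;
            try {
                return op.call();
            } catch (SocketTimeoutException ste) {
                // Must be caught before IOException, which it (indirectly) extends.
                if (!idempotent || timeoutRetries >= maxTimeoutRetries || attempt >= maxAttempts) {
                    throw ste;
                }
                timeoutRetries++;
            } catch (IOException ioe) {
                if (attempt >= maxAttempts) {
                    throw ioe;
                }
            }
            pause();
        }
    }

    private void pause() throws InterruptedIOException {
        try {
            Thread.sleep(pauseMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status for the caller
            throw new InterruptedIOException("interrupted while backing off between retries");
        }
    }
}
```

With illustrative values such as `new TimeoutAwareRetrier(10, 1, 1000L)`, an ordinary IOException gets up to 10 attempts, while a timeout is retried at most once and only for calls marked idempotent. Setting `maxTimeoutRetries` to 0 approximates the "treat STE like a DoNotRetryIOException" option for every call.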