Date: Wed, 23 Jan 2013 01:17:12 +0000 (UTC)
From: "Colin Patrick McCabe (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Updated] (HDFS-4417) HDFS-347: fix case where local reads get disabled incorrectly

[ https://issues.apache.org/jira/browse/HDFS-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin Patrick McCabe updated HDFS-4417:
---------------------------------------

    Attachment:     (was: fail.patch)

> HDFS-347: fix case where local reads get disabled incorrectly
> -------------------------------------------------------------
>
>                 Key: HDFS-4417
>                 URL: https://issues.apache.org/jira/browse/HDFS-4417
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode, hdfs-client, performance
>            Reporter: Todd Lipcon
>            Assignee: Colin Patrick McCabe
>         Attachments: fail.patch, HDFS-4417.002.patch, HDFS-4417.003.patch, HDFS-4417.004.patch, hdfs-4417.txt
>
>
> In testing HDFS-347 against HBase (thanks [~jdcryans]) we ran into the following case:
> - a workload is running which puts a bunch of local sockets in the PeerCache
> - the workload abates for a while, causing the sockets to go "stale" (i.e. the DN side disconnects after the keepalive timeout)
> - the workload starts again
>
> In this case, the local socket retrieved from the cache failed the newBlockReader call, and the client incorrectly disabled local sockets for that host. This is similar to an earlier bug, HDFS-3376, but not quite the same.
>
> The next issue we ran into is that, once this happened, the client never tried local sockets again, because the cache held lots of TCP sockets. Since it always managed to get a cached socket to the local node, it never bothered trying local reads again.
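The two failure modes in the report can be sketched as a toy model of the client's peer cache. Everything below is hypothetical: `Client`, `Peer`, `open_reader`, `domain_works` and the `(host, kind)` cache key are illustrative names, not the real HDFS client API (`PeerCache`, `newBlockReader`), and "local sockets" is assumed to mean the UNIX domain sockets used by HDFS-347. It is not the HDFS-4417 patch; it only encodes the two rules the report implies: a failure on a *cached* local socket may just mean the socket went stale, so it should not disable local reads, and local sockets should be tried before falling back to cached TCP peers.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Peer:
    kind: str            # "domain" (local read over a UNIX domain socket) or "tcp"
    stale: bool = False  # True once the DataNode has closed the idle connection


@dataclass
class Client:
    """Hypothetical model of the client's peer cache; not the real HDFS client."""
    cache: dict = field(default_factory=lambda: defaultdict(list))  # (host, kind) -> [Peer]
    domain_disabled: set = field(default_factory=set)  # hosts with local reads turned off
    domain_works: bool = True  # would a *fresh* domain socket to the DataNode succeed?

    def _handshake(self, peer):
        # Stand-in for the first request on a connection (e.g. creating a block reader).
        if peer.stale:
            raise ConnectionResetError("DataNode closed the idle socket")
        if peer.kind == "domain" and not self.domain_works:
            raise OSError("local read over domain socket failed")

    def _live_cached(self, host, kind):
        # Pop cached peers until one works; failed ones are discarded.
        bucket = self.cache[(host, kind)]
        while bucket:
            peer = bucket.pop()
            try:
                self._handshake(peer)
                return peer
            except OSError:
                # A cached peer failing may only mean it went stale,
                # so this is NOT treated as evidence that local reads are broken.
                continue
        return None

    def open_reader(self, host):
        # Try local (domain socket) reads first, even when TCP peers are cached.
        if host not in self.domain_disabled:
            peer = self._live_cached(host, "domain")
            if peer is not None:
                return peer
            peer = Peer("domain")  # freshly created connection
            try:
                self._handshake(peer)
                return peer
            except OSError:
                # Only a failure on a fresh socket disables local reads for this host.
                self.domain_disabled.add(host)
        return self._live_cached(host, "tcp") or Peer("tcp")


client = Client()
client.cache[("dn1", "domain")] = [Peer("domain", stale=True) for _ in range(3)]
client.cache[("dn1", "tcp")] = [Peer("tcp") for _ in range(5)]
peer = client.open_reader("dn1")
print(peer.kind, client.domain_disabled)
```

In this model, with three stale local sockets and five live TCP peers cached for `dn1`, `open_reader("dn1")` discards the stale sockets, opens a fresh local socket, returns it, and leaves local reads enabled; the cached TCP peers are not touched. Local reads are disabled for a host only when a freshly opened local socket fails.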