Date: Wed, 29 Jun 2011 06:33:28 +0000 (UTC)
From: "Mck SembWever (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Message-ID: <1845608630.1383.1309329208617.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1440851143.12223.1301085665835.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Issue Comment Edited] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057002#comment-13057002 ]

Mck SembWever edited comment on CASSANDRA-2388 at 6/29/11 6:31 AM:
-------------------------------------------------------------------

This does happen already (I've seen it while testing initial patches that were no good).
The problem is that the TT is blacklisted, reducing Hadoop's throughput for all running jobs.
I'd also bet that a fallback to a replica is faster than a fallback to another TT.

On a side note, there is no guarantee that any given TT will have its split accessible via a local c* node; this is only a preference in CFRR.
A failed job may just as likely go to a random c* node. At least now we can properly limit to the one DC and sort by proximity.

One thing we're not doing here is applying this same DC limit and sort by proximity when there isn't a localhost preference. See CFRR.initialize(..)

It would make sense to rewrite CFRR.getLocations(..) to
{noformat}
private Iterator getLocations(final Configuration conf) throws IOException
{
    return new SplitEndpointIterator(conf);
}
{noformat}
and then to move the finding-a-preference-to-localhost code into SplitEndpointIterator...

      was (Author: michaelsembwever):
    This does happen already (I've seen it while testing initial patches that were no good).
The problem is that the TT is blacklisted, reducing Hadoop's throughput for all running jobs.
I'd also bet that a fallback to a replica is faster than a fallback to another TT.
For example, a c* node may die in the middle of a TT...

> ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.
> -------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2388
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hadoop
>    Affects Versions: 0.7.6, 0.8.0
>            Reporter: Eldon Stegall
>            Assignee: Jeremy Hanna
>              Labels: hadoop, inputformat
>             Fix For: 0.7.7, 0.8.2
>
>         Attachments: 0002_On_TException_try_next_split.patch, CASSANDRA-2388-addition1.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch
>
>
> ColumnFamilyRecordReader only tries the first location for a given split. We should try multiple locations for a given split.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
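
For illustration only, the replica fallback discussed in the comment above (prefer a local node, then try the split's other locations before giving up) might look roughly like the sketch below. Every name in it (ReplicaFallbackSketch, SplitReader, preferLocal, readWithFallback) is hypothetical and is not taken from the Cassandra code base or from any attached patch. The DC limit and the sort by proximity are deliberately left out, since they depend on cluster topology information that is not shown here.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: none of these names exist in Cassandra or in the attached patches.
final class ReplicaFallbackSketch {

    /** Hypothetical stand-in for "connect to this endpoint and read the split from it". */
    interface SplitReader<T> {
        T read(String endpoint) throws IOException;
    }

    /** Returns the split's locations with the local host, if it is one of them, moved to the front. */
    static List<String> preferLocal(List<String> locations, String localHost) {
        List<String> ordered = new ArrayList<>(locations.size());
        for (String location : locations) {
            if (location.equals(localHost)) {
                ordered.add(0, location); // local replica is tried first
            } else {
                ordered.add(location);    // other replicas keep their relative order
            }
        }
        return ordered;
    }

    /** Tries each endpoint in turn and returns the result of the first successful read. */
    static <T> T readWithFallback(Iterator<String> endpoints, SplitReader<T> reader) throws IOException {
        IOException lastFailure = null;
        while (endpoints.hasNext()) {
            String endpoint = endpoints.next();
            try {
                return reader.read(endpoint);
            } catch (IOException e) {
                lastFailure = e; // remember the failure and fall back to the next replica
            }
        }
        // Only fail the split once every location has been tried.
        throw new IOException("no replica for this split could be read", lastFailure);
    }
}
```

With this shape the task only fails once every location for the split has failed, which is the behaviour the issue description asks for. Whether the ordering belongs in a separate iterator class, as the comment proposes, is a design choice this sketch does not settle.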