Date: Tue, 16 Dec 2014 22:14:13 +0000 (UTC)
From: "Tyler Hobbs (JIRA)"
Reply-To: dev@cassandra.apache.org
To: commits@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-7886) Coordinator should not wait for read timeouts when replicas hit Exceptions

    [ https://issues.apache.org/jira/browse/CASSANDRA-7886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249056#comment-14249056 ]

Tyler Hobbs commented on CASSANDRA-7886:
----------------------------------------

bq. Hi Tyler Hobbs, sorry I kept you waiting for so long.

No worries, I know you're busy :)

bq. The commented code was meant as a preparation for WriteFailureExceptions. Does it perhaps make sense to fully add WriteFailureException? As a follow up ticket, we could implement it then for the different writes. Or do you want me to get rid of it?

I do think it's a good idea to implement something similar for writes, and splitting that into a second ticket would be good. So go ahead and delete the comments for this patch.

{quote}
Just to make sure that we don't touch anything new here: TOEs are logged inside SliceQueryFilter.collectReducedColumns already. I simply took this catch block from the ReadVerbHandler/RangeSliceVerbHandler and put it into StorageProxy/MessageDeliveryTask. I don't like that either, but I did not want to touch it. Do you still want me to change it?
{quote}

Yes, go ahead and remove those other try/catch blocks as well. I can't see a reason why they should be suppressed once the logging statement is removed.

bq. I merged ReadTimeoutException|ReadFailureException into a single catch block.

Cool. The way you did it there looks perfect. Further up in StorageProxy there's an almost identical chunk of code. Can you condense that one as well?

bq. I also added the last cell-name to the TOE, so that an administrator can get an estimate where to look for the tombstones. This doesn't really match the ticket's new name, but is related to my original issue.

The many implementations of CellName don't implement {{toString()}}, so I think you want {{container.getComparator().getString(cell.name())}} instead.

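For readers unfamiliar with the "single catch block" being discussed, the following is a minimal sketch of a Java multi-catch that handles both exception types in one place and rethrows the original. It is not the code from the patch: the exception classes here are hypothetical stand-ins (the real Cassandra classes have different constructors), and {{ReadOp}}, {{read}} and {{failedReads}} are names invented for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class MultiCatchSketch {
    // Hypothetical stand-ins for Cassandra's exception types (assumption).
    static class ReadTimeoutException extends Exception {
        ReadTimeoutException(String message) { super(message); }
    }

    static class ReadFailureException extends Exception {
        ReadFailureException(String message) { super(message); }
    }

    // Hypothetical read operation that may fail in either way.
    interface ReadOp {
        String run() throws ReadTimeoutException, ReadFailureException;
    }

    // Illustrative counter standing in for whatever bookkeeping the handler does.
    static final AtomicInteger failedReads = new AtomicInteger();

    static String read(ReadOp op) throws ReadTimeoutException, ReadFailureException {
        try {
            return op.run();
        } catch (ReadTimeoutException | ReadFailureException e) {
            // One handler for both types instead of two near-identical blocks.
            failedReads.incrementAndGet();
            // Rethrowing the multi-catch parameter preserves the original type.
            throw e;
        }
    }

    public static void main(String[] args) {
        try {
            read(() -> { throw new ReadFailureException("replica hit too many tombstones"); });
        } catch (ReadTimeoutException | ReadFailureException e) {
            System.out.println(e.getClass().getSimpleName() + ": " + e.getMessage()
                    + ", failedReads=" + failedReads.get());
        }
    }
}
```

Because the caught exception is rethrown unchanged, callers can still distinguish a timeout from a failure even though the bookkeeping lives in one block.
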
> Coordinator should not wait for read timeouts when replicas hit Exceptions
> --------------------------------------------------------------------------
>
>                 Key: CASSANDRA-7886
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7886
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: Tested with Cassandra 2.0.8
>            Reporter: Christian Spriegel
>            Assignee: Christian Spriegel
>            Priority: Minor
>              Labels: protocolv4
>             Fix For: 3.0
>
>         Attachments: 7886_v1.txt, 7886_v2_trunk.txt, 7886_v3_trunk.txt, 7886_v4_trunk.txt
>
>
> *Issue*
> When TombstoneOverwhelmingExceptions occur in queries, the query is simply dropped on every data node, but no response is sent back to the coordinator. Instead, the coordinator waits for the specified read_request_timeout_in_ms.
> On the application side this can cause memory issues, since the application is waiting for the timeout interval for every request. Therefore, if our application runs into TombstoneOverwhelmingExceptions, then (sooner or later) our entire application cluster goes down :-(
> *Proposed solution*
> I think the data nodes should send an error message to the coordinator when they run into a TombstoneOverwhelmingException. Then the coordinator does not have to wait for the timeout interval.
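
The proposed solution can be illustrated with a small, self-contained sketch of a coordinator-side callback that stops waiting as soon as a replica reports an error. This is not Cassandra's implementation: {{FailFastReadSketch}}, {{onResponse}}, {{onFailure}} and {{await}} are hypothetical names, and failing on the very first replica error is a simplifying assumption (a real coordinator would likely weigh failures against the consistency level).

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

final class FailFastReadSketch {
    private final int blockFor; // number of successful replica responses required
    private final AtomicInteger received = new AtomicInteger();
    private final CompletableFuture<Void> done = new CompletableFuture<>();

    FailFastReadSketch(int blockFor) {
        this.blockFor = blockFor;
    }

    // Called when a replica answers successfully.
    void onResponse() {
        if (received.incrementAndGet() >= blockFor) {
            done.complete(null);
        }
    }

    // Called when a replica sends an error message instead of silently dropping the query.
    void onFailure(String replica, Throwable cause) {
        done.completeExceptionally(
                new IllegalStateException("replica " + replica + " failed", cause));
    }

    // Waits at most timeoutMs; returns "ok", "failed", "timeout" or "interrupted".
    String await(long timeoutMs) {
        try {
            done.get(timeoutMs, TimeUnit.MILLISECONDS);
            return "ok";
        } catch (ExecutionException e) {
            return "failed";   // a replica reported an error, so no need to wait further
        } catch (TimeoutException e) {
            return "timeout";  // nothing was reported, so the full interval elapsed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }

    public static void main(String[] args) {
        FailFastReadSketch callback = new FailFastReadSketch(2);
        callback.onResponse();
        callback.onFailure("replica-2", new RuntimeException("too many tombstones"));
        System.out.println(callback.await(5000));
    }
}
```

Without the {{onFailure}} path, the same scenario would block for the full timeout, which is the behavior the issue describes.
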