Subject: Re: 0.6.1 insert 1B rows, crashed when using py_stress
From: Tatu Saloranta
To: user@cassandra.apache.org
Date: Tue, 20 Apr 2010 10:17:43 -0700

On Mon, Apr 19, 2010 at 7:12 PM, Brandon Williams wrote:
> On Mon, Apr 19, 2010 at 9:06 PM, Schubert Zhang wrote:
>>
>> 2. Reject the request when be short of resource, instead of throws OOME
>> and exit (crash).
>
> Right, that is the crux of the problem. It will be addressed here:
> https://issues.apache.org/jira/browse/CASSANDRA-685

I think it would be great to get such "graceful degradation"
implemented: the first thing any service should do is protect itself
against meltdown. Clients are better served by getting 50x responses
(or rather the Thrift equivalent) that indicate transient overload than
by having the system go into a GC death spiral, where requests time out
but still consume significant amounts of resources. This is especially
true since returning an error response is usually rather cheap compared
to doing full processing. It should then also be easy to expose failure
information via JMX and to allow alarming on it.

But this is of course more difficult with a distributed setup,
especially since different QoS for different requests would help (for
example, communication between nodes and other things related to
"accepted" requests should have higher priority than new incoming
requests).

-+ Tatu +-
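
To make the "reject cheaply instead of melting down" idea above concrete, here is a minimal single-process sketch in Python. It is only an illustration and not how Cassandra or CASSANDRA-685 implements anything: the names `LoadShedder`, `Overloaded`, `max_in_flight`, and `rejected` are all made up for this example, and the `rejected` counter merely stands in for the kind of value one might expose through JMX in a JVM service.

```python
import threading


class Overloaded(Exception):
    """Hypothetical error type: request rejected because capacity is exhausted."""


class LoadShedder:
    """Admit at most `max_in_flight` concurrent requests and reject the rest."""

    def __init__(self, max_in_flight):
        self._slots = threading.Semaphore(max_in_flight)
        self._lock = threading.Lock()
        # Rejection count; a monitoring hook could expose this for alarming.
        self.rejected = 0

    def handle(self, work):
        # Non-blocking acquire: if no slot is free, fail fast instead of queueing,
        # so an overloaded server spends almost nothing on the rejected request.
        if not self._slots.acquire(blocking=False):
            with self._lock:
                self.rejected += 1
            raise Overloaded("transient overload, retry later")
        try:
            return work()
        finally:
            # Always free the slot, even if the work itself raised.
            self._slots.release()
```

A caller would wrap each incoming request in `shedder.handle(...)` and translate `Overloaded` into whatever "try again later" response the protocol offers. The different-QoS point could be approximated by giving inter-node traffic its own `LoadShedder` with a separate budget, though that is an assumption about one possible design and does not address the distributed case.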