From: "Christopher Wirt" <chris.wirt@struq.com>
To: user@cassandra.apache.org
Subject: RE: disappointed
Date: Wed, 24 Jul 2013 12:12:34 +0100

Hi Paul,

 

Sorry to hear you're having a low point.

 

We ended up not using the collection features of 1.2. Instead we store a compressed string containing the map and handle it client side.
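
The client-side packing is nothing fancy. A rough Python sketch of the idea (the JSON+zlib choice and the helper/field names are illustrative, not our actual implementation, which depends on your driver and schema):

```python
import json
import zlib

def pack_map(d):
    # Serialize the dict deterministically, then compress it so it fits
    # in a single text/blob column instead of a CQL map.
    return zlib.compress(json.dumps(d, sort_keys=True).encode("utf-8"))

def unpack_map(blob):
    # Inverse of pack_map: decompress and parse back into a dict.
    return json.loads(zlib.decompress(blob).decode("utf-8"))

profile = {"country": "GB", "segments": ["a", "b"], "score": 0.7}
blob = pack_map(profile)
assert unpack_map(blob) == profile
```

The trade-off is that you lose per-entry updates: every write replaces the whole map, which is fine for small, mostly-read maps.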

 

We only have fixed-schema short rows, so no experience with large-row compaction.

 

File descriptors have never got that high for us. But if you only have a couple of physical nodes with loads of data and small SSTables, maybe they could get that high?
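
Before assuming a leak, it's worth comparing the count you see against the process's nofile limit. A small Python sketch (it inspects the current process; for Cassandra you'd look at the daemon's own limits, e.g. via /proc/&lt;pid&gt;/limits):

```python
import resource

# Soft/hard caps on open files for this process.
# RLIM_INFINITY means unlimited.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"nofile: soft={soft} hard={hard}")
# If the descriptor count sits near the soft cap, raise it
# (limits.conf or the init script) before digging further.
```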

 

The only time I've had file descriptors get out of hand was when compaction got slightly confused by a new schema, after I dropped and recreated a table instead of truncating it: https://issues.apache.org/jira/browse/CASSANDRA-4857. Restarting the node fixed the issue.
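
If it helps, watching the descriptor count over time is easy on Linux. A throwaway sketch (reads /proc, so Linux only; `pid` would be the Cassandra process id, here it just inspects itself):

```python
import os

def open_fd_count(pid="self"):
    # Each entry in /proc/<pid>/fd is one open descriptor.
    return len(os.listdir(f"/proc/{pid}/fd"))

print(open_fd_count())  # for Cassandra: open_fd_count(12345)
```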

 

 

From my limited experience, I think Cassandra is a dangerous choice for a young start-up with limited funding/experience that expects to scale fast. We are a fairly mature start-up with funding. We've just spent 3-5 months moving from Mongo to Cassandra. It's been expensive and painful getting Cassandra to read like Mongo, but we've made it :)

 


From: Paul Ingalls [mailto:paulingalls@gmail.com]
Sent: 24 July 2013 06:01
To: user@cassandra.apache.org
Subject: disappointed

 

I want to check in. I'm sad, mad and afraid. I've been trying to get a 1.2 cluster up and working with my data set for three weeks with no success. I've been running a 1.1 cluster for 8 months now with no hiccups, but for me at least 1.2 has been a disaster. I had high hopes for leveraging the new features of 1.2, specifically vnodes and collections. But at this point I can't release my system into production, and will probably need to find a new back end. As a small startup, this could be catastrophic. I'm mostly mad at myself. I took a risk moving to the new tech. I forgot that sometimes when you gamble, you lose.

 

First, the performance of 1.2.6 was horrible when using collections. I wasn't able to push through 500k rows before the cluster became unusable. With a lot of digging, and way too much time, I discovered I was hitting a bug that had just been fixed but was unreleased. This scared me, because the release was already at 1.2.6 and I would have expected something like https://issues.apache.org/jira/browse/CASSANDRA-5677 to have been addressed long before. But gamely I grabbed the latest code from the 1.2 branch, built it, and was finally able to get past half a million rows.

 

But then I hit ~4 million rows, and a multitude of problems. Even with the fix above, I was still seeing a ton of compactions failing, specifically the ones for large rows. Not a single large row will compact; they all assert with the wrong size. Worse, and this is what kills the whole thing, I keep hitting a wall with open files, even after dumping the whole DB, dropping vnodes and trying again. Seriously, 650k open file descriptors? When it hits this limit, the whole DB craps out and is basically unusable. This isn't that many rows. I have close to half a billion in 1.1…

 

I'm now at a standstill. I figure I have two options unless someone here can help me. Neither of them involves 1.2. I can either go back to 1.1 and remove the features that collections added to my service, or find another data backend that has similar performance characteristics to Cassandra but allows collection-type behavior in a scalable manner. Cause as far as I can tell, 1.2 doesn't scale. Which makes me sad; I was proud of what I accomplished with 1.1…

 

Does anyone know why there are so many open file descriptors? Any ideas on why a large row won't compact?

 

Paul
