From: Ben Bromhead
Subject: Re: Which of these VPS configurations would perform better for Cassandra ?
Date: Mon, 5 Aug 2013 11:02:06 +1000
To: user@cassandra.apache.org

If you want to get a rough idea of how things will perform, fire up YCSB (https://github.com/brianfrankcooper/YCSB/wiki) and run the tests that most closely match how you think your workload will behave (run the test clients from a couple of beefy AWS spot instances for less than a dollar). As you are a new startup without any existing load/traffic patterns, benchmarking will be your best bet.

As an aside, have a look at running Cassandra with SmartOS on Joyent. When you run SmartOS on Joyent, virtualisation is done using Solaris zones, an OS-based virtualisation, which is at least a quadrillion times better than KVM, Xen, etc.

Ok, maybe not that much… but it is pretty cool and has the following benefits:

- No hardware emulation.
- Shared kernel with the host (you don't have to waste precious memory running a guest OS).
- ZFS :)

Have a read of http://wiki.smartos.org/display/DOC/SmartOS+Virtualization for more info.

There are some downsides as well:

- The version of Cassandra that comes with the SmartOS package management system is old and busted, so you will want to build from source.
- You will want to be technically confident in running on something a little outside the norm (SmartOS is based on Solaris).

Just make sure you test and benchmark all your options; a few days of testing now will save you weeks of pain.

Good luck!

Ben Bromhead
Instaclustr | www.instaclustr.com | @instaclustr

On 05/08/2013, at 12:34 AM, David Schairer <dschairer@humbaba.net> wrote:

> Of course -- my point is simply that if you're looking for speed, SSD+KVM, especially in a shared-tenant situation, is unlikely to perform the way you want it to. If you're building a pure proof of concept that never stresses the system, it doesn't matter, but if you plan an MVP with any sort of scale, you'll want a plan to be on something more robust.
>
> I'll also say that it's really important (imho) to be doing even your dev in a config where you have consistency conditions like those of eventual production -- so make sure you're writing to both nodes and can have cases where eventual consistency delays kick in, or it'll come back to bite you later -- I've seen this force people to redesign their whole data model when they don't plan for it initially.
>
> As I said, I haven't tested DO. I've tested very similar configurations at other providers and they were all terrible under load -- and certainly took away most of the benefits of SSD once you stressed writes a bit. Xen+SSD, on modern kernels, should work better, but I didn't test it (Linode doesn't offer this, though, and they've had lots of other challenges of late).
>
> --DRS
>
> On Aug 3, 2013, at 11:40 PM, Ertio Lew <ertiop93@gmail.com> wrote:
>
>> @David:
>> Like all other start-ups, we too cannot start with all dedicated servers for Cassandra. So right now we have no better choice than using a VPS :), but we can definitely choose one from amongst a suitable set of VPS configurations. Since we are just starting out, could we initiate our cluster with 2 nodes (RF=2), (KVM, 2GB RAM, 2 cores, 30GB SSD)? We won't be having a very heavy load on Cassandra for the next few months, until we grow our user base. So this choice is mainly based on pricing vs configuration, as well as Digital Ocean's good reputation in the community.
>>
>>
>> On Sun, Aug 4, 2013 at 12:53 AM, David Schairer <dschairer@humbaba.net> wrote:
>> I've run several lab configurations on Linodes; I wouldn't run Cassandra on any shared virtual platform for large-scale production, just because your IO performance is going to be really hard to predict. Lots of people do, though -- depends on your Cassandra loads and how consistent you need performance to be, as well as how much of your working set will fit into memory. Remember that Linode significantly oversells their CPU as well.
>>
>> The release version of KVM, at least as of a few months ago, still doesn't support TRIM on SSD; that, plus the fact that you don't know how others will use SSDs or if their file systems will keep the SSDs healthy, means that SSD performance on KVM is going to be highly unpredictable. I have not tested Digital Ocean, but I did test several other KVM+SSD shared-tenant hosting providers aggressively for Cassandra a couple of months ago; they all failed badly.
>>
>> Your mileage will vary considerably based on what you need out of Cassandra, what your data patterns look like, and how you configure your system. That said, I would use Xen before KVM for high-performance IO.
>>
>> I have not run Cassandra in any volume on Amazon -- lots of folks have, and may have recommendations (including SSD) there for where it falls on the price/performance curve.
>>
>> --DRS
>>
>> On Aug 3, 2013, at 11:33 AM, Ertio Lew <ertiop93@gmail.com> wrote:
>>
>>> I am building a cluster (initially starting with a 2-3 node cluster). I have come across two seemingly good options for hosting, Linode & Digital Ocean. The VPS configuration for both is listed below:
>>>
>>>
>>> Linode:-
>>> ------------------
>>> XEN Virtualization
>>> 2 GB RAM
>>> 8 cores CPU (2x priority) (8 processor Xen instances)
>>> 96 GB Storage
>>>
>>>
>>> Digital Ocean:-
>>> -------------------------
>>> KVM Virtualization
>>> 2GB Memory
>>> 2 Cores
>>> 40GB **SSD Disk**
>>> Digital Ocean's VPS is half the price of the Linode VPS listed above.
>>>
>>>
>>> Could you clarify which of these two VPS would be better as Cassandra nodes?

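
A rough sketch, not from the thread itself, of the arithmetic behind David's advice about the 2-node, RF=2 setup. It assumes Cassandra's usual quorum rule (floor(RF / 2) + 1 replicas) and the common rule of thumb that a read is guaranteed to see the latest write only when the read and write replica counts together exceed RF. The function names are made up for illustration.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a QUORUM request: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Replicas that can be down while QUORUM requests still succeed."""
    return replication_factor - quorum(replication_factor)


def reads_see_latest_write(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """True when the read and write replica sets must overlap (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


if __name__ == "__main__":
    for rf in (2, 3):
        print(f"RF={rf}: quorum={quorum(rf)}, "
              f"tolerated failures={tolerated_failures(rf)}")
    # ONE/ONE at RF=2: a read may hit the replica that has not yet
    # received the write, which is where consistency delays show up.
    print("RF=2, ONE/ONE overlap:", reads_see_latest_write(1, 1, 2))
    print("RF=2, QUORUM/QUORUM overlap:", reads_see_latest_write(2, 2, 2))
```

Under these assumptions, RF=2 gives QUORUM no room for a node being down, while reads and writes at ONE can briefly disagree. That is the kind of behaviour worth exercising in a dev setup before the data model is fixed.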