From: Aaron Turner
Date: Thu, 12 Sep 2013 09:26:37 -0700
Subject: Re: VMs versus Physical machines
To: cassandra users
Reply-To: user@cassandra.apache.org

On Thu, Sep 12, 2013 at 5:42 AM, Shahab Yunus wrote:

> I admit I left out details. Sorry about that. I was looking for
> high-level guidance so that we can then sort out for ourselves what
> fits our requirements and use cases (mainly because we are at a stage
> where they could still be molded to hardware and software
> limitations/features). For example, a recommendation such as "for
> heavy reads, physical is better".
>
> Anyway, just to give you a quick recap:
> 1- Cassandra 1.2.8
> 2- A row is a unique userid and can have one or more columns. Every
>    cell is basically a blob of data (using Avro). All information is
>    in this one table. No joins or other access patterns.
> 3- Writes can be either in bulk (which will of course have less strict
>    performance requirements) or real-time. All writes are per userid,
>    hence at the row level, and consist of adding new rows (with some
>    column values) or updating specific cells (columns) of an existing
>    row.
> 4- Reads are per userid, i.e. per row, and 90% of the time they are
>    random reads for a single user rather than bulk reads.
> 5- Both the read and write interfaces are exposed through a REST
>    service as well as a direct Java client API.
> 6- Reads and writes, as mentioned in 3 and 4, can be for one or more
>    columns at a time.
>
> Regards,
> Shahab

Your total data set size and the number of reads/writes per second are
the important things here. Also, how sensitive are you to latency
spikes (which tend to happen with VMs)?

Long story short, the safest option is always physical, IMHO. Use
VM/cloud if you need to for some reason (for example, all the other
servers talking to Cassandra are also in AWS). Cloud can work (Netflix
uses Cassandra on AWS), but your performance will be a lot more
consistent on physical hardware. Cassandra, like all databases, likes
lots of RAM (although this can be offset somewhat with SSDs), and RAM
tends to be expensive in the cloud.

-- 
Aaron Turner
http://synfin.net/         Twitter: @synfinatic
https://github.com/synfinatic/tcpreplay - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
    -- Benjamin Franklin