From: James Rothering
To: user@cassandra.apache.org
Subject: Re: EC2 storage options for C*
Date: Wed, 3 Feb 2016 16:09:53 -0800

Just curious here ... when did EBS become OK for C*? Didn't they always push towards using ephemeral disks?

On Wed, Feb 3, 2016 at 12:17 PM, Ben Bromhead wrote:

> For what it's worth, we've tried d2 instances and they encourage terrible things like super-dense nodes (which increase your replacement time). In terms of usable storage I would go with gp2 EBS on an m4-based instance.
>
> On Mon, 1 Feb 2016 at 14:25 Jack Krupansky wrote:
>
>> Ah, yes, the good old days of m1.large.
>>
>> -- Jack Krupansky
>>
>> On Mon, Feb 1, 2016 at 5:12 PM, Jeff Jirsa wrote:
>>
>>> A lot of people use the old-gen instances (m1 in particular) because they came with a ton of effectively free ephemeral storage (up to 1.6TB). Whether or not they're viable is a decision for each user to make. They're very, very commonly used for C*, though. At a time when EBS was not sufficiently robust or reliable, a cluster of m1 instances was the de facto standard.
>>>
>>> The canonical "best practice" in 2015 was i2. We believe we've made a compelling argument to use m4 or c4 instead of i2. There is a company we know currently testing d2 at scale, though I'm not sure they have much in the way of concrete results at this time.
>>>
>>> - Jeff
>>>
>>> From: Jack Krupansky
>>> Reply-To: "user@cassandra.apache.org"
>>> Date: Monday, February 1, 2016 at 1:55 PM
>>> To: "user@cassandra.apache.org"
>>> Subject: Re: EC2 storage options for C*
>>>
>>> Thanks. My typo - I referenced "C2 Dense Storage" which is really "D2 Dense Storage".
>>>
>>> The remaining question is whether any of the "Previous Generation Instances" should be publicly recommended going forward.
>>>
>>> And whether non-SSD instances should be recommended going forward as well. Sure, technically, someone could use the legacy instances, but the question is what we should be recommending as best practice going forward.
>>>
>>> Yeah, the i2 instances look like the sweet spot for any non-EBS clusters.
>>>
>>> -- Jack Krupansky
>>>
>>> On Mon, Feb 1, 2016 at 4:30 PM, Steve Robenalt wrote:
>>>
>>>> Hi Jack,
>>>>
>>>> At the bottom of the instance-types page, there is a link to the previous generations, which includes the older series (m1, m2, etc.), many of which have HDD options.
>>>>
>>>> There are also the d2 (Dense Storage) instances in the current generation, which include various combos of local HDDs.
>>>>
>>>> The i2 series has good-sized SSDs available and has the enhanced networking option, which is also useful for Cassandra. Enhanced networking is available with other instance types as well, as you'll see in the feature list under each type.
>>>>
>>>> Steve
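Several replies in this thread converge on the same recommendation: gp2 EBS on an EBS-optimized m4 or c4 instance. As a rough, hedged illustration of what provisioning that looks like (not something from the talk or the AWS docs cited here), a gp2 data volume can be created and attached with boto3 along these lines; the region, availability zone, size, instance ID, and device name below are placeholder assumptions:

# Hedged sketch: create and attach a gp2 EBS data volume for a Cassandra node.
# All identifiers below (region, AZ, size, instance ID, device name) are
# illustrative placeholders, not values from this thread.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# ~3,334 GiB is roughly where gp2 reaches its 10,000 IOPS per-volume cap
# (see the IOPS discussion further down the thread).
volume = ec2.create_volume(
    AvailabilityZone="us-east-1a",
    Size=3334,          # GiB
    VolumeType="gp2",
)

ec2.get_waiter("volume_available").wait(VolumeIds=[volume["VolumeId"]])

# Attach to an (assumed) EBS-optimized m4 instance running Cassandra.
ec2.attach_volume(
    VolumeId=volume["VolumeId"],
    InstanceId="i-0123456789abcdef0",  # placeholder
    Device="/dev/xvdf",                # placeholder device name
)

After attaching, the volume still needs a filesystem and a mount point for Cassandra's data directories.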
>>>>
>>>> On Mon, Feb 1, 2016 at 1:17 PM, Jack Krupansky <jack.krupansky@gmail.com> wrote:
>>>>
>>>>> Thanks. Reading a little bit on AWS, and back to my SSD vs. magnetic question, it seems like magnetic (HDD) is no longer a recommended storage option for databases on AWS. In particular, only the C2 Dense Storage instances have local magnetic storage - all the other instance types are SSD or EBS-only - and EBS Magnetic is only recommended for "Infrequent Data Access."
>>>>>
>>>>> For the record, that AWS doc has Cassandra listed as a use case for i2 instance types.
>>>>>
>>>>> Also, the AWS doc lists EBS io1 for the NoSQL database use case and gp2 only for the "small to medium databases" use case.
>>>>>
>>>>> Do older instances with local HDD still exist on AWS (m1, m2, etc.)? Is the doc simply for any newly started instances?
>>>>>
>>>>> See:
>>>>> https://aws.amazon.com/ec2/instance-types/
>>>>> http://aws.amazon.com/ebs/details/
>>>>>
>>>>> -- Jack Krupansky
>>>>>
>>>>> On Mon, Feb 1, 2016 at 2:09 PM, Jeff Jirsa wrote:
>>>>>
>>>>>> > My apologies if my questions are actually answered on the video or slides, I just did a quick scan of the slide text.
>>>>>>
>>>>>> Virtually all of them are covered.
>>>>>>
>>>>>> > I'm curious where the EBS physical devices actually reside - are they in the same rack, the same data center, the same availability zone? I mean, people try to minimize network latency between nodes, so how exactly is EBS able to avoid network latency?
>>>>>>
>>>>>> Not published, and probably not a straightforward answer (probably redundant cross-AZ, if it matches some of their other published behaviors). The promise they give you is "IOPS", with a certain block size. Some instance types are optimized with dedicated, EBS-only network interfaces. Like most things in Cassandra / cloud, the only way to know for sure is to test it yourself and see if observed latency is acceptable (or trust our testing, if you assume we're sufficiently smart and honest).
>>>>>>
>>>>>> > Did your test use Amazon EBS-Optimized Instances?
>>>>>>
>>>>>> We tested dozens of instance type/size combinations (literally). The best performance was clearly with EBS-optimized instances that also have enhanced networking (c4, m4, etc.) - slide 43.
>>>>>>
>>>>>> > SSD or magnetic or does it make any difference?
>>>>>>
>>>>>> SSD, GP2 (slide 64).
>>>>>>
>>>>>> > What info is available on EBS performance at peak times, when multiple AWS customers have spikes of demand?
>>>>>>
>>>>>> Not published, but experiments show that we can hit 10k IOPS all day, every day, with only trivial noisy-neighbor problems - not enough to impact a real cluster (slide 58).
>>>>>>
>>>>>> > Is RAID much of a factor or help at all using EBS?
>>>>>>
>>>>>> You can use RAID to get higher IOPS than you'd normally get by default (the GP2 IOPS cap is 10k, which you get with a 3.333T volume - if you need more than 10k, you can stripe volumes together up to the EBS network link max) (hinted at in slide 64).
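To make the gp2 arithmetic above concrete: gp2 at the time offered a baseline of 3 IOPS per GiB with a 10,000 IOPS per-volume cap, so the cap is reached at roughly 3,334 GiB, and striping volumes in RAID 0 multiplies the ceiling until the instance's EBS link becomes the limit. A minimal sketch of that math (the helper names are illustrative, not from the thread or the slides):

# Rough gp2 sizing math, assuming the 2016-era limits quoted in this thread:
# 3 IOPS per GiB baseline, capped at 10,000 IOPS per volume.
GP2_IOPS_PER_GIB = 3
GP2_PER_VOLUME_CAP = 10000

def gp2_volume_iops(size_gib: int) -> int:
    """Baseline IOPS for a single gp2 volume of the given size."""
    return min(size_gib * GP2_IOPS_PER_GIB, GP2_PER_VOLUME_CAP)

def raid0_iops(size_gib: int, volumes: int) -> int:
    """Aggregate IOPS for `volumes` gp2 volumes striped in RAID 0.
    Real-world throughput is still bounded by the instance's EBS link."""
    return volumes * gp2_volume_iops(size_gib)

if __name__ == "__main__":
    print(gp2_volume_iops(3334))   # 10000 - the per-volume cap
    print(raid0_iops(3334, 2))     # 20000 - before hitting the EBS link limit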
>>>>>>
>>>>>> > How exactly is EBS provisioned in terms of its own HA - I mean, with a properly configured Cassandra cluster RF provides HA, so what is the equivalent for EBS? If I have RF=3, what assurance is there that those three EBS volumes aren't all in the same physical rack?
>>>>>>
>>>>>> There is HA; I'm not sure that AWS publishes specifics. Occasionally specific volumes will have issues (the hypervisor's dedicated ethernet link to the EBS network fails, for example). Occasionally instances will have issues. The volume-specific issues seem to be less common than the instance-store "instance retired" or "instance is running on degraded hardware" events. Stop/start and you've recovered (possible with EBS, not possible with instance store). The assurances are in AWS's SLA - if the SLA is insufficient (and it probably is insufficient), use more than one AZ and/or AWS region or cloud vendor.
>>>>>>
>>>>>> > For multi-data center operation, what configuration options assure that the EBS volumes for each DC are truly physically separated?
>>>>>>
>>>>>> It used to be true that the EBS control plane for a given region spanned AZs. That's no longer true. AWS asserts that failure modes for each AZ are isolated (data may replicate between AZs, but a full outage in us-east-1a shouldn't affect running EBS volumes in us-east-1b or us-east-1c). Slide 65.
>>>>>>
>>>>>> > In terms of syncing data for the commit log: if the OS call to sync an EBS volume returns, is the commit log data absolutely 100% synced at the hardware level on the EBS end, such that a power failure of the systems on which the EBS volumes reside will still guarantee availability of the fsynced data? As well, is return from fsync an absolute guarantee of sstable durability when Cassandra is about to delete the commit log, including when the two are on different volumes? In practice, we would like some significant degree of pipelining of data, such as during the full processing of flushing memtables, but for the fsync at the end a solid guarantee is needed.
>>>>>>
>>>>>> Most of the answers in this block are "probably not 100%; you should be writing to more than one host/AZ/DC/vendor to protect your organization from failures". AWS targets something like a 0.1% annual failure rate per volume and 99.999% availability (slide 66). We believe they're exceeding those goals (at least based on the petabytes of data we have on gp2 volumes).
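For readers following the fsync exchange: a successful fsync only tells you the kernel has pushed the data to the block device and the device has acknowledged it; whatever replication or power-loss protection sits behind an EBS volume's acknowledgement is the service's promise, not something the OS can observe. A minimal sketch of the write-then-fsync pattern under discussion (the file name is a stand-in for a commit-log segment on an EBS-backed data directory; Cassandra's real commit log does this in Java):

# Minimal sketch of the write-then-fsync pattern discussed above.
# "segment.log" is a placeholder, not a real Cassandra file.
import os

def append_durably(path: str, payload: bytes) -> None:
    # O_APPEND keeps concurrent appends ordered; O_CREAT creates the file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.write(fd, payload)
        # fsync returns only after the kernel has flushed the data to the
        # block device. For an EBS volume, durability beyond that point
        # (replication, power loss on the EBS side) is the service's
        # guarantee, not something the OS can verify.
        os.fsync(fd)
    finally:
        os.close(fd)

append_durably("segment.log", b"mutation bytes\n")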
>>>>>>
>>>>>> From: Jack Krupansky
>>>>>> Reply-To: "user@cassandra.apache.org"
>>>>>> Date: Monday, February 1, 2016 at 5:51 AM
>>>>>> To: "user@cassandra.apache.org"
>>>>>> Subject: Re: EC2 storage options for C*
>>>>>>
>>>>>> I'm not a fan of video - this appears to be the slideshare corresponding to the video:
>>>>>> http://www.slideshare.net/AmazonWebServices/bdt323-amazon-ebs-cassandra-1-million-writes-per-second
>>>>>>
>>>>>> My apologies if my questions are actually answered on the video or slides, I just did a quick scan of the slide text.
>>>>>>
>>>>>> I'm curious where the EBS physical devices actually reside - are they in the same rack, the same data center, the same availability zone? I mean, people try to minimize network latency between nodes, so how exactly is EBS able to avoid network latency?
>>>>>>
>>>>>> Did your test use Amazon EBS-Optimized Instances?
>>>>>>
>>>>>> SSD or magnetic or does it make any difference?
>>>>>>
>>>>>> What info is available on EBS performance at peak times, when multiple AWS customers have spikes of demand?
>>>>>>
>>>>>> Is RAID much of a factor or help at all using EBS?
>>>>>>
>>>>>> How exactly is EBS provisioned in terms of its own HA - I mean, with a properly configured Cassandra cluster RF provides HA, so what is the equivalent for EBS? If I have RF=3, what assurance is there that those three EBS volumes aren't all in the same physical rack?
>>>>>>
>>>>>> For multi-data center operation, what configuration options assure that the EBS volumes for each DC are truly physically separated?
>>>>>>
>>>>>> In terms of syncing data for the commit log: if the OS call to sync an EBS volume returns, is the commit log data absolutely 100% synced at the hardware level on the EBS end, such that a power failure of the systems on which the EBS volumes reside will still guarantee availability of the fsynced data? As well, is return from fsync an absolute guarantee of sstable durability when Cassandra is about to delete the commit log, including when the two are on different volumes? In practice, we would like some significant degree of pipelining of data, such as during the full processing of flushing memtables, but for the fsync at the end a solid guarantee is needed.
>>>>>>
>>>>>> -- Jack Krupansky
>>>>>>
>>>>>> On Mon, Feb 1, 2016 at 12:56 AM, Eric Plowe wrote:
>>>>>>
>>>>>>> Jeff,
>>>>>>>
>>>>>>> If EBS goes down, then EBS GP2 will go down as well, no? I'm not discounting EBS, but prior outages are worrisome.
>>>>>>>
>>>>>>> On Sunday, January 31, 2016, Jeff Jirsa wrote:
>>>>>>>
>>>>>>>> Free to choose what you'd like, but EBS outages were also addressed in that video (second half, discussion by Dennis Opacki). 2016 EBS isn't the same as 2011 EBS.
>>>>>>>>
>>>>>>>> -- Jeff Jirsa
>>>>>>>>
>>>>>>>> On Jan 31, 2016, at 8:27 PM, Eric Plowe wrote:
>>>>>>>>
>>>>>>>> Thank you all for the suggestions. I'm torn between GP2 vs. ephemeral. GP2, after testing, is a viable contender for our workload. The only worry I have is EBS outages, which have happened.
>>>>>>>>
>>>>>>>> On Sunday, January 31, 2016, Jeff Jirsa wrote:
>>>>>>>>
>>>>>>>>> Also in that video - it's long but worth watching.
>>>>>>>>>
>>>>>>>>> We tested up to 1M reads/second as well, blowing out the page cache to ensure we weren't "just" reading from memory.
>>>>>>>>>
>>>>>>>>> -- Jeff Jirsa
>>>>>>>>>
>>>>>>>>> On Jan 31, 2016, at 9:52 AM, Jack Krupansky <jack.krupansky@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> How about reads? Any differences between read-intensive and write-intensive workloads?
>>>>>>>>>
>>>>>>>>> -- Jack Krupansky
>>>>>>>>>
>>>>>>>>> On Sun, Jan 31, 2016 at 3:13 AM, Jeff Jirsa <jeff.jirsa@crowdstrike.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi John,
>>>>>>>>>>
>>>>>>>>>> We run using 4T GP2 volumes, which guarantee 10k IOPS. Even at 1M writes per second on 60 nodes, we didn't come close to hitting even 50% utilization (10k is more than enough for most workloads). PIOPS is not necessary.
>>>>>>>>>>
>>>>>>>>>> From: John Wong
>>>>>>>>>> Reply-To: "user@cassandra.apache.org"
>>>>>>>>>> Date: Saturday, January 30, 2016 at 3:07 PM
>>>>>>>>>> To: "user@cassandra.apache.org"
>>>>>>>>>> Subject: Re: EC2 storage options for C*
>>>>>>>>>>
>>>>>>>>>> For production I'd stick with ephemeral disks (aka instance storage) if you are running a lot of transactions.
>>>>>>>>>> However, for a regular small testing/QA cluster, or something you know you want to reload often, EBS is definitely good enough, and we haven't had issues 99% of the time. The 1% is the kind of anomaly where we had flushes blocked.
>>>>>>>>>>
>>>>>>>>>> But Jeff, kudos that you are able to use EBS. I didn't go through the video - do you actually use PIOPS or just standard GP2 in your production cluster?
>>>>>>>>>>
>>>>>>>>>> On Sat, Jan 30, 2016 at 1:28 PM, Bryan Cheng <bryan@blockcypher.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Yep, that motivated my question "Do you have any idea what kind of disk performance you need?". If you need the performance, it's hard to beat ephemeral SSD in RAID 0 on EC2, and it's a solid, battle-tested configuration. If you don't, though, EBS GP2 will save a _lot_ of headache.
>>>>>>>>>>>
>>>>>>>>>>> Personally, on small clusters like ours (12 nodes), we've found our choice of instance dictated much more by the balance of price, CPU, and memory. We're using GP2 SSD and we find that for our patterns the disk is rarely the bottleneck. YMMV, of course.
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jan 29, 2016 at 7:32 PM, Jeff Jirsa <jeff.jirsa@crowdstrike.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> If you have to ask that question, I strongly recommend m4 or c4 instances with GP2 EBS. When you don't care about replacing a node because of an instance failure, go with i2 + ephemerals. Until then, GP2 EBS is capable of amazing things, and greatly simplifies life.
>>>>>>>>>>>>
>>>>>>>>>>>> We gave a talk on this topic at both Cassandra Summit and AWS re:Invent: https://www.youtube.com/watch?v=1R-mgOcOSd4 It's very much a viable option, despite any old documents online that say otherwise.
>>>>>>>>>>>>
>>>>>>>>>>>> From: Eric Plowe
>>>>>>>>>>>> Reply-To: "user@cassandra.apache.org"
>>>>>>>>>>>> Date: Friday, January 29, 2016 at 4:33 PM
>>>>>>>>>>>> To: "user@cassandra.apache.org"
>>>>>>>>>>>> Subject: EC2 storage options for C*
>>>>>>>>>>>>
>>>>>>>>>>>> My company is planning on rolling out a C* cluster in EC2. We are thinking about going with ephemeral SSDs. The question is this: should we put two in RAID 0 or just go with one? We currently run a cluster in our data center with two 250GB Samsung 850 EVOs in RAID 0 and we are happy with the performance we are seeing thus far.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>
>>>>>>>>>>>> Eric
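On the "two ephemerals in RAID 0" question: the usual approach is an mdadm stripe across the instance-store devices, then a filesystem mounted at Cassandra's data directory. A hedged sketch follows; the device names, mount point, and filesystem choice are assumptions for illustration and not taken from this thread:

# Illustrative sketch only: stripe two ephemeral (instance-store) SSDs with
# mdadm and mount them for Cassandra data. Device names, mount point, and
# filesystem are assumptions - check your instance's actual device mapping.
import subprocess

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

DEVICES = ["/dev/xvdb", "/dev/xvdc"]   # placeholder ephemeral devices
ARRAY = "/dev/md0"
MOUNT_POINT = "/var/lib/cassandra"

# RAID 0 stripe across both ephemeral SSDs.
run(["mdadm", "--create", ARRAY, "--level=0",
     f"--raid-devices={len(DEVICES)}", *DEVICES])
# ext4 here for simplicity; XFS is a common alternative for Cassandra.
run(["mkfs.ext4", ARRAY])
run(["mkdir", "-p", MOUNT_POINT])
run(["mount", ARRAY, MOUNT_POINT])

Keep in mind that instance-store data does not survive a stop/start, which is exactly the trade-off being weighed against gp2 EBS elsewhere in this thread.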
>>>>
>>>> --
>>>> Steve Robenalt
>>>> Software Architect
>>>> srobenalt@highwire.org
>>>> (office/cell): 916-505-1785
>>>>
>>>> HighWire Press, Inc.
>>>> 425 Broadway St, Redwood City, CA 94063
>>>> www.highwire.org
>>>>
>>>> Technology for Scholarly Communication
>
> --
> Ben Bromhead
> CTO | Instaclustr
> +1 650 284 9692