cassandra-user mailing list archives

From Jack Krupansky <jack.krupan...@gmail.com>
Subject Re: EC2 storage options for C*
Date Mon, 01 Feb 2016 13:51:35 GMT
I'm not a fan of guy - this appears to be the slideshare corresponding to
the video:
http://www.slideshare.net/AmazonWebServices/bdt323-amazon-ebs-cassandra-1-million-writes-per-second

My apologies if my questions are actually answered on the video or slides,
I just did a quick scan of the slide text.

I'm curious where the EBS physical devices actually reside - are they in
the same rack, the same data center, same availability zone? I mean, people
try to minimize network latency between nodes, so how exactly is EBS able
to avoid network latency?

Did your test use Amazon EBS–Optimized Instances?

SSD or magnetic or does it make any difference?

What info is available on EBS performance at peak times, when multiple AWS
customers have spikes of demand?

Is RAID much of a factor or help at all using EBS?

How exactly is EBS provisioned in terms of its own HA - I mean, with a
properly configured Cassandra cluster RF provides HA, so what is the
equivalent for EBS? If I have RF=3, what assurance is there that those
three EBS volumes aren't all in the same physical rack?

For multi-data center operation, what configuration options assure that the
EBS volumes for each DC are truly physically separated?

In terms of syncing data for the commit log, if the OS call to sync an EBS
volume returns, is the commit log data absolutely 100% synced at the
hardware level on the EBS end, such that a power failure of the systems on
which the EBS volumes reside will still guarantee availability of the
fsynced data? As well, is the return from fsync an absolute guarantee of
sstable durability when Cassandra is about to delete the commit log,
including when the two are on different volumes? In practice, we would like
some significant degree of pipelining of data, such as during the full
processing of flushing memtables, but for the fsync at the end a solid
guarantee is needed.
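
To make the ordering I am asking about concrete, here is a minimal Python
sketch. It is only an illustration, not Cassandra's actual code: the function
names and file paths are hypothetical. It shows the sequence "write the
sstable, fsync it, and only then delete the commit log". Whether the return
from os.fsync means the bytes are on stable media at the EBS end is exactly
the open question; the sketch only shows where that guarantee is relied upon.

```python
import os


def write_durably(path, data):
    """Write data to path and return only after fsync has returned."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # push Python's userspace buffer to the OS
        os.fsync(f.fileno())  # ask the OS to push the file to the device


def flush_then_discard(commit_log_path, sstable_path, sstable_bytes):
    """Persist the sstable first, then remove the commit log segment.

    The commit log is deleted only after fsync on the sstable has returned,
    so correctness depends on fsync being a real durability barrier, even
    when the two files live on different volumes.
    """
    write_durably(sstable_path, sstable_bytes)
    os.remove(commit_log_path)
```

Everything before the final fsync can be pipelined freely; the sketch only
cares that the delete happens strictly after that call returns.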


-- Jack Krupansky

On Mon, Feb 1, 2016 at 12:56 AM, Eric Plowe <eric.plowe@gmail.com> wrote:

> Jeff,
>
> If EBS goes down, then EBS GP2 will go down as well, no? I'm not
> discounting EBS, but prior outages are worrisome.
>
>
> On Sunday, January 31, 2016, Jeff Jirsa <jeff.jirsa@crowdstrike.com>
> wrote:
>
>> Free to choose what you'd like, but EBS outages were also addressed in
>> that video (second half, discussion by Dennis Opacki). 2016 EBS isn't the
>> same as 2011 EBS.
>>
>> --
>> Jeff Jirsa
>>
>>
>> On Jan 31, 2016, at 8:27 PM, Eric Plowe <eric.plowe@gmail.com> wrote:
>>
>> Thank you all for the suggestions. I'm torn between GP2 vs Ephemeral. GP2
>> after testing is a viable contender for our workload. The only worry I have
>> is EBS outages, which have happened.
>>
>> On Sunday, January 31, 2016, Jeff Jirsa <jeff.jirsa@crowdstrike.com>
>> wrote:
>>
>>> Also in that video - it's long but worth watching
>>>
>>> We tested up to 1M reads/second as well, blowing out page cache to
>>> ensure we weren't "just" reading from memory
>>>
>>>
>>>
>>> --
>>> Jeff Jirsa
>>>
>>>
>>> On Jan 31, 2016, at 9:52 AM, Jack Krupansky <jack.krupansky@gmail.com>
>>> wrote:
>>>
>>> How about reads? Any differences between read-intensive and
>>> write-intensive workloads?
>>>
>>> -- Jack Krupansky
>>>
>>> On Sun, Jan 31, 2016 at 3:13 AM, Jeff Jirsa <jeff.jirsa@crowdstrike.com>
>>> wrote:
>>>
>>>> Hi John,
>>>>
>>>> We run using 4T GP2 volumes, which guarantee 10k IOPS. Even at 1M
>>>> writes per second on 60 nodes, we didn’t come close to hitting even 50%
>>>> utilization (10k is more than enough for most workloads). PIOPS is not
>>>> necessary.
>>>>
>>>>
>>>>
>>>> From: John Wong
>>>> Reply-To: "user@cassandra.apache.org"
>>>> Date: Saturday, January 30, 2016 at 3:07 PM
>>>> To: "user@cassandra.apache.org"
>>>> Subject: Re: EC2 storage options for C*
>>>>
>>>> For production I'd stick with ephemeral disks (aka instance storage) if
>>>> you are running a lot of transactions.
>>>> However, for a regular small testing/QA cluster, or something you know
>>>> you want to reload often, EBS is definitely good enough and we haven't had
>>>> issues 99% of the time. The 1% is a kind of anomaly where we have flushes
>>>> blocked.
>>>>
>>>> But Jeff, kudos that you are able to use EBS. I didn't go through the
>>>> video, do you actually use PIOPS or just standard GP2 in your production
>>>> cluster?
>>>>
>>>> On Sat, Jan 30, 2016 at 1:28 PM, Bryan Cheng <bryan@blockcypher.com>
>>>> wrote:
>>>>
>>>>> Yep, that motivated my question "Do you have any idea what kind of
>>>>> disk performance you need?". If you need the performance, it's hard to
>>>>> beat ephemeral SSD in RAID 0 on EC2, and it's a solid, battle-tested
>>>>> configuration. If you don't, though, EBS GP2 will save a _lot_ of
>>>>> headache.
>>>>>
>>>>> Personally, on small clusters like ours (12 nodes), we've found our
>>>>> choice of instance dictated much more by the balance of price, CPU, and
>>>>> memory. We're using GP2 SSD and we find that for our patterns the disk
>>>>> is rarely the bottleneck. YMMV, of course.
>>>>>
>>>>> On Fri, Jan 29, 2016 at 7:32 PM, Jeff Jirsa <
>>>>> jeff.jirsa@crowdstrike.com> wrote:
>>>>>
>>>>>> If you have to ask that question, I strongly recommend m4 or c4
>>>>>> instances with GP2 EBS.  When you don’t care about replacing a node
>>>>>> because of an instance failure, go with i2+ephemerals. Until then, GP2
>>>>>> EBS is capable of amazing things, and greatly simplifies life.
>>>>>>
>>>>>> We gave a talk on this topic at both Cassandra Summit and AWS
>>>>>> re:Invent: https://www.youtube.com/watch?v=1R-mgOcOSd4 It’s very
>>>>>> much a viable option, despite any old documents online that say otherwise.
>>>>>>
>>>>>>
>>>>>>
>>>>>> From: Eric Plowe
>>>>>> Reply-To: "user@cassandra.apache.org"
>>>>>> Date: Friday, January 29, 2016 at 4:33 PM
>>>>>> To: "user@cassandra.apache.org"
>>>>>> Subject: EC2 storage options for C*
>>>>>>
>>>>>> My company is planning on rolling out a C* cluster in EC2. We are
>>>>>> thinking about going with ephemeral SSDs. The question is this: Should
>>>>>> we put two in RAID 0 or just go with one? We currently run a cluster in
>>>>>> our data center with two 250 GB Samsung 850 EVOs in RAID 0 and we are
>>>>>> happy with the performance we are seeing thus far.
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> Eric
>>>>>>
>>>>>
>>>>>
>>>>
>>>
