incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: cassandra vs. mongodb quick question(good additional info)
Date Thu, 21 Feb 2013 17:45:43 GMT
If you are lazy like me, Wolfram Alpha can help:

http://www.wolframalpha.com/input/?i=transfer+42TB+at+10GbE&a=UnitClash_*TB.*Tebibytes--

10 hours 15 minutes 43.59 seconds
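
For the curious, the same number falls out of a few lines of Python; the only subtlety is that the UnitClash option in that URL makes Wolfram read "TB" as tebibytes (2^40 bytes):

    size_bytes = 42 * 2**40          # 42 TiB, per the UnitClash conversion
    link_bytes_per_s = 10e9 / 8      # 10 Gbit/s = 1.25 GB/s
    seconds = size_bytes / link_bytes_per_s
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    print(f"{int(hours)}h {int(minutes)}m {secs:.2f}s")   # -> 10h 15m 43.59s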

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 21/02/2013, at 11:31 AM, Wojciech Meler <wojciech.meler@gmail.com> wrote:

> you have 86,400 seconds in a day, so 42T could take less than 12 hours on a 10Gb link
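>
> A minimal Python sketch of that arithmetic, assuming the whole link is yours:
>
>     seconds_per_day = 86400
>     bytes_per_s = 10e9 / 8                    # 10 Gbit/s = 1.25 GB/s
>     transfer_s = 42e12 / bytes_per_s          # 42 TB (decimal) -> 33,600 s, ~9.3 hours
>     print(transfer_s < seconds_per_day / 2)   # -> True: well under 12 hours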
> 
> On 19 Feb 2013 at 02:01, "Hiller, Dean" <Dean.Hiller@nrel.gov> wrote:
> I thought about this more, and even with a 10Gbit network, it would take 40 days to bring up a replacement node if mongodb did truly have 42T / node like I had heard. I wrote the below email to the person I heard this from, going back to basics, which really puts some perspective on it… (and a lot of people don't even have a 10Gbit network like we do)
> 
> Nodes are hooked up by at most a 10G network right now, where that is 10 gigabit. We are talking about 10 terabytes on disk per node recently.
> 
> Google "10 gigabit in gigabytes" gives me 1.25 gigabytes/second  (yes I could have divided
by 8 in my head but eh…course when I saw the number, I went duh)
> 
> So trying to transfer 10 terabytes, or 10,000 gigabytes, to a node that we are bringing online to replace a dead node would take approximately 5 days???
> 
> This means no one else is using the bandwidth too ;). 10,000 gigabytes * 1 second/1.25 * 1 hr/60 secs * 1 day/24 hrs = 5.555555 days. This is more likely 11 days if we only use 50% of the network.
> 
> So bringing a new node up to speed is more like 11 days once it has crashed. I think this is the main reason the 1 terabyte limit exists to begin with, right?
> 
> From an ops perspective, this could sound like a nightmare scenario of waiting 10 days… maybe it is livable though. Either way, I thought it would be good to share the numbers. ALSO, that is assuming the bus with its 10 disks can keep up with 10G???? Can it? What is the limit of throughput per second on a bus on the computers we have, as on wikipedia there is a huge variance?
> 
> What is the rate of the disks too (multiplied by 10 of course)? Will they keep up with a 10G rate for bringing a new node online?
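>
> A rough check, assuming ~100 MB/s of sequential throughput per spinning disk (a typical figure; the actual number is the thing to measure):
>
>     disks = 10
>     disk_mb_per_s = 100                        # assumed per-spindle sequential rate
>     aggregate = disks * disk_mb_per_s          # 1,000 MB/s across the array
>     network_mb_per_s = 10e9 / 8 / 1e6          # 1,250 MB/s for 10GbE
>     print(aggregate >= network_mb_per_s)       # -> False: the spindles fall just short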
> 
> This all comes into play even more so when you want to double the size of your cluster, of course, as all nodes have to transfer half of what they have to the new nodes that come online (cassandra actually has a very data center/rack aware topology to transfer data correctly and not use up all bandwidth unnecessarily… I am not sure mongodb has that). Anyways, just food for thought.
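>
> A sketch of what doubling costs, assuming a hypothetical 10-node cluster of 10T nodes where each new node bootstraps roughly half an existing node's range:
>
>     nodes = 10                                  # hypothetical starting cluster
>     tb_per_node = 10
>     new_nodes = nodes                           # doubling the cluster
>     tb_streamed = new_nodes * tb_per_node / 2   # each new node pulls ~half a node's data
>     print(tb_streamed)                          # -> 50.0 TB moved in total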
> 
> From: aaron morton <aaron@thelastpickle.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Monday, February 18, 2013 1:39 PM
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>, Vegard Berget <post@fantasista.no>
> Subject: Re: cassandra vs. mongodb quick question
> 
> My experience is that repair of 300GB of compressed data takes longer than 300GB of uncompressed, but I cannot point to an exact number. Calculating the differences is mostly CPU bound and works on the uncompressed data.
> 
> Streaming uses compression (after uncompressing the on-disk data).
> 
> So if you have 300GB of compressed data, take a look at how long repair takes and see if you are comfortable with that. You may also want to test replacing a node so you can get the procedure documented and understand how long it takes.
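>
> One way to put a number on it, a minimal sketch that times a full repair via nodetool (assuming nodetool is on the PATH of the node you run it on):
>
>     import subprocess, time
>
>     start = time.time()
>     subprocess.run(["nodetool", "repair"], check=True)   # anti-entropy repair on this node
>     print(f"repair took {(time.time() - start) / 3600:.1f} hours")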
> 
> The idea of the soft 300GB to 500GB limit came about because of a number of cases where people had 1 TB on a single node and were surprised it took days to repair or replace. If you know how long things may take, and that fits your operations, then go with it.
> 
> Cheers
> 
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 18/02/2013, at 10:08 PM, Vegard Berget <post@fantasista.no> wrote:
> 
>
> Just out of curiosity:
>
> When using compression, does this affect things one way or another? Is 300G the (compressed) SSTable size, or the total size of the data?
> 
> .vegard,
> 
> ----- Original Message -----
> From: user@cassandra.apache.org
> To: <user@cassandra.apache.org>
> Cc:
> Sent: Mon, 18 Feb 2013 08:41:25 +1300
> Subject: Re: cassandra vs. mongodb quick question
> 
> 
> If you have spinning disks and 1G networking and no virtual nodes, I would still say 300G to 500G is a soft limit.
> 
> If you are using virtual nodes, SSDs, a JBOD disk configuration, or faster networking, you may go higher.
> 
> The limiting factors are the time it takes to repair, the time it takes to replace a node, and the memory considerations for hundreds of millions of rows. If the performance of those operations is acceptable to you, then go crazy.
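>
> To put rough numbers on the streaming part alone (repair adds validation compaction on top of this), a sketch assuming a 1Gbit link at 50% utilization:
>
>     def stream_hours(gb, gbit_link=1, utilization=0.5):
>         # raw bytes moved, divided by usable link bandwidth, in hours
>         return gb * 1e9 / (gbit_link * 1e9 / 8 * utilization) / 3600
>
>     print(stream_hours(300))    # -> ~1.3 hours of raw streaming for 300GB
>     print(stream_hours(1000))   # -> ~4.4 hours for 1TB; repair itself takes far longer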
> 
> Cheers
> 
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 16/02/2013, at 9:05 AM, "Hiller, Dean" <Dean.Hiller@nrel.gov> wrote:
> 
> So I found out mongodb varies their node size from 1T to 42T per node depending on the profile. So if I was going to be writing a lot but rarely changing rows, could I also use cassandra with a per-node size of 20T+, or is that not advisable?
> 
> Thanks,
> Dean
> 
> 

