incubator-cassandra-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: cassandra vs. mongodb quick question(good additional info)
Date Thu, 21 Feb 2013 23:26:14 GMT
The theoretical maximum of 10G is not even close to what you actually get.

http://download.intel.com/support/network/sb/fedexcasestudyfinal.pdf
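For reference, here is a minimal sketch of the arithmetic used in the quoted messages below. It assumes the link rate is the only bottleneck (no protocol overhead, no disk, bus, or CPU limits), so it gives a best case, not what you would actually measure. The function name and the utilization parameter are illustrative only and do not come from any tool mentioned in this thread.

```python
def transfer_seconds(size_bytes, link_bits_per_s, utilization=1.0):
    """Best-case seconds to move size_bytes over a link.

    utilization is the fraction of the link actually available
    (an assumption; 0.5 models sharing the link 50/50).
    """
    return size_bytes * 8 / (link_bits_per_s * utilization)


TEN_GBE = 10 * 10**9    # 10 gigabit per second
TB = 10**12             # decimal terabyte
TIB = 2**40             # tebibyte, the unit the Wolfram Alpha query used

# 42 TiB over 10GbE at full line rate
print(transfer_seconds(42 * TIB, TEN_GBE) / 3600, "hours")

# 10 TB over 10GbE at full line rate, then at 50% utilization
print(transfer_seconds(10 * TB, TEN_GBE) / 3600, "hours")
print(transfer_seconds(10 * TB, TEN_GBE, utilization=0.5) / 3600, "hours")
```

Under these assumptions the 10 TB case comes out in hours rather than days, because an hour has 3600 seconds, not 60. Real throughput will be lower than the line rate, so treat these as lower bounds on the time.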


On Thu, Feb 21, 2013 at 12:45 PM, aaron morton <aaron@thelastpickle.com> wrote:
> If you are lazy like me, Wolfram Alpha can help
>
> http://www.wolframalpha.com/input/?i=transfer+42TB+at+10GbE&a=UnitClash_*TB.*Tebibytes--
>
> 10 hours 15 minutes 43.59 seconds
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 21/02/2013, at 11:31 AM, Wojciech Meler <wojciech.meler@gmail.com> wrote:
>
> You have 86400 seconds in a day, so 42T could take less than 12 hours on a
> 10Gb link
>
> On 19 Feb 2013 02:01, "Hiller, Dean" <Dean.Hiller@nrel.gov> wrote:
>>
>> I thought about this more, and even with a 10Gbit network, it would take
>> 40 days to bring up a replacement node if mongodb did truly have a 42T /
>> node like I had heard.  I wrote the below email to the person I heard this
>> from going back to basics which really puts some perspective on it….(and a
>> lot of people don't even have a 10Gbit network like we do)
>>
>> Nodes are hooked up by a 10G network at most right now, where 10G means
>> 10 gigabit.  We have recently been talking about 10 terabytes on disk per node.
>>
>> Google "10 gigabit in gigabytes" gives me 1.25 gigabytes/second  (yes I
>> could have divided by 8 in my head but eh…course when I saw the number, I
>> went duh)
>>
>> So trying to transfer 10 Terabytes  or 10,000 Gigabytes to a node that we
>> are bringing online to replace a dead node would take approximately 5
>> days???
>>
>> This means no one else is using the bandwidth too ;).  10,000Gigabytes * 1
>> second/1.25 * 1hr/60secs * 1 day / 24 hrs = 5.555555 days.  This is more
>> likely 11 days if we only use 50% of the network.
>>
>> So bringing a new node up to speed is more like 11 days once a node has
>> crashed.  I think this is the main reason the 1 terabyte limit exists to begin
>> with, right?
>>
>> From an ops perspective, this could sound like a nightmare scenario of
>> waiting 10 days…..maybe it is livable though.  Either way, I thought it
>> would be good to share the numbers.  ALSO, that is assuming the bus with
>> its 10 disks can keep up with 10G????  Can it?  What is the limit of
>> throughput on a bus / second on the computers we have, as on wikipedia there
>> is a huge variance?
>>
>> What is the rate of the disks too (multiplied by 10 of course)?  Will they
>> keep up with a 10G rate for bringing a new node online?
>>
>> This all comes into play even more so when you want to double the size of
>> your cluster of course, as all nodes have to transfer half of what they have
>> to all the new nodes that come online (cassandra actually has a very data
>> center/rack aware topology to transfer data correctly and not use up all
>> bandwidth unnecessarily…I am not sure mongodb has that).  Anyways, just food
>> for thought.
>>
>> From: aaron morton
>> <aaron@thelastpickle.com<mailto:aaron@thelastpickle.com>>
>> Reply-To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>"
>> <user@cassandra.apache.org<mailto:user@cassandra.apache.org>>
>> Date: Monday, February 18, 2013 1:39 PM
>> To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>"
>> <user@cassandra.apache.org<mailto:user@cassandra.apache.org>>, Vegard Berget
>> <post@fantasista.no<mailto:post@fantasista.no>>
>> Subject: Re: cassandra vs. mongodb quick question
>>
>> My experience is repair of 300GB compressed data takes longer than 300GB
>> of uncompressed, but I cannot point to an exact number. Calculating the
>> differences is mostly CPU bound and works on the non compressed data.
>>
>> Streaming uses compression (after uncompressing the on disk data).
>>
>> So if you have 300GB of compressed data, take a look at how long repair
>> takes and see if you are comfortable with that. You may also want to test
>> replacing a node so you can get the procedure documented and understand how
>> long it takes.
>>
>> The idea of the soft 300GB to 500GB limit came about because of a number of
>> cases where people had 1 TB on a single node and they were surprised it took
>> days to repair or replace. If you know how long things may take, and that
>> fits in your operations then go with it.
>>
>> Cheers
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 18/02/2013, at 10:08 PM, Vegard Berget
>> <post@fantasista.no<mailto:post@fantasista.no>> wrote:
>>
>>
>>
>> Just out of curiosity :
>>
>> When using compression, does this affect this one way or another?  Is 300G
>> (compressed) SSTable size, or total size of data?
>>
>> .vegard,
>>
>> ----- Original Message -----
>> From:
>> user@cassandra.apache.org<mailto:user@cassandra.apache.org>
>>
>> To:
>> <user@cassandra.apache.org<mailto:user@cassandra.apache.org>>
>> Cc:
>>
>> Sent:
>> Mon, 18 Feb 2013 08:41:25 +1300
>> Subject:
>> Re: cassandra vs. mongodb quick question
>>
>>
>> If you have spinning disk and 1G networking and no virtual nodes, I would
>> still say 300G to 500G is a soft limit.
>>
>> If you are using virtual nodes, SSD, JBOD disk configuration or faster
>> networking you may go higher.
>>
>> The limiting factors are the time it takes to repair, the time it takes to
>> replace a node, and the memory considerations for 100's of millions of rows. If
>> the performance of those operations is acceptable to you, then go crazy.
>>
>> Cheers
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com<http://www.thelastpickle.com/>
>>
>> On 16/02/2013, at 9:05 AM, "Hiller, Dean"
>> <Dean.Hiller@nrel.gov<mailto:Dean.Hiller@nrel.gov>> wrote:
>>
>> So I found out mongodb varies their node size from 1T to 42T per node
>> depending on the profile.  So if I was going to be writing a lot but rarely
>> changing rows, could I also use cassandra with a per node size of +20T or is
>> that not advisable?
>>
>> Thanks,
>> Dean
>>
>>
>
