Date: Wed, 20 Feb 2013 23:31:30 +0100
From: Wojciech Meler <wojciech.meler@gmail.com>
To: user@cassandra.apache.org
Subject: Re: cassandra vs. mongodb quick question (good additional info)

You have 86,400 seconds in a day, so 42 TB could take less than 12 hours on a 10 Gb link.

On 19 Feb 2013 02:01, "Hiller, Dean" <Dean.Hiller@nrel.gov> wrote:

> I thought about this more, and even with a 10 Gbit network it would take
> 40 days to bring up a replacement node if mongodb truly did have 42 TB per
> node, as I had heard. I wrote the email below to the person I heard this
> from, going back to basics, which really puts some perspective on it...
> (and a lot of people don't even have a 10 Gbit network like we do).
>
> Nodes are hooked up by a 10G network at most right now, where that is
> 10 gigabit. We have been talking about 10 terabytes on disk per node
> recently.
>
> Googling "10 gigabit in gigabytes" gives me 1.25 gigabytes/second (yes, I
> could have divided by 8 in my head, but eh... of course when I saw the
> number, I went "duh").
>
> So transferring 10 terabytes, or 10,000 gigabytes, to a node that we are
> bringing online to replace a dead node would take approximately 5 days???
>
> This also assumes no one else is using the bandwidth ;). 10,000 GB *
> 1 second/1.25 GB * 1 hr/60 secs * 1 day/24 hrs = 5.555555 days. This is
> more likely 11 days if we only use 50% of the network.
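A quick sketch to check the transfer-time arithmetic quoted above, using the figures from the thread (a 10 Gbit/s link moves 1.25 GB/s; 86,400 seconds per day). Note that the quoted 5.555-day figure comes from using 60 seconds per hour in the conversion; with 3,600 seconds per hour the answers come out in hours, consistent with the sub-12-hour estimate for 42 TB at the top of this reply:

```python
# Back-of-the-envelope transfer-time check (figures from the thread).
LINK_GBITS_PER_SEC = 10
GB_PER_SEC = LINK_GBITS_PER_SEC / 8  # = 1.25 gigabytes/second

def transfer_hours(gigabytes: float, utilization: float = 1.0) -> float:
    """Hours to stream `gigabytes` over the link at a given utilization."""
    seconds = gigabytes / (GB_PER_SEC * utilization)
    return seconds / 3600  # 3,600 seconds per hour, not 60

print(transfer_hours(10_000))        # 10 TB at full rate: ~2.2 hours
print(transfer_hours(42_000))        # 42 TB at full rate: ~9.3 hours, < 12 h
print(transfer_hours(10_000, 0.5))   # 10 TB at 50% utilization: ~4.4 hours
```

Even at 50% link utilization, the replacement-node transfer is a matter of hours, not days, as long as the disks can actually feed the network at that rate.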
> So bringing a new node up to speed is more like 11 days once it has
> crashed. I think this is the main reason the 1 terabyte limit exists to
> begin with, right?
>
> From an ops perspective, this could sound like a nightmare scenario of
> waiting 10 days... maybe it is livable though. Either way, I thought it
> would be good to share the numbers. ALSO, that is assuming the bus with
> its 10 disks can keep up with 10G???? Can it? What is the limit of
> throughput per second on the bus of the computers we have? Wikipedia
> shows a huge variance.
>
> What is the rate of the disks too (multiplied by 10, of course)? Will
> they keep up with a 10G rate for bringing a new node online?
>
> This all comes into play even more when you want to double the size of
> your cluster, of course, as all nodes have to transfer half of what they
> have to the new nodes that come online (cassandra actually has a very
> data center/rack aware topology to transfer data correctly and not use
> up bandwidth unnecessarily... I am not sure mongodb has that). Anyways,
> just food for thought.
>
> From: aaron morton <aaron@thelastpickle.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Monday, February 18, 2013 1:39 PM
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>, Vegard
> Berget <post@fantasista.no>
> Subject: Re: cassandra vs. mongodb quick question
>
> My experience is that repair of 300 GB of compressed data takes longer
> than 300 GB of uncompressed, but I cannot point to an exact number.
> Calculating the differences is mostly CPU bound and works on the
> uncompressed data.
>
> Streaming uses compression (after uncompressing the on-disk data).
>
> So if you have 300 GB of compressed data, take a look at how long repair
> takes and see if you are comfortable with that. You may also want to test
> replacing a node so you can get the procedure documented and understand
> how long it takes.
> The idea of the soft 300 GB to 500 GB limit came about because of a
> number of cases where people had 1 TB on a single node and were surprised
> that it took days to repair or replace. If you know how long things may
> take, and that fits your operations, then go with it.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 18/02/2013, at 10:08 PM, Vegard Berget <post@fantasista.no> wrote:
>
> Just out of curiosity:
>
> When using compression, does this affect things one way or another? Is
> 300 GB the (compressed) SSTable size, or the total size of the data?
>
> .vegard,
>
> ----- Original Message -----
> From: user@cassandra.apache.org
> To: <user@cassandra.apache.org>
> Sent: Mon, 18 Feb 2013 08:41:25 +1300
> Subject: Re: cassandra vs. mongodb quick question
>
> If you have spinning disks, 1G networking, and no virtual nodes, I would
> still say 300 GB to 500 GB is a soft limit.
>
> If you are using virtual nodes, SSDs, a JBOD disk configuration, or
> faster networking, you may go higher.
>
> The limiting factors are the time it takes to repair, the time it takes
> to replace a node, and the memory considerations for hundreds of millions
> of rows. If the performance of those operations is acceptable to you,
> then go crazy.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 16/02/2013, at 9:05 AM, "Hiller, Dean" <Dean.Hiller@nrel.gov> wrote:
>
> So I found out mongodb varies their node size from 1 TB to 42 TB per node
> depending on the profile. So if I was going to be writing a lot but
> rarely changing rows, could I also use cassandra with a per-node size of
> 20+ TB, or is that not advisable?
>
> Thanks,
> Dean
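Aaron's advice above boils down to: measure how long repair or node replacement takes at your current data size, and assume it scales roughly with on-disk size before going past the soft limit. A minimal sketch of that extrapolation (the 6-hour measurement for 300 GB below is a made-up placeholder, not a benchmark from the thread):

```python
# Hypothetical helper for the "measure, then extrapolate" advice.
# Assumes repair/replace time grows roughly linearly with on-disk size,
# which ignores compaction, memory pressure, and other nonlinear effects.
def extrapolate_hours(measured_gb: float, measured_hours: float,
                      target_gb: float) -> float:
    """Linearly extrapolate a measured repair/replace time to a new size."""
    return measured_hours * (target_gb / measured_gb)

# If a 300 GB repair were measured at 6 hours (placeholder figure):
print(extrapolate_hours(300, 6, 1_000))   # ~20 hours at 1 TB
print(extrapolate_hours(300, 6, 20_000))  # ~400 hours (~17 days) at 20 TB
```

Even under this optimistic linear assumption, a 20+ TB node turns a routine overnight repair into a multi-week operation, which is the crux of the soft-limit argument.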