incubator-cassandra-user mailing list archives

From Bryan Talbot <btal...@aeriagames.com>
Subject Re: cassandra vs. mongodb quick question(good additional info)
Date Wed, 20 Feb 2013 20:33:11 GMT
There seem to be some data structures in Cassandra that scale with the
number of rows stored and consume in-JVM memory without bound (other than
the number of rows).  Even with 1.2, I think that index samples are still kept
in-JVM, so you may need to tune index_interval.  Unfortunately that is a
global value, so it will affect all CFs and not just the big ones that need
it to be different.
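
As a rough illustration of why index_interval matters at this scale, the sketch below estimates how many index samples a node would hold on heap. It assumes roughly one sample per index_interval rows, a default interval of 128 (check your own cassandra.yaml), and a purely hypothetical per-sample cost of 64 bytes; the real cost depends on key size and version, so treat the byte figure as a placeholder rather than a measurement.

```python
def index_sample_count(rows_per_node: int, index_interval: int = 128) -> int:
    """Approximate number of in-memory index samples.

    Assumes about one sample is kept for every `index_interval` rows.
    """
    return rows_per_node // index_interval


def index_sample_bytes(rows_per_node: int,
                       index_interval: int = 128,
                       bytes_per_sample: int = 64) -> int:
    """Approximate heap used by index samples.

    `bytes_per_sample` is a hypothetical placeholder, not a measured value.
    """
    return index_sample_count(rows_per_node, index_interval) * bytes_per_sample


if __name__ == "__main__":
    rows = 1_000_000_000  # hypothetical row count for a very dense node
    for interval in (128, 512):
        samples = index_sample_count(rows, interval)
        approx_mb = index_sample_bytes(rows, interval) / 1_000_000
        print(f"index_interval={interval}: {samples} samples, ~{approx_mb:.0f} MB")
```

Raising index_interval shrinks the sample count proportionally, at the cost of scanning more index entries per lookup.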

There may be other issues (like during compaction) but that one pops out.
Prior to 1.2, bloom filters would be a big problem too, since they were
still held on the JVM heap.

-Bryan



On Wed, Feb 20, 2013 at 12:20 PM, Hiller, Dean <Dean.Hiller@nrel.gov> wrote:

> Heh, we just discovered that mistake a few minutes ago... thanks though.  I
> am now considering running a separate 6-node test cluster to see how
> compaction behaves on very large data sets.  We have tons of
> research data that just sits there, so I am wondering if 20T / node is now
> feasible with Cassandra (I mean, if MongoDB has a 42T node, which 10gen was
> telling my colleague, I would think we can with Cassandra).
>
> Are there any reasons I should know up front why 20T per node won't work?
> We have 20 disks per node, and this definitely has a different profile than
> previous Cassandra systems I have set up.  We don't really need any caching,
> as disk access is typically fine on reads.
>
> Thanks,
> Dean
>
> From: Bryan Talbot <btalbot@aeriagames.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Wednesday, February 20, 2013 1:04 PM
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Subject: Re: cassandra vs. mongodb quick question(good additional info)
>
> This calculation is incorrect, btw.  10,000 GB transferred at 1.25 GB/sec
> would complete in about 8,000 seconds, which is just 2.2 hours and not 5.5
> days.  The error is in the conversion (1hr/60secs), which is off by a factor
> of 60, since (1hr/3600secs) is the correct conversion.
>
> -Bryan
>
>
> On Mon, Feb 18, 2013 at 5:00 PM, Hiller, Dean <Dean.Hiller@nrel.gov> wrote:
> Google "10 gigabit in gigabytes" gives me 1.25 gigabytes/second  (yes I
> could have divided by 8 in my head but eh…course when I saw the number, I
> went duh)
>
> So trying to transfer 10 Terabytes  or 10,000 Gigabytes to a node that we
> are bringing online to replace a dead node would take approximately 5
> days???
>
> This means no one else is using the bandwidth too ;).  10,000Gigabytes * 1
> second/1.25 * 1hr/60secs * 1 day / 24 hrs = 5.555555 days.  This is more
> likely 11 days if we only use 50% of the network.
>
> So bringing a new node up to speed is more like 11 days once it is
> crashed.  I think this is the main reason the 1Terabyte exists to begin
> with, right?
>
>
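
The transfer-time arithmetic discussed in the quoted messages can be checked with a short sketch. It only covers raw network time for a 10 gigabit link (10 / 8 = 1.25 GB/sec) and ignores disk throughput, streaming overhead, and compaction on the receiving node, so real rebuild times would be longer. The function name and the `utilization` parameter are illustrative, not part of any Cassandra tool.

```python
SECONDS_PER_HOUR = 3600  # the original estimate used 60 here by mistake


def transfer_seconds(data_gb: float,
                     link_gbit_per_s: float,
                     utilization: float = 1.0) -> float:
    """Seconds needed to move `data_gb` gigabytes over the link.

    `utilization` is the fraction of link bandwidth available (0 < u <= 1).
    """
    gb_per_s = link_gbit_per_s / 8 * utilization  # gigabits -> gigabytes
    return data_gb / gb_per_s


if __name__ == "__main__":
    full = transfer_seconds(10_000, 10)       # whole link available
    half = transfer_seconds(10_000, 10, 0.5)  # only 50% of the network
    print(f"full link: {full:.0f} s = {full / SECONDS_PER_HOUR:.1f} hours")
    print(f"half link: {half:.0f} s = {half / SECONDS_PER_HOUR:.1f} hours")
```

With the full link this gives 8,000 seconds (about 2.2 hours), and about 4.4 hours at 50% utilization, rather than the 5.5 and 11 days in the original estimate.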
