incubator-cassandra-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: cassandra vs. mongodb quick question(good additional info)
Date Wed, 20 Feb 2013 21:22:18 GMT
Write once and compact is generally a bad fit for very large datasets.
It is like being able to jump 60 feet in the air when your legs
cannot withstand a 10-foot drop.

http://wiki.apache.org/cassandra/LargeDataSetConsiderations



On Wed, Feb 20, 2013 at 3:33 PM, Bryan Talbot <btalbot@aeriagames.com> wrote:
> There seem to be some data structures in Cassandra which scale with the
> number of rows stored and consume in-JVM memory without bound (other than
> the number of rows).  Even with 1.2, I think that index samples are still
> kept in-JVM, so you may need to tune index_interval.  Unfortunately that is
> a global value, so it will affect all CFs and not just the big ones that
> need it to be different.
>
> There may be other issues (like during compaction) but that one pops out.
> Prior to 1.2, bloom filters would be a big problem too.
>
> -Bryan
>
>
>
> On Wed, Feb 20, 2013 at 12:20 PM, Hiller, Dean <Dean.Hiller@nrel.gov> wrote:
>>
>> Heh, we just discovered that mistake a few minutes ago... thanks though.  I
>> am now wondering whether to set up a separate 6-node test cluster and test
>> how compaction behaves on very large data sets and such.  We have tons of
>> research data that sits there, so I am wondering if 20T / node is now
>> feasible with Cassandra (I mean, if MongoDB has a 42T node, which 10gen was
>> telling my colleague, I would think we can with Cassandra).
>>
>> Are there any reasons I should know up front why 20T per node won't work?
>> We have 20 disks per node and this definitely has a different profile than
>> previous Cassandra systems I have set up.  We don't really need any caching
>> as disk access is typically fine on reads.
>>
>> Thanks,
>> Dean
>>
>> From: Bryan Talbot <btalbot@aeriagames.com<mailto:btalbot@aeriagames.com>>
>> Reply-To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>"
>> <user@cassandra.apache.org<mailto:user@cassandra.apache.org>>
>> Date: Wednesday, February 20, 2013 1:04 PM
>> To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>"
>> <user@cassandra.apache.org<mailto:user@cassandra.apache.org>>
>> Subject: Re: cassandra vs. mongodb quick question(good additional info)
>>
>> This calculation is incorrect btw.  10,000 GB transferred at 1.25 GB / sec
>> would complete in about 8,000 seconds, which is just 2.2 hours and not 5.5
>> days.  The error is in the conversion (1hr/60secs), which is off by a factor
>> of 60, since (1hr/3600secs) is the correct conversion.
>>
>> -Bryan
>>
>>
>> On Mon, Feb 18, 2013 at 5:00 PM, Hiller, Dean
>> <Dean.Hiller@nrel.gov<mailto:Dean.Hiller@nrel.gov>> wrote:
>> Googling "10 gigabit in gigabytes" gives me 1.25 gigabytes/second (yes, I
>> could have divided by 8 in my head, but eh... of course when I saw the
>> number, I went duh).
>>
>> So trying to transfer 10 Terabytes  or 10,000 Gigabytes to a node that we
>> are bringing online to replace a dead node would take approximately 5
>> days???
>>
>> This means no one else is using the bandwidth too ;).  10,000Gigabytes * 1
>> second/1.25 * 1hr/60secs * 1 day / 24 hrs = 5.555555 days.  This is more
>> likely 11 days if we only use 50% of the network.
>>
>> So bringing a new node up to speed is more like 11 days once the old one
>> has crashed.  I think this is the main reason the 1 Terabyte limit exists
>> to begin with, right?
>>
>
>
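The transfer-time arithmetic discussed in this thread can be checked with a few lines of Python. This is only a sketch: the function name and the utilization parameter are made up for illustration, and the result is a network-only lower bound that ignores disk throughput, streaming overhead, and compaction on the receiving node.

```python
SECONDS_PER_HOUR = 3600  # the conversion factor the original estimate got wrong


def transfer_time_seconds(data_gb, link_gb_per_sec, utilization=1.0):
    """Return the seconds needed to move data_gb over a link.

    utilization is the fraction of the link assumed to be available
    (hypothetical parameter, e.g. 0.5 if other traffic uses half of it).
    """
    if link_gb_per_sec <= 0:
        raise ValueError("link_gb_per_sec must be positive")
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return data_gb / (link_gb_per_sec * utilization)


# Figures from the thread: 10,000 GB over a 10 gigabit (1.25 GB/s) link.
full_link = transfer_time_seconds(10_000, 1.25)
half_link = transfer_time_seconds(10_000, 1.25, utilization=0.5)

print(f"{full_link / SECONDS_PER_HOUR:.1f} hours at full link speed")
print(f"{half_link / SECONDS_PER_HOUR:.1f} hours at 50% of the link")
```

Dividing by 3600 instead of 60 is what brings the estimate down from days to hours. Real rebuild times will be longer than this bound, since the network is rarely the only bottleneck.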
