cassandra-user mailing list archives

From "Hiller, Dean" <>
Subject Re: Read Perf
Date Tue, 26 Feb 2013 15:49:13 GMT
Depends, are you

1. Reading the same amount of data as the data set grows?  (Reading
more data generally does get slower, e.g. 1MB vs. 10MB.)
2. Reading the same number of columns as the data set size grows?
3. Never reading in the entire row?

If the answer to all of the above is yes, yes, yes, then it should be fine,
but it is always better to test.

ALSO, a big note: you MUST test doing a read repair, as that will slow
things down BIG TIME.  We only have 130GB per node, and in general
Cassandra is made for 300GB to 500GB per node on a 1TB drive (typical config).
This is due to the maintenance cost, so TEST your maintenance operations
before you get burned there.

Just run nodetool upgradesstables and time it.  This definitely gets
slower as your data grows and gives you a good idea of how long operations
will take.  Of course, better yet, take a node completely out, wipe it,
put it back in, and see how long it takes to get all the data back in by
running the read repair.  With 10T, I suspect you will have a lot of issues.


On 2/26/13 8:43 AM, "Kanwar Sangha" <> wrote:

>Yep. So the read will remain constant in this case?
>-----Original Message-----
>From: Hiller, Dean []
>Sent: 26 February 2013 09:32
>Subject: Re: Read Perf
>In that case, make sure you don't plan on going into the millions, or test
>the limit, as I'm pretty sure it can't go above 10 million (from previous
>posts on this list).
>On 2/26/13 8:23 AM, "Kanwar Sangha" <> wrote:
>>Thanks. For our case, the number of rows will more or less be the same. The
>>only thing that changes is the columns, and they keep getting added.
>>-----Original Message-----
>>From: Hiller, Dean []
>>Sent: 26 February 2013 09:21
>>Subject: Re: Read Perf
>>To find stuff on disk, there is a bloom filter for each file in memory.
>>Per the docs, 1 billion rows takes about 2GB of RAM, so it really will have
>>a huge dependency on your number of rows.  As you get more rows, you may
>>need to raise the bloom filter false-positive chance to use less RAM, but
>>that means slower reads.  I.e., as you add more rows, you will have slower
>>reads on a single machine.
>>We hit the RAM limit on one machine with 1 billion rows, so we are in
>>the process of tweaking the ratio from 0.000744 (the default) to 0.1 to
>>give us more time to solve this.  Since we see no I/O load on our
>>machines (or rather extremely little), we plan on moving to leveled
>>compaction, where 0.1 is the default in new releases; the new size-tiered
>>default I think is 0.01.
>>I.e., if you store more data per row, this is not as much of an issue, but
>>it is still something to consider.  (Also, rows have a limit on data size
>>as well, I think, but I am not sure what that is.  I know the column limit
>>on a row is in the millions, somewhere below 10 million.)
>>From: Kanwar Sangha <>
>>Reply-To: "<>"
>>Date: Monday, February 25, 2013 8:31 PM
>>To: "<>"
>>Subject: Read Perf
>>Hi - I am doing a performance run using a modified YCSB client; I was
>>able to populate 8TB on a node and then ran some read workloads. I am
>>seeing an average TPS of 930 ops/sec for random reads. There is no key
>>cache/row cache. Question -
>>Will the read TPS degrade if the data size increases to, say, 20 TB, 50
>>TB, 100 TB? If I understand correctly, the reads should remain constant
>>irrespective of the data size, since we eventually have sorted SSTables
>>and a binary search would be done on the index to find the row?
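For reference, the roughly 2GB-per-billion-rows figure quoted earlier in this thread falls out of the standard bloom filter sizing formula, bits per key = -ln(p) / (ln 2)^2, where p is the false-positive chance. A back-of-the-envelope sketch (this is the textbook formula, not Cassandra's exact internal accounting):

```python
import math

def bloom_filter_bytes(n_rows, fp_chance):
    """Estimate bloom filter memory for n_rows keys at a given
    false-positive chance: bits per key = -ln(p) / (ln 2)^2."""
    bits_per_key = -math.log(fp_chance) / (math.log(2) ** 2)
    return n_rows * bits_per_key / 8  # bytes

gib = 1024 ** 3
# Old size-tiered default fp chance vs. the relaxed 0.1 setting
# discussed in the thread, for 1 billion rows:
default = bloom_filter_bytes(1_000_000_000, 0.000744) / gib
relaxed = bloom_filter_bytes(1_000_000_000, 0.1) / gib
print(f"1B rows at fp=0.000744: {default:.2f} GiB")
print(f"1B rows at fp=0.1:      {relaxed:.2f} GiB")
```

At the 0.000744 default this comes out to roughly 1.7-1.9 GiB for a billion rows, consistent with the "2 Gig of RAM" figure above, while relaxing to 0.1 cuts it to a bit over half a GiB at the cost of more false-positive disk reads.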
