incubator-cassandra-user mailing list archives

From: Kanwar Sangha <>
Subject: RE: Read Perf
Date: Tue, 26 Feb 2013 15:43:51 GMT
Yep. So the read performance will remain constant in this case?

-----Original Message-----
From: Hiller, Dean [] 
Sent: 26 February 2013 09:32
Subject: Re: Read Perf

In that case, make sure you don't plan on going into the millions of columns per row, or test
the limit first, as I'm pretty sure it can't go above 10 million (from previous posts on this list).


On 2/26/13 8:23 AM, "Kanwar Sangha" <> wrote:

>Thanks. For our case, the number of rows will more or less be the same. The 
>only thing that changes is the columns, and they keep getting added.
>-----Original Message-----
>From: Hiller, Dean []
>Sent: 26 February 2013 09:21
>Subject: Re: Read Perf
>To find stuff on disk, there is a bloom filter for each file in memory.
>Per the docs, 1 billion rows uses about 2 GB of RAM, so it really will 
>have a huge dependency on your number of rows.  As you get more rows, 
>you may need to raise the bloom filter false-positive chance to use less 
>RAM, but that means slower reads.  I.e. as you add more rows, you will 
>have slower reads on a single machine.
>We hit the RAM limit on one machine with 1 billion rows, so we are in 
>the process of tweaking the ratio from 0.000744 (the default) to 0.1 to 
>give us more time to solve it.  Since we see no I/O load on our 
>machines (or rather extremely little), we plan on moving to leveled 
>compaction, where 0.1 is the default in new releases; the new 
>size-tiered default I think is 0.01.
>I.e. if you store more data per row, this is not as much of an issue, 
>but it is still something to consider.  (Also, rows have a limit I think 
>on data size as well, but I am not sure what that is.  I know the column 
>limit on a row is in the millions, somewhere lower than 10 million.)
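
For a rough feel of the RAM trade-off described above: an optimally sized Bloom filter
for n keys at false-positive chance p needs about -n*ln(p)/(ln 2)^2 bits. The Python
sketch below just evaluates that textbook formula with illustrative numbers; Cassandra's
actual per-SSTable filters are sized per file and will differ somewhat.

import math

def bloom_filter_bytes(n_rows, fp_chance):
    """Approximate memory for an optimally sized Bloom filter.

    Standard formula: m = -n * ln(p) / (ln 2)^2 bits, where n is the
    number of keys and p is the target false-positive chance.
    """
    bits = -n_rows * math.log(fp_chance) / (math.log(2) ** 2)
    return bits / 8

for fp in (0.000744, 0.01, 0.1):
    gib = bloom_filter_bytes(1_000_000_000, fp) / (1024 ** 3)
    print("fp_chance=%g: ~%.2f GiB for 1 billion rows" % (fp, gib))

# Prints roughly 1.75 GiB, 1.12 GiB and 0.56 GiB respectively, which lines
# up with the ~2 GB per billion rows mentioned above and shows why raising
# the false-positive chance to 0.1 buys back RAM at the cost of more
# false positives (i.e. more wasted disk seeks on reads).
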
>From: Kanwar Sangha <>
>Reply-To: "<>"
>Date: Monday, February 25, 2013 8:31 PM
>To: "<>"
>Subject: Read Perf
>Hi - I am doing a performance run using a modified YCSB client and was 
>able to populate 8 TB on a node, then ran some read workloads. I am 
>seeing an average of 930 ops/sec for random reads. There is no key 
>cache/row cache. Question -
>Will the read TPS degrade if the data size increases to, say, 20 TB, 
>50 TB, 100 TB?  If I understand correctly, reads should remain constant 
>irrespective of the data size, since we eventually have sorted SSTables 
>and a binary search would be done on the index to find the row?
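
The reasoning above - that lookup cost on a sorted index grows only logarithmically with
the number of rows - can be sketched with a plain binary search. The index layout here (a
sorted in-memory list of (key, offset) pairs) is purely illustrative, not Cassandra's
actual index file format:

import bisect

# Hypothetical sorted partition index: (row key, file offset) pairs.
index = [("key%09d" % i, i * 4096) for i in range(1_000_000)]
keys = [k for k, _ in index]

def locate(row_key):
    """Binary-search the sorted key list; return the row's offset or None."""
    i = bisect.bisect_left(keys, row_key)
    if i < len(keys) and keys[i] == row_key:
        return index[i][1]
    return None

print(locate("key000004242"))  # 4242 * 4096 = 17375232

# Doubling the row count adds only one extra comparison per lookup, so the
# search itself stays nearly flat as data grows. The catch, per the reply
# above, is that the per-SSTable bloom filters that keep reads from probing
# every file live in RAM and grow linearly with row count.
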
