incubator-cassandra-user mailing list archives

From Kanwar Sangha <kan...@mavenir.com>
Subject RE: Read Perf
Date Tue, 26 Feb 2013 15:23:45 GMT
Thanks. For our case, the number of rows will more or less stay the same. The only thing that
changes is the columns, and they keep getting added.

-----Original Message-----
From: Hiller, Dean [mailto:Dean.Hiller@nrel.gov] 
Sent: 26 February 2013 09:21
To: user@cassandra.apache.org
Subject: Re: Read Perf

To find data on disk, there is a bloom filter for each SSTable held in memory.  According to the
docs, 1 billion rows takes about 2 GB of RAM, so it really will have a huge dependency on your
number of rows.  As you get more rows, you may need to raise the bloom filter false-positive
chance to use less RAM, but that means slower reads.  I.e., as you add more rows, you will have
slower reads on a single machine.
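
For scale, a rough back-of-envelope sketch in Python (assuming the ~2 GB per 1 billion rows
figure above, and that bloom filter RAM scales roughly linearly with row count):

# Back-of-envelope only: scales the ~2 GB per 1e9 rows figure quoted above,
# assuming bloom filter RAM grows roughly linearly with row count.
BYTES_PER_ROW = (2 * 1024**3) / 1e9   # ~2 GB of bloom filter per 1 billion rows

def bloom_filter_ram_gb(row_count):
    """Rough bloom filter RAM estimate for a node holding row_count rows."""
    return row_count * BYTES_PER_ROW / 1024**3

for rows in (1e9, 5e9, 10e9):
    print("%.0e rows -> ~%.1f GB of bloom filter RAM" % (rows, bloom_filter_ram_gb(rows)))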

We hit the RAM limit on one machine with 1 billion rows, so we are in the process of tweaking
the ratio from 0.000744 (the default) to 0.1 to give us more time to solve this.  Since we see
almost no I/O load on our machines, we plan on moving to leveled compaction, where 0.1 is the
default in new releases; I think the new size-tiered default is 0.01.
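
If it helps, here is a sketch of the CQL for both changes, run through the DataStax Python
driver (the keyspace/table names "ks"/"msgs" and the contact point are placeholders, not from
this thread):

from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('ks')

# A larger false-positive chance means smaller bloom filters: less RAM, slower reads.
session.execute("ALTER TABLE msgs WITH bloom_filter_fp_chance = 0.1")

# Leveled compaction touches far fewer SSTables per read than size-tiered,
# at the cost of more compaction I/O.
session.execute("ALTER TABLE msgs WITH compaction = {'class': 'LeveledCompactionStrategy'}")

cluster.shutdown()

As far as I know, existing SSTables keep their old bloom filters until they are rewritten
(by compaction or nodetool upgradesstables).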

I.e., if you store more data per row, this is less of an issue, but it is still something to
consider.  (Also, rows have a data-size limit as well I think, but I'm not sure what it is.  I
know the column limit on a row is in the millions, somewhere lower than 10 million.)

Later,
Dean

From: Kanwar Sangha <kanwar@mavenir.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, February 25, 2013 8:31 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Read Perf

Hi - I am doing a performance run using a modified YCSB client. I was able to populate 8 TB
on a node and then ran some read workloads. I am seeing an average of 930 ops/sec for random
reads, with no key cache or row cache. Question -

Will the read TPS degrade if the data size increases to, say, 20 TB, 50 TB, or 100 TB? If I
understand correctly, reads should remain constant irrespective of the data size, since we
eventually have sorted SSTables and a binary search would be done on the index to find the row?
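
For concreteness, a toy model of the per-read lookup work across SSTables (this is not
Cassandra's actual read path, and the SSTable counts and sizes below are made-up illustration
values):

import math

def read_cost(sstable_count, rows_per_sstable, fp_chance=0.01):
    """Toy estimate of the work for one random read across sstable_count SSTables."""
    bloom_checks = sstable_count                         # in-memory checks, one per SSTable
    index_lookups = 1 + fp_chance * (sstable_count - 1)  # one real hit plus expected false positives
    comparisons = index_lookups * math.log2(rows_per_sstable)  # binary search per index lookup
    return bloom_checks, index_lookups, comparisons

for tables in (10, 40, 160):   # more data generally means more SSTables
    print(tables, read_cost(tables, rows_per_sstable=1e7))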


Thanks,
Kanwar
