Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (nike.apache.org: local policy)
From: "Hiller, Dean" <Dean.Hiller@nrel.gov>
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Tue, 26 Feb 2013 08:32:17 -0700
Subject: Re: Read Perf
Thread-Topic: Read Perf
Thread-Index: Ac4UNnUiu5l1Xs5HQuCmA79UaJum7g==
Message-ID: <CD52235E.217B4%Dean.Hiller@nrel.gov>
In-Reply-To: 
 <57C7C3CBDCB04F45A57AEC4CB21C0CCD1DB48425@mbx024-e1-nj-6.exch024.domain.local>
Accept-Language: en-US
Content-Language: en-US
user-agent: Microsoft-MacOutlook/14.2.5.121010
acceptlanguage: en-US
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0

In that case, make sure you don't plan on going into the millions of
columns per row, or test the limit first, as I'm pretty sure a row can't
go above 10 million columns (based on previous posts on this list).

Dean

On 2/26/13 8:23 AM, "Kanwar Sangha" <kanwar@mavenir.com> wrote:

>Thanks. For our case, the number of rows will stay more or less the same.
>The only thing that changes is the columns, which keep getting added.
>
>-----Original Message-----
>From: Hiller, Dean [mailto:Dean.Hiller@nrel.gov]
>Sent: 26 February 2013 09:21
>To: user@cassandra.apache.org
>Subject: Re: Read Perf
>
>To find data on disk, there is a bloom filter in memory for each SSTable
>file.  According to the docs, 1 billion rows needs about 2 GB of RAM for
>this, so memory use depends heavily on your number of rows.  As you get
>more rows, you may need to raise the bloom filter false positive ratio to
>use less RAM, but that means slower reads.  I.e., as you add more rows,
>you will have slower reads on a single machine.
>
>We hit the RAM limit on one machine with 1 billion rows, so we are in the
>process of changing the bloom filter false positive ratio from 0.000744
>(the default) to 0.1 to give us more time to solve it.  Since we see no
>I/O load on our machines (or rather extremely little), we plan on moving
>to leveled compaction, where 0.1 is the default in new releases, and I
>think the new size-tiered default is 0.01.
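
A rough way to see where the 2 GB figure comes from: the textbook sizing
formula for a bloom filter gives the bits needed per key as
-ln(p) / (ln 2)^2 for a false positive ratio p. The sketch below applies
it to the ratios mentioned above. The function names are made up for
illustration, and it assumes an ideally sized filter; Cassandra's actual
implementation may size its filters differently, so treat the numbers as
estimates only.

```python
import math


def bloom_bits_per_key(fp_chance):
    """Bits per key for an ideally sized bloom filter (textbook formula)."""
    return -math.log(fp_chance) / (math.log(2) ** 2)


def bloom_memory_gb(num_rows, fp_chance):
    """Estimated filter size in GB (10^9 bytes) for num_rows row keys."""
    bits = num_rows * bloom_bits_per_key(fp_chance)
    return bits / 8 / 1e9


# Ratios taken from the discussion: old default, size-tiered, leveled.
for fp in (0.000744, 0.01, 0.1):
    print(f"fp_chance={fp}: {bloom_bits_per_key(fp):.1f} bits/key, "
          f"{bloom_memory_gb(1_000_000_000, fp):.2f} GB per billion rows")
```

Under these assumptions, 0.000744 works out to about 15 bits per row
(close to 2 GB per billion rows), while 0.1 needs about 5 bits per row,
roughly a third of that.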
>
>I.e., if you store more data per row, this is less of an issue, but it is
>still something to consider.  (Also, I think rows have a limit on data
>size as well, but I'm not sure what it is.  I know the column limit on a
>row is in the millions, somewhere below 10 million.)
>
>Later,
>Dean
>
>From: Kanwar Sangha <kanwar@mavenir.com>
>Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>Date: Monday, February 25, 2013 8:31 PM
>To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>Subject: Read Perf
>
>Hi - I am doing a performance run using a modified YCSB client. I was
>able to populate 8 TB on a node and then ran some read workloads. I am
>seeing an average TPS of 930 ops/sec for random reads. There is no key
>cache or row cache. Question:
>
>Will the read TPS degrade if the data size increases to, say, 20 TB, 50
>TB, or 100 TB? If I understand correctly, read performance should remain
>constant irrespective of the data size, since we eventually have sorted
>SSTables and a binary search would be done on the index to find the row?
>
>
>Thanks,
>Kanwar