i am just reading/writing 4k+/-1k of data to a single column in a single
column family. i do some writes of fresh data and some read/write of
existing data. i will end up in the 100 million row range, maintaining
about 2 million rows of "hot data". so i have small rows, but _lots_
of them.
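
to put rough numbers on that, here is a back-of-the-envelope sketch. it
only multiplies row count by the ~4k payload mentioned above; the class
and method names are made up for illustration, and it ignores
replication, per-column overhead and how rows spread across nodes, so
treat the result as a lower bound rather than a real disk or cache
footprint.

```java
// Hypothetical helper: raw payload size only, no Cassandra overhead.
public class HotDataSizing {
    // Total payload bytes for a given number of rows.
    static long totalBytes(long rows, long bytesPerRow) {
        return rows * bytesPerRow;
    }

    // Convert bytes to GiB (2^30 bytes) for readability.
    static double toGiB(long bytes) {
        return bytes / (1024.0 * 1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long bytesPerRow = 4 * 1024;      // ~4k per row (the +/-1k is ignored)
        long allRows = 100_000_000L;      // expected total row count
        long hotRows = 2_000_000L;        // rows in the "hot" working set

        System.out.printf("all data: %.2f GiB%n",
                toGiB(totalBytes(allRows, bytesPerRow)));
        System.out.printf("hot data: %.2f GiB%n",
                toGiB(totalBytes(hotRows, bytesPerRow)));
    }
}
```

the point of splitting the two numbers is that only the hot set needs to
stay in memory; the full data set is far larger than any one node's RAM.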
so you are using row cache? what setting?
what i find is that the OS cache is plenty good enough. i have 48GB of
RAM per node and try to leave the OS as much as possible by setting
"-Xms1G -Xmx44G". the Xmx is large because of what i'd seen with cassandra
needing a lot of memory sometimes. and in fact, you don't want to use
too much JVM memory as GC will start to eat up your CPU time and cause
bottlenecks.
what i don't like is it appears that once the JVM "commits" RAM to its
process it never releases it. at least i haven't seen it release.
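
one way to watch this from inside a JVM is the standard
java.lang.Runtime API. this is only a sketch (the class name is
hypothetical, and it reports on the JVM it runs in, not on a running
cassandra node), but it shows the three numbers i mean: the Xmx ceiling,
what the heap has currently committed, and what is actually in use.

```java
// Hypothetical snapshot helper built on java.lang.Runtime.
public class HeapSnapshot {
    // Bytes of the committed heap currently occupied by objects.
    static long usedBytes(Runtime rt) {
        return rt.totalMemory() - rt.freeMemory();
    }

    // One-line summary in MiB: max (the -Xmx ceiling), committed, used.
    static String describe(Runtime rt) {
        long mib = 1024L * 1024L;
        return String.format("max=%dMiB committed=%dMiB used=%dMiB",
                rt.maxMemory() / mib,
                rt.totalMemory() / mib,
                usedBytes(rt) / mib);
    }

    public static void main(String[] args) {
        // Print a snapshot; call this periodically to see whether
        // "committed" ever shrinks after a GC.
        System.out.println(describe(Runtime.getRuntime()));
    }
}
```

if "committed" climbs toward "max" and stays there while "used" drops
after collections, that matches what i am describing.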
Tom Chen wrote:
> Can you give some details about the use case that you are using
> cassandra for? I am actually looking to store data in almost the
> same manner, except with more variance in data size, 1k to 5k, and
> about 20 million rows.
>
> I have been benchmarking cassandra on 5 versus 6, and v6 has
> significant speed improvements if I hit the cache (obviously memory
> access versus random disk). Write performance in either version is
> pretty damn good.
>
>
> Tom
>