incubator-cassandra-user mailing list archives

From Yiming Sun <yiming....@gmail.com>
Subject Re: need some clarification on recommended memory size
Date Thu, 17 May 2012 18:08:46 GMT
Hi Aaron,

Thank you for helping us by breaking down the issue. Please see my answers
embedded below.

> Is this a single client ?

Yes

> How many columns is it asking for ?

The client knows the full list of row keys; it randomly picks 100 of them
and loops over them (100 iterations). For each row it first reads a
metadata column to find out how many columns to read, and then reads those
columns.

> What sort of query are you sending, slice or named columns?

Currently all queries are slice queries. The first slice query reads the
metadata column (actually 2 metadata columns: one holds the number of
columns to read, the other holds information that is not needed for the
performance test, but I kept it there so the test resembles the real
situation). The client then generates the column name array and sends the
second slice query.

Only the queries themselves are timed; the timing excludes the time spent
generating the column name array and other client-side work.
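For illustration only, a minimal Python sketch of the two-step read loop
described above. The in-memory store, `slice_query`, and the column names
`meta:ncols`, `meta:info` and `colNNNNN` are hypothetical stand-ins, not the
Hector `SliceQuery` API and not our real schema; the point is just which
parts fall inside the timed regions.

```python
import random
import time

def make_fake_store(n_rows=1000, max_cols=50, seed=0):
    """Hypothetical in-memory stand-in for a column family:
    row key -> {column name: value}."""
    rng = random.Random(seed)
    store = {}
    for i in range(n_rows):
        ncols = rng.randint(1, max_cols)
        # "meta:ncols" holds the number of data columns; "meta:info" is the
        # extra metadata column that is read but not used by the test.
        row = {"meta:ncols": str(ncols), "meta:info": "unused"}
        for c in range(ncols):
            row[f"col{c:05d}"] = b"x" * 1024
        store[f"row{i:07d}"] = row
    return store

def slice_query(store, key, column_names):
    """Hypothetical stand-in for a slice query over named columns of one row."""
    row = store[key]
    return {name: row[name] for name in column_names if name in row}

def run_benchmark(store, n_reads=100, seed=1):
    """Pick n_reads distinct random row keys; for each one read the metadata
    columns, then the data columns. Only the two queries are timed."""
    rng = random.Random(seed)
    keys = rng.sample(sorted(store), n_reads)
    latencies = []  # seconds spent inside the two queries, one entry per row
    total_bytes = 0
    for key in keys:
        t0 = time.perf_counter()
        meta = slice_query(store, key, ["meta:ncols", "meta:info"])
        elapsed = time.perf_counter() - t0
        # Building the column name list happens outside the timed regions.
        names = [f"col{c:05d}" for c in range(int(meta["meta:ncols"]))]
        t0 = time.perf_counter()
        data = slice_query(store, key, names)
        elapsed += time.perf_counter() - t0
        latencies.append(elapsed)
        total_bytes += sum(len(v) for v in data.values())
    return keys, latencies, total_bytes
```

With this structure the read rate is `total_bytes / sum(latencies)`, i.e.
bytes returned divided by time spent only in queries.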


>  From the client side how long is a single read taking ?

I am not 100% sure what you are asking... do you mean how long
SliceQuery.execute() takes? The average we are getting is between 50 and
70 ms, and nodetool reports similar latency, differing by 5-10 ms at most.
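Because this is a single client issuing one read at a time, throughput and
latency are tied together. A back-of-the-envelope Python sketch, assuming
every read takes the 50-70 ms above, that the ~3 MB/sec rate in the quoted
message below was measured over those same reads, and that "MB" means 10^6
bytes (it ignores the separate metadata query and client overhead):

```python
MB = 1_000_000  # assumption: "MB" in this thread means 10**6 bytes

def bytes_per_read(throughput_bytes_per_s, latency_s):
    """Average payload per read implied by a throughput figure, assuming one
    client issuing one read at a time with no other overhead."""
    return throughput_bytes_per_s * latency_s

def max_throughput(payload_bytes, latency_s, in_flight=1):
    """Throughput when `in_flight` reads are outstanding at once and each still
    takes `latency_s` (Little's law: completions/s = in_flight / latency_s)."""
    return in_flight * payload_bytes / latency_s

for latency_ms in (50, 70):
    payload = bytes_per_read(3 * MB, latency_ms / 1000)
    print(f"{latency_ms} ms/read at 3 MB/s -> about {payload / 1000:.0f} KB per read")
```

Under these assumptions each read moves roughly 150-210 KB, and with a
single read in flight the rate can only grow if latency drops or more reads
are issued concurrently.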


> What is the write workload like?  it sounds like it's write once read
many.

Indeed, it is essentially a WORM (write once, read many) environment.
During the performance test there are no writes at all.

> memory speed > network speed

Yes. Right now our data is only a sample of about 250K rows, so the default
200,000-key cache gets a hit rate above 90%. But we will soon be hosting the
full data set of about 3M rows, so I am not sure our memory will be able to
keep up with it.
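A rough way to see why the jump to 3M rows is a concern is to compare the
key cache capacity with the row count. The Python sketch below assumes a
cache holding a fixed number of keys and reads spread uniformly over all
rows, in which case the steady-state hit rate tends toward the fraction of
keys that fit; it is an estimate, not a model of Cassandra's cache.

```python
def key_cache_coverage(cache_capacity, total_rows):
    """Fraction of row keys that fit in the key cache at the same time."""
    return min(1.0, cache_capacity / total_rows)

CACHE_KEYS = 200_000  # the default key cache size mentioned above
for rows in (250_000, 3_000_000):
    share = key_cache_coverage(CACHE_KEYS, rows)
    print(f"{rows:>9,} rows: {share:.1%} of row keys fit in the key cache")
```

That fraction drops from 80% at 250K rows to under 7% at 3M rows, unless the
cache is enlarged or reads concentrate on a hot subset of rows.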

In any case, Aaron, please let us know if you have any
suggestions/comments/insights.  Thanks!

-- Y.


On Thu, May 17, 2012 at 1:04 AM, aaron morton <aaron@thelastpickle.com> wrote:

> The read rate that I have been seeing is about 3MB/sec, and that is
> reading the raw bytes... using string serializer the rate is even lower,
> about 2.2MB/sec.
>
> Can we break this down a bit:
>
> Is this a single client ?
> How many columns is it asking for ?
> What sort of query are you sending, slice or named columns?
> From the client side how long is a single read taking ?
> What is the write workload like?  it sounds like it's write once read
> many.
>
> Use nodetool cfstats to see what the read latency is on a single node.
> (see http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/) Is there
> much difference between this and the latency from the client perspective ?
>
>
>
> Using JNA may help, but a blog article seems to say it only increases
> performance by 13%, which is not very significant when the base
> performance is in the single-digit MB/sec range.
>
> There are other reasons to have JNA installed: more efficient snapshots
> and advising the OS when file operations should not be cached.
>
> Our environment is virtualized, and the disks are actually SAN storage
> attached over Fibre Channel, so I don't know if that affects performance
> as well.
>
> memory speed > network speed
>
>   -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
>
>
