cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: need some clarification on recommended memory size
Date Sat, 19 May 2012 17:07:22 GMT
So, you're doing about 20 ops/s where each op consists of "read 2
metadata columns, then read ~250 columns of ~2K each."  Is that right?
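A minimal Python sketch of one such op, following the workflow described later in the thread: read the two metadata columns, build the column-name list, then read the data columns, timing only the two queries. `make_row` and `slice_query` are hypothetical in-memory stand-ins (not Hector or any Cassandra client API), so the times it prints say nothing about the server; swap in real client calls to benchmark.

```python
import time

# Hypothetical stand-in for one row: two metadata columns plus ~250 data
# columns of ~2 KB each (sizes from the thread, not from a real schema).
def make_row(num_cols=250, col_size=2048):
    row = {"meta:count": str(num_cols), "meta:info": "not used by the test"}
    for i in range(num_cols):
        row[f"col{i:04d}"] = b"x" * col_size
    return row

def slice_query(row, names):
    """Hypothetical in-memory read of the named columns; not a Cassandra API."""
    return {name: row[name] for name in names}

def one_op(row):
    """Run one op and return (seconds spent in queries, bytes of data read)."""
    t0 = time.perf_counter()
    meta = slice_query(row, ["meta:count", "meta:info"])
    t1 = time.perf_counter()
    # Building the column-name list is excluded from the timing.
    names = [f"col{i:04d}" for i in range(int(meta["meta:count"]))]
    t2 = time.perf_counter()
    data = slice_query(row, names)
    t3 = time.perf_counter()
    nbytes = sum(len(value) for value in data.values())
    return (t1 - t0) + (t3 - t2), nbytes

elapsed, nbytes = one_op(make_row())
print(f"read {nbytes} bytes in {elapsed * 1000:.3f} ms of query time")
```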

Is your test client multithreaded?  Is it on a separate machine from
the Cassandra server?
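For reference, Little's law gives throughput = requests in flight / latency, so a single synchronous client at ~50 ms per op cannot exceed about 1000 / 50 = 20 ops/s, however idle the server is. The following is a hypothetical harness: `do_op` just sleeps 50 ms in place of a real read, and `run` spreads the same keys over N threads with `concurrent.futures.ThreadPoolExecutor`.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def do_op(key):
    """Hypothetical stand-in for one read op; sleeps to mimic ~50 ms latency."""
    time.sleep(0.05)
    return key

def run(keys, threads):
    """Execute do_op for every key on `threads` workers; return (results, ops/s)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = list(pool.map(do_op, keys))  # map preserves input order
    elapsed = time.perf_counter() - start
    return results, len(keys) / elapsed

keys = list(range(20))
for threads in (1, 8):
    _, rate = run(keys, threads)
    print(f"{threads} thread(s): ~{rate:.0f} ops/s")
```

With a real client, extra threads only help until something on the server side (disk, CPU or network) saturates.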

What is your bottleneck?
http://spyced.blogspot.com/2010/01/linux-performance-basics.html
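One rough way to check whether the box is disk-bound is to sample the kernel's CPU counters, which vmstat also reads: a large iowait share points at storage, and on a virtualized host a large steal share means the hypervisor is taking CPU time away. This is a Linux-only sketch that parses the aggregate `cpu` line of /proc/stat; it is not taken from the linked post.

```python
import os
import time

FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")

def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into jiffy counters.

    Only the first eight fields are kept; guest time is already counted in user.
    """
    parts = line.split()
    if parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    values += [0] * (len(FIELDS) - len(values))  # older kernels report fewer fields
    return dict(zip(FIELDS, values))

def cpu_breakdown(before, after):
    """Percentage of CPU time spent in each state between two samples."""
    delta = {k: after[k] - before[k] for k in FIELDS}
    total = sum(delta.values()) or 1
    return {k: 100.0 * v / total for k, v in delta.items()}

def read_cpu_line(path="/proc/stat"):
    with open(path) as f:
        return f.readline()  # the first line is the aggregate over all CPUs

if __name__ == "__main__" and os.path.exists("/proc/stat"):
    a = parse_cpu_line(read_cpu_line())
    time.sleep(1)
    b = parse_cpu_line(read_cpu_line())
    for state, pct in cpu_breakdown(a, b).items():
        print(f"{state:>8}: {pct:5.1f}%")
```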

On Thu, May 17, 2012 at 1:08 PM, Yiming Sun <yiming.sun@gmail.com> wrote:
> Hi Aaron,
>
> Thank you for guiding us by breaking down the issue.  Please see my answers
> embedded below.
>
>> Is this a single client ?
>
> Yes
>
>> How many columns is it asking for ?
>
> The client knows the list of all row keys; it randomly picks 100 of them and
> loops 100 times.  It first reads a metadata column to figure out how many
> columns to read, and then it reads those columns.
>
>> What sort of query are you sending, slice or named columns?
>
> Currently all queries are slice queries.  The first slice query reads the
> metadata column (actually 2 metadata columns: one holds the number of columns
> to read, the other holds information that is not needed for the
> performance test, but I kept it there to keep the test similar to the real
> situation).  It then generates the column name array and sends the second
> slice query.
>
> The timing for the queries is completely isolated and excludes the time
> spent generating the column name array, etc.
>
>
>>  From the client side how long is a single read taking ?
>
> I am not 100% sure what you are asking... do you mean how long
> SliceQuery.execute() takes?  The averages we are getting are between
> 50 and 70 ms, and nodetool reports similar latency, differing by 5-10 ms at most.
>
>
>> What is the write workload like?  it sounds like it's write once read
>> many.
>
> Indeed, it is like a WORM (write once, read many) environment. For the
> performance test, we don't have any writes.
>
>> memory speed > network speed
>
> Yes.  Right now our data is only a sample of about 250K rows, so the default
> 200,000-key cache has a hit rate above 90%.  But we will soon be hosting the
> real deal with about 3M rows, so I am not sure our memory will be able to keep
> up with it.
>
> In any case, Aaron, please let us know if you have any
> suggestions/comments/insights.  Thanks!
>
> -- Y.
>
>
> On Thu, May 17, 2012 at 1:04 AM, aaron morton <aaron@thelastpickle.com>
> wrote:
>>
>> The read rate that I have been seeing is about 3MB/sec, and that is
>> reading the raw bytes... using the string serializer the rate is even lower,
>> about 2.2MB/sec.
>>
>> Can we break this down a bit:
>>
>> Is this a single client ?
>> How many columns is it asking for ?
>> What sort of query are you sending, slice or named columns?
>> From the client side how long is a single read taking ?
>> What is the write workload like?  it sounds like it's write once read
>> many.
>>
>> Use nodetool cfstats to see what the read latency is on a single node.
>> (see http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/) Is there
>> much difference between this and the latency from the client perspective ?
>>
>>
>>
>> Using JNA may help, but a blog article seems to say it only increases
>> performance by 13%, which is not very significant when the base performance
>> is in single-digit MB/s.
>>
>> There are other reasons to have JNA installed: more efficient snapshots
>> and advising the OS when file operations should not be cached.
>>
>> Our environment is virtualized, and the disks are actually a SAN over
>> Fibre Channel, so I don't know whether that has an impact on performance as well.
>>
>> memory speed > network speed
>>
>> -----------------
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>>
>
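To put rough numbers on the key cache question: assuming, purely for illustration, that reads are spread uniformly at random over all rows, a cache holding C of N keys hits with probability about C/N, whatever its eviction policy. With the default 200,000 keys that is 80% of a 250K-row sample but under 7% of 3M rows; real hit rates depend on how skewed the access pattern is. `uniform_hit_rate` is a hypothetical helper, not part of Cassandra.

```python
def uniform_hit_rate(cached_keys, total_keys):
    """Steady-state key cache hit rate if every key is equally likely to be read.

    Under uniform random access, a cache holding `cached_keys` of the
    `total_keys` keys serves a hit with probability cached_keys / total_keys.
    """
    return min(1.0, cached_keys / total_keys)

for rows in (250_000, 3_000_000):
    print(f"{rows:>9,} rows: {uniform_hit_rate(200_000, rows):.1%}")
```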

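Aaron's suggestion to compare nodetool cfstats with client-side timings can be scripted. A minimal sketch, assuming the cfstats output contains lines of the form `Read Latency: 55.0 ms.`; the regex takes the first match, so pass it only the section for the column family under test. The helper names, sample text and numbers are illustrative, not measurements from this thread.

```python
import re

# Assumption: cfstats prints lines like "Read Latency: 55.0 ms." (or "NaN ms.").
LATENCY_RE = re.compile(r"Read Latency:\s*([0-9.]+|NaN)\s*ms")

def server_read_latency_ms(cfstats_text):
    """Return the first 'Read Latency' value found in cfstats output, in ms."""
    m = LATENCY_RE.search(cfstats_text)
    if m is None:
        raise ValueError("no 'Read Latency' line found")
    return float(m.group(1))

def overhead_ms(client_ms, cfstats_text):
    """Client-observed latency minus the server-side latency cfstats reports."""
    return client_ms - server_read_latency_ms(cfstats_text)

# Illustrative numbers only, not real measurements.
sample = "\t\tRead Count: 100\n\t\tRead Latency: 55.0 ms.\n"
print(f"overhead: {overhead_ms(62.0, sample):.1f} ms")
```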


-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
