cassandra-user mailing list archives

From Jon Haddad <...@jonhaddad.com>
Subject Re: How to monitor datastax driver compression performance?
Date Tue, 09 Apr 2019 14:44:34 GMT
tlp-stress has support for customizing payloads, but it's not
documented very well.  For a given data model (say the KeyValue one),
you can override what tlp-stress will send over.  By default it's
pretty small, a handful of bytes.

To override a field, pass --field.keyvalue.value (the table name + the
field name) followed by the custom field generator you'd like to use.
For example, --field.keyvalue.value='random(10000,11000)' will generate
roughly 10K (10,000 to 11,000) random characters.  You can also generate
text from real words by using the book(100,200) function (100-200 random
words out of books) if you want something that will compress better.
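
To get a feel for why word-based text compresses better than random
characters, here is a minimal sketch in Python.  It is only an
illustration under a few assumptions: zlib stands in for whatever
compression the driver actually negotiates, and random_payload and
wordy_payload are hypothetical helpers that merely approximate what the
random() and book() generators produce (they are not tlp-stress code).

```python
import random
import string
import zlib


def random_payload(n, rng):
    """n random ASCII letters (a rough stand-in for a random() field)."""
    return "".join(rng.choice(string.ascii_letters) for _ in range(n)).encode("ascii")


def wordy_payload(n, rng, vocabulary):
    """About n bytes of space-separated words (a rough stand-in for book())."""
    words = []
    size = 0
    while size < n:
        word = rng.choice(vocabulary)
        words.append(word)
        size += len(word) + 1  # +1 for the separating space
    return " ".join(words).encode("ascii")[:n]


def ratio(payload):
    """Uncompressed size divided by compressed size (higher is better)."""
    return len(payload) / len(zlib.compress(payload))


rng = random.Random(42)  # fixed seed so runs are repeatable
# Hypothetical vocabulary; real book text has far more distinct words.
vocabulary = ["cassandra", "driver", "compression", "latency",
              "partition", "network", "request", "table"]

random_ratio = ratio(random_payload(10_000, rng))
wordy_ratio = ratio(wordy_payload(10_000, rng, vocabulary))
print(f"random letters: {random_ratio:.2f}x, repeated words: {wordy_ratio:.2f}x")
```

The tiny vocabulary exaggerates the gap compared with real book text,
so treat the numbers as directional only.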

You can see a (poorly formatted) list of all the customizations you
can do by running `tlp-stress fields`

This is one of the areas I haven't spent enough time on to share with the
world in a carefree manner, but it works.  If you're willing to
overlook the poor docs in the area I think it might meet your needs.

Regarding compression at the query level vs not, I think you should
look at the overhead first.  I'm betting you'll find it's
insignificant.  That said, you can always create two cluster objects
with two radically different settings if you find you need it.
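
As a rough way to reason about that overhead before touching a cluster,
the sketch below times a compress/decompress round trip for a small and
a large payload.  It rests on loud assumptions: zlib is used only
because it ships with Python and is not what the driver uses on the
wire, the payloads are made up, and round_trip_seconds is a hypothetical
helper.  Absolute numbers will differ from the driver's; flame graphs of
the real application remain the better source of truth.

```python
import time
import zlib


def round_trip_seconds(payload, repeats=200):
    """Average seconds to compress and then decompress one payload."""
    start = time.perf_counter()
    for _ in range(repeats):
        restored = zlib.decompress(zlib.compress(payload))
    elapsed = time.perf_counter() - start
    if restored != payload:
        raise ValueError("round trip changed the payload")
    return elapsed / repeats


# Made-up payloads: a handful of bytes versus about 11 KB of text.
small = b"value-0001"
large = b"the quick brown fox jumps over the lazy dog " * 250

for name, payload in (("small", small), ("large", large)):
    micros = round_trip_seconds(payload) * 1e6
    print(f"{name}: {len(payload)} bytes, {micros:.1f} us per round trip")
```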

On Tue, Apr 9, 2019 at 6:32 AM Gabriel Giussi <gabrielgiussi@gmail.com> wrote:
>
> Does tlp-stress allow us to define the size of rows? Because I will see the benefit of
> compression in terms of request rates only if the compression ratio is significant,
> i.e. requires fewer network round trips.
> This could be done by generating bigger partitions with the parameters -n and -p, i.e. by
> decreasing -p?
>
> Also, don't you think that the driver should allow configuring compression per query?
> Because one table with wide rows could benefit from compression while another one with
> a smaller payload could not.
>
> Thanks for your help Jon.
>
>
> On Mon, Apr 8, 2019 at 19:13, Jon Haddad (<jon@jonhaddad.com>) wrote:
>>
>> If it were me, I'd look at raw request rates (in terms of requests /
>> second as well as request latency), network throughput and then some
>> flame graphs of both the server and your application:
>> https://github.com/jvm-profiling-tools/async-profiler.
>>
>> I've created an issue in tlp-stress to add compression options for the
>> driver: https://github.com/thelastpickle/tlp-stress/issues/67.  If
>> you're interested in contributing the feature I think tlp-stress will
>> more or less solve the remainder of the problem for you (the load
>> part, not the os numbers).
>>
>> Jon
>>
>>
>>
>>
>> On Mon, Apr 8, 2019 at 7:26 AM Gabriel Giussi <gabrielgiussi@gmail.com> wrote:
>> >
>> > Hi, I'm trying to test whether adding driver compression will bring me any benefit.
>> > I understand that the trade-off is less bandwidth but increased CPU usage in
>> > both Cassandra nodes (compression) and client nodes (decompression), but I want to
>> > know what the key metrics are and how to monitor them to prove compression is giving
>> > good results.
>> > I guess I should look at the latency percentiles reported by
>> > com.datastax.driver.core.Metrics and CPU usage, but what about bandwidth usage and
>> > compression ratio?
>> > Should I use tcpdump to capture the lengths of packets coming from the Cassandra nodes?
>> > Something like tcpdump -n "src port 9042 and tcp[13] & 8 != 0" | sed -n "s/^.*length \(.*\).*$/\1/p"
>> > would be enough?
>> >
>> > Thanks
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
>> For additional commands, e-mail: user-help@cassandra.apache.org
>>
