cassandra-user mailing list archives

From Viktor Jevdokimov <>
Subject RE: Sorting keys for batch reads to minimize seeks
Date Fri, 18 Oct 2013 07:31:46 GMT
The only thing you may win is avoiding unnecessary network hops, and only if:
- request sorted keys (by token) from appropriate replica with ConsistencyLevel.ONE and "dynamic_snitch:
- nodes have the same load
- the replica is not in a GC pause (GC pauses take much longer than internode communication).
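
A minimal sketch of what "sorted keys (by token) from appropriate replica" could look like on the client side. Everything here is an assumption for illustration: `toy_token` is a stand-in MD5-based hash, not Cassandra's actual partitioner, and the four-node `ring` with its node names is hypothetical. A real client would take tokens and ownership from the driver's ring metadata.

```python
import bisect
import hashlib
from collections import defaultdict

RING_SIZE = 2 ** 127  # assumed token space for this toy ring


def toy_token(key: bytes) -> int:
    """Stand-in token function (MD5-based); NOT Cassandra's real partitioner."""
    return int.from_bytes(hashlib.md5(key).digest(), "big") % RING_SIZE


def group_keys_by_owner(keys, ring):
    """Group keys by the node owning their token, each group sorted by token.

    ring: list of (end_token, node_name) sorted by end_token; a node owns
    the tokens in (previous end_token, its own end_token].
    """
    end_tokens = [token for token, _ in ring]
    groups = defaultdict(list)
    for key in sorted(keys, key=toy_token):
        # First node whose end token is >= the key's token; wrap past the end.
        idx = bisect.bisect_left(end_tokens, toy_token(key)) % len(ring)
        groups[ring[idx][1]].append(key)
    return dict(groups)


# Hypothetical four-node ring with evenly spaced tokens.
ring = [
    (RING_SIZE // 4, "node-a"),
    (RING_SIZE // 2, "node-b"),
    (3 * RING_SIZE // 4, "node-c"),
    (RING_SIZE - 1, "node-d"),
]
keys = [f"user:{i}".encode() for i in range(20)]
groups = group_keys_by_owner(keys, ring)
```

Each group could then be sent directly to its owning node, which saves the coordinator hop but does not change how the node reads each key.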

For a multi-key request, C* will do multiple single-key reads, except for range scan requests,
where only the starting key and batch size are used in the request.
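
The difference between the two request types can be illustrated with a toy in-memory model. This is only a sketch: `ToyStore` and its methods are hypothetical names, not a Cassandra or Astyanax API, and rows are ordered by key here for simplicity, whereas with a hash partitioner Cassandra orders rows by token.

```python
import bisect


class ToyStore:
    """Toy model of a node: counts how many single-key reads it performs."""

    def __init__(self, rows):
        self._rows = dict(rows)
        self._keys = sorted(self._rows)  # toy ordering; stands in for token order
        self.single_reads = 0

    def get(self, key):
        """One single-key read."""
        self.single_reads += 1
        return self._rows.get(key)

    def multi_get(self, keys):
        """A multi-key request is served as one single-key read per key."""
        return {key: self.get(key) for key in keys}

    def range_scan(self, start_key, batch_size):
        """A range scan needs only a starting key and a batch size."""
        start = bisect.bisect_left(self._keys, start_key)
        selected = self._keys[start:start + batch_size]
        return [(key, self._rows[key]) for key in selected]


store = ToyStore({f"k{i:03d}": i for i in range(100)})
wanted = ["k042", "k007", "k099"]
result = store.multi_get(wanted)      # three independent reads
page = store.range_scan("k010", 5)    # one sequential scan of five rows
```

In this model, sorting `wanted` before calling `multi_get` changes nothing about the number of reads performed, which is the point made above.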

Consider a multi-key request to be slow by design, and try to model your data for low-latency
single-key requests.

So, what latencies do you want to achieve?

Best regards / Pagarbiai

Viktor Jevdokimov
Senior Developer

Phone: +370 5 212 3063
Fax: +370 5 261 0453

J. Jasinskio 16C,
LT-03163 Vilnius,

-----Original Message-----
From: Artur Kronenberg []
Sent: Thursday, October 17, 2013 7:40 PM
Subject: Sorting keys for batch reads to minimize seeks


I am looking to increase read performance on Cassandra. We are still playing with
configurations, but I was wondering whether there are solutions in software that might help
us speed up our reads.

E.g. one idea (not sure how sane it is) was to sort read batches by row key before submitting
them to Cassandra. The idea is that neighbouring row keys should be closer together on the
physical disk, and therefore this may minimize the number of random seeks we have to do when
querying, say, 1000 entries from Cassandra. Does that make any sense?

Is there anything else that we can do in software to improve performance? Like specific batch
sizes for reads? We are using the astyanax library to access cassandra.

