Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (nike.apache.org: domain of Viktor.Jevdokimov@adform.com
 designates 81.7.166.231 as permitted sender)
From: Viktor Jevdokimov <Viktor.Jevdokimov@adform.com>
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: RE: Sorting keys for batch reads to minimize seeks
Thread-Topic: Sorting keys for batch reads to minimize seeks
Thread-Index: AQHOy1edvnyocdA8BkyQCvIIOxaqW5n6Dcfg
Date: Fri, 18 Oct 2013 07:31:46 +0000
Message-ID: <2C85E14562B39345BCCAD90B8E7955C9314281@DKEXC002.adform.com>
References: <526012E8.8020500@openmarket.com>
In-Reply-To: <526012E8.8020500@openmarket.com>
Accept-Language: en-US
Content-Language: en-US
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0

The only thing you may gain is avoiding unnecessary network hops, and only if:
- you request keys sorted by token from the appropriate replica with
  ConsistencyLevel.ONE and "dynamic_snitch: false";
- the nodes have the same load;
- the replica is not doing GC, and GC pauses are much higher than internode
  communication latency.
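
To make the first condition concrete, here is a minimal sketch of grouping a batch of keys by owning node and sorting each group by token. It is an illustration only: toy_token is a stand-in and NOT the hash of any real Cassandra partitioner, the ring layout is a hypothetical example, and a real client would take both from the cluster metadata.

```python
import bisect
import hashlib
from collections import defaultdict


def toy_token(key: bytes) -> int:
    """Stand-in token function (assumption: NOT Cassandra's partitioner hash)."""
    return int.from_bytes(hashlib.md5(key).digest()[:8], "big")


def group_keys_by_owner(keys, ring, token_fn=toy_token):
    """Group keys by owning node, each group sorted by token.

    ring is a list of (node_token, node_name) pairs sorted by node_token.
    A node owns the tokens up to and including its own token; tokens above
    the highest node token wrap around to the first node.
    """
    ring_tokens = [token for token, _ in ring]
    groups = defaultdict(list)
    for key in keys:
        token = token_fn(key)
        # First node whose token is >= the key's token, with wrap-around.
        idx = bisect.bisect_left(ring_tokens, token) % len(ring)
        groups[ring[idx][1]].append((token, key))
    # Sort each node's keys by token and drop the tokens from the result.
    return {node: [key for _, key in sorted(pairs)] for node, pairs in groups.items()}
```

Each resulting group could then be sent to its owning node as a separate request; whether that actually saves a hop still depends on the other two conditions.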

For a multi-key request, C* will do multiple single-key reads, except for
range scan requests, where only the starting key and the batch size are used
in the request.

Consider a multi-key request to be a slow request by design, and try to model
your data for low-latency single-key requests.
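
A rough way to see why: since a multi-key request is served as many single-key reads, it finishes only when the slowest of them finishes. The sketch below assumes, purely for illustration, that each single-key read is independently "slow" (a GC pause, an extra hop) with probability p; neither the independence assumption nor the numbers come from measurements.

```python
def prob_any_slow(num_keys: int, p_slow: float) -> float:
    """Probability that at least one of num_keys independent reads is slow."""
    return 1.0 - (1.0 - p_slow) ** num_keys


# With a 1% chance of a slow read per key:
for n in (1, 10, 100, 1000):
    print(n, round(prob_any_slow(n, 0.01), 4))
```

Under these assumptions a 100-key batch hits at least one slow read roughly 63% of the time, and a 1000-key batch almost always does.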

So, what latencies do you want to achieve?


Best regards / Pagarbiai

Viktor Jevdokimov
Senior Developer

Email: Viktor.Jevdokimov@adform.com
Phone: +370 5 212 3063
Fax: +370 5 261 0453

J. Jasinskio 16C,
LT-03163 Vilnius,
Lithuania


-----Original Message-----
From: Artur Kronenberg [mailto:artur.kronenberg@openmarket.com]
Sent: Thursday, October 17, 2013 7:40 PM
To: user@cassandra.apache.org
Subject: Sorting keys for batch reads to minimize seeks

Hi,

I am looking to increase read performance on Cassandra. We are still
playing with configurations, but I was wondering whether there are solutions
in software that might help us speed up our reads.

E.g. one idea, not sure how sane it is, was to sort read batches by row key
before submitting them to Cassandra. The idea is that the row keys should
then be closer together on the physical disk, and therefore this may minimize
the number of random seeks we have to do when querying, say, 1000 entries
from Cassandra. Does that make any sense?

Is there anything else that we can do in software to improve performance,
such as specific batch sizes for reads? We are using the Astyanax library to
access Cassandra.

Thanks!