From: Viktor Jevdokimov
To: user@cassandra.apache.org
Subject: RE: Sorting keys for batch reads to minimize seeks
Date: Fri, 18 Oct 2013 07:31:46 +0000
The only thing you may gain is avoiding unnecessary network hops, and only if:
- you request keys sorted by token from the appropriate replica, with ConsistencyLevel.ONE and "dynamic_snitch: false";
- the nodes have the same load;
- the replica is not doing GC, and GC pauses are much longer than internode communication.

For a multiple-key request, Cassandra performs multiple single-key reads, except for range scan requests, where only the starting key and the batch size are used in the request.

Consider a multiple-key request slow by design, and try to model your data for low-latency single-key requests.

So, what latencies do you want to achieve?

Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer
Email: Viktor.Jevdokimov@adform.com
Phone: +370 5 212 3063
Fax: +370 5 261 0453
J. Jasinskio 16C, LT-03163 Vilnius, Lithuania

-----Original Message-----
From: Artur Kronenberg [mailto:artur.kronenberg@openmarket.com]
Sent: Thursday, October 17, 2013 7:40 PM
To: user@cassandra.apache.org
Subject: Sorting keys for batch reads to minimize seeks

Hi,

I am looking to somehow increase read performance on Cassandra.
We are still playing with configurations, but I was wondering whether there are solutions in software that might help us speed up our reads.

E.g. one idea, not sure how sane it is, was to sort read batches by row key before submitting them to Cassandra. The idea is that sorted row keys should be closer together on the physical disk, and therefore this may minimize the number of random seeks we have to do when querying, say, 1000 entries from Cassandra. Does that make any sense?

Is there anything else we can do in software to improve performance, like specific batch sizes for reads? We are using the Astyanax library to access Cassandra.

Thanks!
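The reply's suggestion ("request sorted keys (by token) from appropriate replica"), combined with the question about read batch sizes, can be sketched client-side roughly as follows. This is a minimal illustration under stated assumptions, not Astyanax code and not Cassandra's own implementation: the token function is an MD5-based stand-in modeled loosely on RandomPartitioner (a real cluster's tokens may differ, e.g. under Murmur3Partitioner), the ring has one token per node (no vnodes), only the primary replica is considered, and the names token, primary_replica, plan_reads, the node names and batch_size are all hypothetical.

```python
import hashlib
from bisect import bisect_left
from collections import defaultdict


def token(key: bytes) -> int:
    """Stand-in for a RandomPartitioner-style token (assumption): the MD5
    digest of the key read as a signed 128-bit integer, then made absolute."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def primary_replica(tok: int, ring: list[tuple[int, str]]) -> str:
    """Return the node owning tok on a ring sorted by token.

    Each node owns the range (previous node's token, its own token], so the
    owner is the first node whose token is >= tok, wrapping to the first node.
    """
    tokens = [t for t, _ in ring]
    i = bisect_left(tokens, tok)
    return ring[i % len(ring)][1]


def plan_reads(keys: list[bytes], ring: list[tuple[int, str]],
               batch_size: int) -> dict[str, list[list[bytes]]]:
    """Group keys by primary replica, sort each group by token (not by raw
    key), and split each group into batches of at most batch_size keys."""
    ring = sorted(ring)
    by_node = defaultdict(list)
    for key in keys:
        tok = token(key)
        by_node[primary_replica(tok, ring)].append((tok, key))

    plan = {}
    for node, items in by_node.items():
        items.sort()  # token order; equal tokens fall back to key order
        ordered = [key for _, key in items]
        plan[node] = [ordered[i:i + batch_size]
                      for i in range(0, len(ordered), batch_size)]
    return plan


if __name__ == "__main__":
    # Hypothetical 4-node ring with evenly spaced tokens in [0, 2**127].
    ring = [(i * 2**125, f"node-{i}") for i in range(4)]
    keys = [f"user:{i}".encode() for i in range(1000)]
    for node, batches in sorted(plan_reads(keys, ring, batch_size=100).items()):
        print(node, [len(b) for b in batches])
```

The sort key is the token rather than the raw row key because, under a hashing partitioner, the token (not the key) decides which node holds a row. As the reply notes, the expected payoff is fewer network hops under the listed conditions; on the server, each key is still served as a separate single-key read.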