From: Jason Horman <jhorman@gmail.com>
To: user@cassandra.apache.org
Date: Fri, 8 Oct 2010 13:36:41 -0400
Subject: Cold boot performance problems

We are experiencing very slow performance on Amazon EC2 after a cold boot: 10-20 tps. After the cache is primed things are much better, but it would be nice if users who aren't in the cache didn't see such slow performance. Before dumping a bunch of config, I have some general questions:

- We are using UUID keys, 40M of them, with the random partitioner. The typical access pattern is reading 200-300 keys in a single web request. Are UUID keys going to be painful because they are so random? Should we use less random keys, maybe with a shard prefix (01-80), and make sure our tokens group user data together on the cluster (via the order-preserving partitioner)?
- Would the order-preserving partitioner be a better option in the sense that it would group a single user's data onto a single set of machines (if we added a prefix to the UUID)?
- Is there any benefit to doing sharding of our own via keyspaces, i.e. 80 keyspaces (01-80) to split up the data files? (We already have 80 MySQL shards we are migrating from, so implementation-wise this wouldn't be terrible.)
- Should a goal be to get the data/index files as small as possible? Is there a size at which they become problematic?
  (Amazon EC2/EBS, fyi)
    - via more servers
    - via more Cassandra instances on the same server
    - via manual sharding by keyspace
    - via manual sharding by column family

Thanks,
--
-jason horman
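For illustration, here is roughly what I mean by a shard-prefixed key scheme (a sketch only; the function and key layout are mine, not an existing API). The prefix would be derived from the user id rather than the row UUID, so that all of one user's rows share a bucket:

```python
import hashlib

NUM_SHARDS = 80  # assumption: mirror our 80 existing MySQL shards


def shard_prefix(user_id):
    # Hash the *user* id, not the row UUID, so every row belonging
    # to one user falls in the same 01-80 bucket.
    digest = hashlib.md5(user_id.encode("utf-8")).digest()
    return "%02d" % ((digest[0] % NUM_SHARDS) + 1)


def row_key(user_id, row_uuid):
    # Illustrative layout "<shard>:<user>:<uuid>": under the
    # order-preserving partitioner, a user's rows would then sort
    # contiguously and land on the same replicas.
    return "%s:%s:%s" % (shard_prefix(user_id), user_id, row_uuid)


print(row_key("user42", "123e4567-e89b-12d3-a456-426614174000"))
```

The point being that the 200-300 keys read per web request would mostly hit one node's cache instead of 200-300 random positions across the cluster.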
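On the token question: if keys carried an 01-80 prefix under the order-preserving partitioner, I imagine initial tokens could just be evenly spaced prefixes, something like this back-of-envelope sketch (the actual token would go in each node's config, e.g. the InitialToken setting, though the element name may differ by version):

```python
NUM_SHARDS = 80  # assumption: same 01-80 prefix space as above


def initial_tokens(num_nodes):
    # Spread the 01-80 prefix space evenly across nodes. With the
    # order-preserving partitioner a token is just a string; each
    # node owns the keys between the previous token and its own.
    return ["%02d" % (i * NUM_SHARDS // num_nodes + 1)
            for i in range(num_nodes)]


print(initial_tokens(4))  # → ['01', '21', '41', '61']
```

That way each node would own a contiguous block of shards, much like our current MySQL shard-to-host mapping.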