From: Dan Feldman
To: user@cassandra.apache.org
Date: Wed, 18 Apr 2012 17:00:45 -0700
Subject: Cassandra read optimization

Hi all,

I'm trying to optimize moving data from Cassandra to HDFS using either a Ruby or Python client. Right now, I'm playing around on my staging server, an 8 GB single-node machine. My data in Cassandra (1.0.8) consist of 2 rows (for now) with ~150k super columns each (I know, I know - super columns are bad). Every super column has ~25 columns totaling ~800 bytes per super column.

I should also mention that currently the database is static - there are no writes/updates, only reads.

Anyways, in my Python/Ruby scripts, I'm taking slices of 5000 super columns from a single row. It takes 13 seconds with Ruby and 8 seconds with pycassa to get a single slice. In other words, at ~800 bytes per super column a slice is about 4 MB, so it's currently reading at no more than about 500 kB per second. The time seems to be linear in the length of a slice (e.g. 6 seconds for 2500 super columns with Ruby). If I run nodetool cfstats while my script is running, it tells me that my read latency on the column family is ~300 ms.

I assume that this is not normal and thus was wondering what parameters I could tweak to improve the performance.

Thanks,
Dan F.
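
For reference, the throughput figures quoted in the message can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the inputs are the approximate numbers from the post, the function and constant names are made up for illustration, and the full-export estimate assumes the time per slice stays constant across both rows, which the post does not confirm.

```python
# Back-of-the-envelope check of the numbers quoted in the message.
# All inputs are the approximate figures from the post; the names
# below are hypothetical and exist only for this sketch.

BYTES_PER_SUPER_COLUMN = 800        # "~800 bytes per super column"
SLICE_LENGTH = 5000                 # super columns fetched per slice
TOTAL_SUPER_COLUMNS = 2 * 150000    # 2 rows with ~150k super columns each


def throughput_kb_per_s(super_columns, seconds):
    """Payload throughput in kB/s (1 kB = 1000 bytes) for one slice."""
    return super_columns * BYTES_PER_SUPER_COLUMN / seconds / 1000.0


def full_export_minutes(seconds_per_slice):
    """Minutes to read every super column, assuming constant slice time."""
    slices = TOTAL_SUPER_COLUMNS / SLICE_LENGTH
    return slices * seconds_per_slice / 60.0


ruby_rate = throughput_kb_per_s(SLICE_LENGTH, 13)     # about 308 kB/s
pycassa_rate = throughput_kb_per_s(SLICE_LENGTH, 8)   # 500 kB/s

print("ruby:    %.0f kB/s, full export ~%.0f min"
      % (ruby_rate, full_export_minutes(13)))
print("pycassa: %.0f kB/s, full export ~%.0f min"
      % (pycassa_rate, full_export_minutes(8)))
```

The same helper applied to the 2500-super-column slice (6 seconds with Ruby) gives roughly 333 kB/s, which is consistent with the observation that slice time grows about linearly with slice length.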