From: aaron morton
Subject: Re: single row key continues to grow, should I be concerned?
Date: Wed, 21 Mar 2012 06:37:42 +1300
To: user@cassandra.apache.org

> The reads are only fetching slices of 20 to 100 columns max at a time from the row but if the key is planted on one node in the cluster I am concerned about that node getting the brunt of traffic.

What RF are you using, how many nodes are in the cluster, and what CL do you read at?

If you have lots of nodes in different racks, the NetworkTopologyStrategy will do a better job of distributing read load than the SimpleStrategy. The DynamicSnitch can also help distribute load; see cassandra.yaml for its configuration.

> I thought about breaking the column data into multiple different row keys to help distribute throughout the cluster but its so darn handy having all the columns in one key!!

If you have a row that will continually grow, it is a good idea to partition it in some way. Large rows can slow down things like compaction and repair. Once a row gets above 60MB it starts to slow things down. Can you partition by a date range such as month?

Large rows are also a little slower to query; see http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/

If most reads are only pulling 20 to 100 columns at a time, are there two workloads? Is it possible to store just these columns in a separate row? If you understand how big a row may get, you may be able to use the row cache to improve performance.

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 20/03/2012, at 2:05 PM, Blake Starkenburg wrote:

> I have a row key which is now up to 125,000 columns (and anticipated to grow), I know this is a far-cry from the 2-billion columns a single row key can store in Cassandra but my concern is the amount of reads that this specific row key may get compared to other row keys. This particular row key houses column data associated with one of the more popular areas of the site. The reads are only fetching slices of 20 to 100 columns max at a time from the row but if the key is planted on one node in the cluster I am concerned about that node getting the brunt of traffic.
>
> I thought about breaking the column data into multiple different row keys to help distribute throughout the cluster but its so darn handy having all the columns in one key!!
>
> key_cache is enabled but row cache is disabled on the column family.
>
> Should I be concerned going forward? Any particular advice on large wide rows?
>
> Thanks!
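The suggestion in the reply to partition a growing row by a date range such as month can be sketched roughly as below. This is only an illustration of the key-bucketing idea, not code from the thread: the key format `base:YYYY-MM`, the base key `popular_area`, and both function names are assumptions made up for the example, and no Cassandra client calls are shown.

```python
from datetime import date
from typing import List


def bucket_key(base_key: str, day: date) -> str:
    """Row key for the month containing `day` (hypothetical 'base:YYYY-MM' format)."""
    return f"{base_key}:{day.year:04d}-{day.month:02d}"


def bucket_keys_newest_first(base_key: str, start: date, end: date) -> List[str]:
    """All monthly row keys covering [start, end], newest month first."""
    if start > end:
        raise ValueError("start must not be after end")
    keys = []
    year, month = end.year, end.month
    # Walk backwards one month at a time until we pass the start month.
    while (year, month) >= (start.year, start.month):
        keys.append(f"{base_key}:{year:04d}-{month:02d}")
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return keys
```

Writes would go to `bucket_key(base, today)`, so each month's columns land in their own row, and the rows can hash to different nodes. A reader wanting the latest 20 to 100 columns would slice the newest key first and fall back to older keys only if that row returns too few columns.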