From: Lee Parker
To: user@cassandra.apache.org
Date: Mon, 3 May 2010 14:40:24 -0500
Subject: replication with large rows

I have a CF on our cluster which has several rows with 200k+ columns of TimeUUID data. I have noticed recently that this CF is reaching my memtable thresholds (128M or 1.5 million objects) far more frequently than I would expect (every 10 minutes or so).

This CF is used as an index of items in another CF. So each of its columns holds only a single value, but there are lots of them. In the other CF, the rows all have about 10-15 columns, but there are millions of rows. I have reviewed our code several times and cannot see where we would be writing millions of columns to the index CF with this kind of frequency.

Could this be caused by the replication of data between nodes? When one node has new data for a row, does it pass the entire row to the other nodes for replication, or does it just pass the portion of the row that has changed? I have two nodes with a replication factor of 2.

In the end, this is causing both of my servers to constantly work on compacting the files for the index CF.

Lee Parker
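As a rough sanity check on the numbers above, the flush interval implies an average write rate that can be compared against what the application is expected to produce. The sketch below assumes the flushes are triggered by the 1.5 million object threshold rather than the 128M size threshold, and that each column written counts as one memtable object. Both are assumptions, and the function name is purely illustrative.

```python
# Back-of-envelope estimate of the column write rate implied by the memtable
# flush interval described in the message. Assumptions (not confirmed):
#   - flushes are triggered by the object-count threshold, not the size one
#   - every column written counts as one memtable object


def implied_columns_per_second(object_threshold: int, flush_interval_s: float) -> float:
    """Average columns per second needed to fill the memtable in one interval."""
    if flush_interval_s <= 0:
        raise ValueError("flush_interval_s must be positive")
    return object_threshold / flush_interval_s


OBJECT_THRESHOLD = 1_500_000   # "1.5 mill obj" from the message
FLUSH_INTERVAL_S = 10 * 60     # "every 10 minutes or so"

rate = implied_columns_per_second(OBJECT_THRESHOLD, FLUSH_INTERVAL_S)
print(f"implied write rate: {rate:.0f} columns/s per node")
```

With two nodes and a replication factor of 2, every node holds a replica of every row, so each node would see the full write stream. If the application writes far fewer than roughly 2,500 index columns per second, something other than ordinary client writes would have to be filling the memtable.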