Date: Wed, 23 Feb 2011 13:51:41 -0800 (PST)
From: buddhasystem
To: cassandra-user@incubator.apache.org
Subject: Will the large datafile size affect the performance?
I know that in theory it should not (apart from compaction issues), but maybe somebody has experience showing otherwise. My test cluster now has 250GB of data and will have 1.5TB in its next incarnation. If all this data is in a single CF, will it cause read or write performance problems? Should I "shard" it? One advantage of splitting the data would be reducing the impact of compaction and repairs (or so I naively assume).

TIA
Maxim

--
View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Will-the-large-datafile-size-affect-the-performance-tp6057991p6057991.html
Sent from the cassandra-user@incubator.apache.org mailing list archive at Nabble.com.