From: AJ
Date: Thu, 09 Jun 2011 09:24:00 -0600
Subject: Re: Ideas for Big Data Support

On 6/9/2011 8:40 AM, Edward Capriolo wrote:
> Some of these things are challenges, and a few are being worked on in
> one way or another.
>
> 1) Dynamic snitch was implemented to detect slow-acting nodes and
> re-balance load.
>
> 2) You can budget bootstrap with rsync, as long as you know what data
> to copy where. 0.7.X made the data moving process more efficient.

Still, moving just 1 TB of data over a T-1 line would take about 61 days. Or you could ship it in a couple of days.

> 3) There are many cases where different partition strategies can
> theoretically be better. The question is: for the normal use case,
> what is the best?
>
> 4) Compressed SSTables are on the way. This will be nice because it
> can help maximize disk caches.
>
> 5) Compactions *are* a good thing. You can already do this by setting
> compaction thresholds to 0. That is not great, because smaller
> compactions can run really fast and you want those to happen
> regularly. Another way I take care of this is by forcing major
> compactions on my schedule. This makes it very unlikely that a larger
> compaction will happen at random during peak time. 0.8.X has
> multi-threaded compaction and a throttling limit, so that looks
> promising.
>
> More nodes vs. fewer nodes: +1 for more nodes. This does not mean you
> need to go very small, but the larger disk configurations are just
> more painful. Unless you can get very/very/very fast disks.

Even with a massive RAID-0? At some point, shouldn't the disk I/O throughput become fast enough to compete with cache speeds?
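The 61-day figure for moving 1 TB over a T-1 can be checked with a back-of-the-envelope calculation. This is a minimal sketch under stated assumptions: decimal units (1 TB = 10^12 bytes), the nominal T-1 line rate of 1.544 Mbit/s, and no protocol overhead or retransmissions. Under those assumptions it comes out at roughly 60 days, so the estimate is in the right range; real-world overhead would only push it higher.

```python
# Back-of-the-envelope transfer time over a fixed-rate link.
# Assumptions: decimal units (1 TB = 10**12 bytes), the nominal T-1
# line rate of 1.544 Mbit/s, and zero protocol overhead, so a real
# transfer would take somewhat longer than this ideal figure.

T1_BITS_PER_SEC = 1.544e6  # nominal T-1 line rate
SECONDS_PER_DAY = 86_400


def transfer_days(num_bytes: float, bits_per_sec: float) -> float:
    """Return the ideal number of days needed to move num_bytes at bits_per_sec."""
    return num_bytes * 8 / bits_per_sec / SECONDS_PER_DAY


if __name__ == "__main__":
    one_tb = 10**12
    days = transfer_days(one_tb, T1_BITS_PER_SEC)
    print(f"1 TB over a T-1: {days:.1f} days")
```

The same function makes it easy to compare link speeds: plugging in a faster rate shows how quickly the number of days collapses, which is why physically shipping disks can win for bulk moves over slow links.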
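The thread does not say how major compactions are forced on a schedule. One common approach is to invoke `nodetool compact` from a scheduler such as cron during off-peak hours. The Python sketch below is hypothetical: it assumes `nodetool` is on the PATH, the node is reachable as `localhost`, and the keyspace name `MyKeyspace` and the 02:00 to 05:00 window are placeholders. It defaults to a dry run that only prints the command.

```python
# Hypothetical helper: force a major compaction only inside an off-peak
# window, so a large compaction is less likely to start at peak time.
# Assumptions: `nodetool` is on PATH; host, keyspace name, and the
# off-peak window below are placeholders to adjust per cluster.
import datetime
import subprocess
from typing import List

OFF_PEAK_START_HOUR = 2  # hypothetical window start (inclusive)
OFF_PEAK_END_HOUR = 5    # hypothetical window end (exclusive)


def in_off_peak(now: datetime.datetime) -> bool:
    """True if `now` falls inside the [start, end) off-peak window."""
    return OFF_PEAK_START_HOUR <= now.hour < OFF_PEAK_END_HOUR


def compact_command(host: str, keyspace: str) -> List[str]:
    """Build the nodetool invocation that forces a major compaction."""
    return ["nodetool", "-h", host, "compact", keyspace]


def maybe_compact(host: str, keyspace: str, dry_run: bool = True) -> bool:
    """Run (or, with dry_run, just print) the compaction if off-peak.

    Returns True if a compaction was triggered, or would have been.
    """
    if not in_off_peak(datetime.datetime.now()):
        return False
    cmd = compact_command(host, keyspace)
    if dry_run:
        print("would run:", " ".join(cmd))
    else:
        subprocess.run(cmd, check=True)  # raises if nodetool fails
    return True


if __name__ == "__main__":
    maybe_compact("localhost", "MyKeyspace")  # hypothetical keyspace
```

If the script is already launched by cron at an off-peak time, the window check is redundant, but it guards against someone running it by hand in the middle of the day.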