Message-ID: <5000BACF.6000005@baybroadband.net>
Date: Fri, 13 Jul 2012 20:18:23 -0400
From: Dave Brosius
To: user@cassandra.apache.org
Subject: Re: SSTable format

On 07/13/2012 08:00 PM, Michael Theroux wrote:
> Hello,
>
> I've been trying to understand in greater detail how SSTables are stored, and how information is transferred between Cassandra nodes, especially when a new node is joining a cluster.
>
> Specifically, is information stored in SSTables ordered by row key? Some of the articles I've read suggest this is the case (although it's a little vague whether they actually mean that the columns are stored in order, not the row keys). However, if data is stored in row-key order, how is this achieved, as SSTables are immutable?
>
> Thanks for any insights,
> -Mike

It depends on which partitioner you use. You should be using the RandomPartitioner, and if so, the rows are sorted by the hash of the row key. There are partitioners that sort on the raw key value, but they shouldn't be used, as they tend to partition data unevenly across nodes.

As for how this is done, remember that an sstable doesn't hold all the data for a column family. Not only is the data for a column family spread across multiple servers, there are usually multiple sstable files on disk holding data for one column family on one machine. So at the time an sstable is written, the rows that are to go into it are sorted and written out in sorted order. Immutability isn't an obstacle: each sstable is sorted once, when it is created, and never has to be re-sorted afterwards.
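To make the "sorted by the hash of the row key" point concrete, here is a rough Python sketch of the idea. It is not Cassandra's code: the `token` and `flush` helpers and the sample rows are made up for illustration, and the MD5-based token only approximates what RandomPartitioner does.

```python
import hashlib


def token(row_key: bytes) -> int:
    """Hypothetical helper: derive a sortable token from the MD5 of the row key.

    This mimics the spirit of RandomPartitioner; the exact details are an
    assumption made for this sketch.
    """
    digest = hashlib.md5(row_key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def flush(rows: dict) -> list:
    """Hypothetical helper: return the rows in token order.

    The sort happens once, at write time, so the resulting "file" never
    needs to be modified or re-sorted afterwards.
    """
    return sorted(rows.items(), key=lambda item: token(item[0]))


# Made-up rows waiting to be written out as one sstable.
pending_rows = {
    b"carol": {"age": 41},
    b"alice": {"age": 30},
    b"bob": {"age": 25},
}

sstable = flush(pending_rows)
for key, columns in sstable:
    print(token(key), key, columns)
```

Note that the printed order follows the tokens, not the alphabetical order of the keys, which is why a range scan over raw key values is not meaningful with this kind of partitioner.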
In fact, the same row key may be written to multiple sstables, with one sstable holding one set of columns for the key and another sstable holding other columns for the same key. When a row is queried by key, Cassandra is responsible for finding which sstables (potentially several) hold columns for that key and merging the results.
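The read-side merge can be sketched in a few lines of Python as well. Again, this is only an illustration under stated assumptions: `read_row` is a hypothetical helper, each sstable is modelled as a plain dict, and each column carries a write timestamp so that the newest value wins when the same column shows up in more than one sstable.

```python
def read_row(row_key, sstables):
    """Hypothetical helper: collect the columns for one key from every sstable.

    Each sstable is modelled as {row_key: {column_name: (value, timestamp)}}.
    When a column appears in several sstables, the copy with the highest
    timestamp is kept (a simplification assumed for this sketch).
    """
    merged = {}
    for table in sstables:
        for name, (value, ts) in table.get(row_key, {}).items():
            if name not in merged or ts > merged[name][1]:
                merged[name] = (value, ts)
    # Drop the timestamps before handing the row back to the caller.
    return {name: value for name, (value, _ts) in merged.items()}


# Two made-up sstables that each hold part of the same row.
older = {"alice": {"email": ("a@old.example", 1), "age": (30, 1)}}
newer = {"alice": {"email": ("a@new.example", 2), "city": ("Boston", 2)}}

row = read_row("alice", [older, newer])
print(row)
```

Neither sstable is touched during the read; the merged row exists only in the result handed back to the client.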