From: Michał Michalski
Date: Thu, 18 Jul 2013 08:01:37 +0200
To: user@cassandra.apache.org
Subject: Re: is there a key to sstable index file?

SSTables are immutable: once they're written to disk, they cannot be changed.

On a read, C* checks *all* SSTables [1]. To make this faster it uses Bloom filters, which can tell you that a row is *not* in a specific SSTable, so you don't have to read that SSTable at all. A Bloom filter can give false positives but never false negatives, so a "maybe present" answer still requires a lookup.

When you do have to read an SSTable, you don't read the whole thing. There's an in-memory index sample, which is binary-searched to locate a (relatively) small block of the real (full, on-disk) index. You scan that block to find the position in the SSTable to read the data from.

Additionally, there's a key cache to make reads faster: it stores the location of the data in the SSTable, so on a cache hit you don't have to touch the index sample or the index at all.

Once C* has retrieved all the "parts" of the data (including the part in the Memtable), timestamps are used to pick the most recent version.
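To make the read path above concrete, here is a minimal Python sketch of it. It is a toy in-memory model, not Cassandra's actual code: the names `BloomFilter`, `SSTable`, `find_position`, `read` and the `index_interval` sampling parameter are hypothetical, the "on-disk" data and full index are plain Python lists, and the key cache is a dict keyed by (SSTable name, row key). Each cell is a `(timestamp, value)` pair, and the highest timestamp wins when versions are reconciled.

```python
import bisect
import hashlib


class BloomFilter:
    """Probabilistic membership test: false positives are possible, false negatives are not."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests of the key.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


class SSTable:
    """Immutable, key-sorted table of rows; each row is a (timestamp, value) cell."""

    def __init__(self, name, rows, index_interval=4):
        self.name = name
        self.index_interval = index_interval
        items = sorted(rows.items())
        # "On-disk" data and full index; never modified after construction.
        self.data = [cell for _, cell in items]
        self.index = [(key, pos) for pos, (key, _) in enumerate(items)]
        # In-memory index sample: every index_interval-th key of the full index.
        self.sample_keys = [key for key, _ in self.index[::index_interval]]
        self.bloom = BloomFilter()
        for key in rows:
            self.bloom.add(key)

    def find_position(self, key):
        # Binary search the sample for the last sampled key <= key...
        i = bisect.bisect_right(self.sample_keys, key) - 1
        if i < 0:
            return None
        # ...then scan only the small block of the full index that follows it.
        start = i * self.index_interval
        for k, pos in self.index[start:start + self.index_interval]:
            if k == key:
                return pos
        return None


def read(key, memtable, sstables, key_cache):
    """Gather every version of `key` and return the value with the newest timestamp."""
    candidates = []
    if key in memtable:
        candidates.append(memtable[key])
    for table in sstables:
        if not table.bloom.might_contain(key):
            continue  # definitely absent: skip this SSTable entirely
        pos = key_cache.get((table.name, key))
        if pos is None:
            pos = table.find_position(key)  # index sample + index block scan
            if pos is None:
                continue  # the Bloom filter gave a false positive
            key_cache[(table.name, key)] = pos
        candidates.append(table.data[pos])
    if not candidates:
        return None
    return max(candidates, key=lambda cell: cell[0])[1]


# The question's scenario: "abc" and later "def" written to the same key.
older = SSTable("sstable-1", {"k1": (1, "abc"), "k2": (1, "x")})
newer = SSTable("sstable-2", {"k1": (2, "def")})
print(read("k1", {}, [older, newer], {}))  # def
```

Both SSTables holding "k1" are consulted and the newer timestamp decides the result. The sample is kept sparse (every `index_interval`-th key) so it fits in memory, at the cost of a short scan of the full index after the binary search.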
[1] I believe that this isn't true in all cases. I saw a piece of code somewhere in the source that checks SSTables in order from the newest to the oldest (in terms of data timestamps; AFAIR the SSTable metadata stores the smallest and largest timestamp in each SSTable), and once the newest data for all columns has been retrieved (assuming that the schema is defined), it stops and older SSTables are not checked. If someone could confirm that it works this way, and that it's not something I saw in a dream and now believe is real, I'd be glad ;-)

On 17.07.2013 22:58, S Ahmed wrote:
> Since SSTables are mutable, and they are ordered, does this mean that there
> is a index of key ranges that each SS table holds, and the value could be 1
> more sstables that have to be scanned and then the latest one is chosen?
>
> e.g. Say I write a value "abc" to CF1. This gets stored in a sstable.
>
> Then I write "def" to CF1, this gets stored in another sstable eventually.
>
> How when I go to fetch the value, it has to scan 2 sstables and then figure
> out which is the latest entry correct?
>
> So is there an index of key's to sstables, and there can be 1 or more
> sstables per key?
>
> (This is assuming compaction hasn't occurred yet).
>
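The newest-to-oldest lookup described in footnote [1] can be sketched as follows, with the same caveat the footnote gives: this only illustrates the idea and does not confirm that Cassandra behaves this way. It is a hypothetical Python model (`TimedSSTable` and `read_named_columns` are made-up names). Each table records only its largest data timestamp, standing in for the SSTable metadata the footnote mentions, and the early exit is only possible when the requested column names are known up front, which is presumably what "assuming that the schema is defined" refers to.

```python
from dataclasses import dataclass, field


@dataclass
class TimedSSTable:
    name: str
    # row key -> {column name: (timestamp, value)}
    rows: dict
    # Hypothetical stand-in for the per-SSTable "largest timestamp" metadata.
    max_timestamp: int = field(init=False)

    def __post_init__(self):
        self.max_timestamp = max(
            (ts for cols in self.rows.values() for ts, _ in cols.values()), default=0
        )


def read_named_columns(key, columns, sstables):
    """Return ({column: value}, names of SSTables read), visiting SSTables newest-first."""
    found = {}  # column -> (timestamp, value)
    tables_read = []
    for table in sorted(sstables, key=lambda t: t.max_timestamp, reverse=True):
        # Every requested column is already known with a timestamp newer than anything
        # this table (and, given the sort order, every older table) can hold: stop.
        if len(found) == len(columns) and min(ts for ts, _ in found.values()) > table.max_timestamp:
            break
        tables_read.append(table.name)
        for col, (ts, value) in table.rows.get(key, {}).items():
            if col in columns and (col not in found or ts > found[col][0]):
                found[col] = (ts, value)
    return {col: value for col, (_, value) in found.items()}, tables_read


t1 = TimedSSTable("old", {"k1": {"a": (1, "a1"), "b": (1, "b1")}})
t2 = TimedSSTable("mid", {"k1": {"b": (2, "b2")}})
t3 = TimedSSTable("new", {"k1": {"a": (3, "a3"), "b": (3, "b3")}})
print(read_named_columns("k1", {"a", "b"}, [t1, t2, t3]))  # -> ({'a': 'a3', 'b': 'b3'}, ['new'])
# Without "new", both remaining tables are read: 'a' comes from "old", 'b' from "mid".
print(read_named_columns("k1", {"a", "b"}, [t1, t2]))
```

For a slice query (no fixed list of column names) this early exit is not possible, since there is no way to know that every column has been found, so all SSTables whose Bloom filters match still have to be read.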