Subject: Re: RFC: Cassandra Virtual Nodes
From: Tom Wilkie <tom@acunu.com>
To: dev@cassandra.apache.org
Date: Wed, 21 Mar 2012 14:24:01 -0500

Hi Edward

> 1) No more raid 0. If a machine is responsible for 4 vnodes they
> should correspond to four JBOD disks.

So each vnode corresponds to a disk? I suppose we could have a
separate data directory per disk (see the cassandra.yaml sketch
below), but I think this should be a separate, subsequent change.

However, do note that making each vnode roughly the size of a disk
(and only having 4-8 per machine) would make any non-hotswap rebuilds
slower. To get the fast distributed rebuilds, you need at least as
many vnodes per node as there are nodes in the cluster. And you would
still need the distributed rebuilds to deal with disk failure.
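To make both points concrete: the per-disk layout could, I think, be
approximated today by listing one data directory per spindle in
cassandra.yaml (the mount points below are hypothetical), although
Cassandra spreads sstables across these directories rather than
pinning a given vnode's data to one disk:

    data_file_directories:
        - /mnt/disk1/cassandra/data
        - /mnt/disk2/cassandra/data
        - /mnt/disk3/cassandra/data
        - /mnt/disk4/cassandra/data

And here is a back-of-the-envelope sketch (in Python) of the rebuild
argument - a deliberately simplified model with illustrative numbers,
not measurements:

    # Rebuild wall-clock time, assuming each vnode's range can be
    # streamed concurrently from a different surviving peer.
    def rebuild_hours(data_per_node_tb, num_nodes, vnodes_per_node,
                      stream_mb_per_s=50):
        # Parallelism is capped both by the number of vnodes and by
        # the number of surviving nodes that can act as sources.
        parallelism = min(vnodes_per_node, num_nodes - 1)
        mb_per_source = data_per_node_tb * 1e6 / parallelism
        return mb_per_source / stream_mb_per_s / 3600.0

    # 2 TB node in a 100-node cluster:
    print(rebuild_hours(2, 100, 4))    # ~2.8 hours with 4 vnodes
    print(rebuild_hours(2, 100, 128))  # ~0.1 hours (capped at 99 peers)

So with only a handful of disk-sized vnodes, a rebuild is bottlenecked
on a handful of source nodes, regardless of cluster size.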
> 2) Vnodes should be able to be hot plugged. My normal cassandra
> chassis would be a 2U with 6 drive bays. Imagine I have 10 nodes.
> Now if my chassis dies I should be able to take the disks out and
> physically plug them into another chassis. Then in cassandra I
> should be able to run a command like:
>
>   nodetool attach '/mnt/disk6'
>
> disk6 should contain all data and its vnode information.
>
> Now this would be awesome for upgrades/migrations/etc.

You know, you're not the first person I've spoken to who has asked
for this! I do wonder whether it is optimising for the right thing,
though - in my experience, disks fail more often than machines.

Thanks

Tom