Date: Mon, 5 Jan 2015 15:54:36 +0000 (UTC)
From: "T Jake Luciani (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-8494) incremental bootstrap

    [ https://issues.apache.org/jira/browse/CASSANDRA-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14264714#comment-14264714 ]

T Jake Luciani commented on CASSANDRA-8494:
-------------------------------------------

I'm not suggesting we change the ring early, just that we include pending ranges when we do read requests. But you are right: if node A proxies to joining node B, how do we keep B from sending back to A? Perhaps we can have B broadcast, via gossip, the ranges it has completed, and A would only send to B when it sees there is data for that range. If B dies along the way, everything was still pending, so nothing bad happens.

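A minimal sketch of the idea in the comment above, in Python for brevity. It is not Cassandra code: the class names, `on_gossip`, and `read_targets` are all hypothetical, and gossip is reduced to a direct method call. The point it illustrates is that A adds joining node B to a read only for ranges B has announced as completed, so B can answer from local data and has no reason to proxy back to A.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical model: a token range is a (start, end) pair of ints.
TokenRange = Tuple[int, int]


@dataclass
class JoiningNode:
    """Node B: still bootstrapping, tracks which ranges have finished streaming."""
    name: str
    pending: Set[TokenRange] = field(default_factory=set)
    completed: Set[TokenRange] = field(default_factory=set)

    def finish_streaming(self, rng: TokenRange) -> None:
        # Move a range from pending to completed once its data has arrived.
        self.pending.discard(rng)
        self.completed.add(rng)

    def gossip_state(self) -> FrozenSet[TokenRange]:
        # Snapshot of completed ranges that B would announce (assumed gossip payload).
        return frozenset(self.completed)


@dataclass
class Coordinator:
    """Node A: routes reads, using only what it has heard about joining nodes."""
    name: str
    seen_completed: Dict[str, FrozenSet[TokenRange]] = field(default_factory=dict)

    def on_gossip(self, node_name: str, completed: FrozenSet[TokenRange]) -> None:
        self.seen_completed[node_name] = completed

    def read_targets(self, rng: TokenRange, owners: List[str],
                     joining: List[str]) -> List[str]:
        # Current owners always serve the range; the ring itself is unchanged.
        targets = list(owners)
        for name in joining:
            # Include a joining node only if it announced data for this range.
            if rng in self.seen_completed.get(name, frozenset()):
                targets.append(name)
        return targets


b = JoiningNode("B", pending={(0, 100), (100, 200)})
a = Coordinator("A")
before = a.read_targets((0, 100), owners=["C"], joining=["B"])
b.finish_streaming((0, 100))
a.on_gossip("B", b.gossip_state())
after = a.read_targets((0, 100), owners=["C"], joining=["B"])
```

In this sketch, if B dies before or after announcing anything, the existing owners are still in every target list, which matches the "everything was still pending" argument.
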
> incremental bootstrap
> ---------------------
>
>                 Key: CASSANDRA-8494
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8494
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jon Haddad
>            Assignee: Yuki Morishita
>            Priority: Minor
>              Labels: density
>             Fix For: 3.0
>
>
> Current bootstrapping involves (to my knowledge) picking tokens and streaming data before the node is available for requests. This can be problematic with "fat nodes", since it may require 20TB of data to be streamed over before the machine can do anything useful, which can mean a very long wait.
> As a potential approach to shortening that wait, I suggest modifying the bootstrap process to acquire only a single initial token before the node is marked UP. This would likely be controlled by a configuration parameter, "incremental_bootstrap" or something similar.
> After the node is bootstrapped with this one token, it could go into the UP state and then acquire additional tokens (one or a handful at a time), which would be streamed over while the node is active and serving requests. The benefit here is that, with the default 256 tokens, a node could become an active part of the cluster with less than 1% of its final data streamed over.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
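The "less than 1%" figure in the issue description can be checked with a short calculation. The sketch below assumes data is spread evenly across tokens, which vnodes only approximate, and uses a hypothetical `batch_size` for how many tokens are acquired per step; neither is specified in the issue.

```python
from typing import List, Tuple


def incremental_schedule(total_tb: float, num_tokens: int,
                         batch_size: int) -> List[Tuple[int, float]]:
    """Return (tokens_owned, tb_streamed_so_far) after each acquisition step.

    Assumes every token carries an equal share of the node's final data.
    """
    per_token_tb = total_tb / num_tokens
    owned = 1  # the node goes UP after streaming a single initial token
    steps = [(owned, owned * per_token_tb)]
    while owned < num_tokens:
        owned = min(num_tokens, owned + batch_size)
        steps.append((owned, owned * per_token_tb))
    return steps


# 20 TB final size and 256 tokens come from the issue; batch_size=51 is arbitrary.
schedule = incremental_schedule(total_tb=20.0, num_tokens=256, batch_size=51)
initial_fraction = schedule[0][1] / 20.0  # 1/256, about 0.39% of the final data
```

Under these assumptions the node would be UP after streaming 20 / 256 = 0.078125 TB, with the remaining tokens acquired while it serves requests.
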