Date: Wed, 20 Nov 2013 21:56:35 +0000 (UTC)
From: "Yuki Morishita (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-5503) Large Dataset with Secondary Index

    [ https://issues.apache.org/jira/browse/CASSANDRA-5503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13828183#comment-13828183 ]

Yuki Morishita commented on CASSANDRA-5503:
-------------------------------------------

I think the streaming failure in this case happened because the failure detector (FD) terminated the streaming session when the node was marked down, which was triggered by GC (note: we use 2x the phi threshold). This FD behavior is unchanged in streaming 2.0.

Secondary index building on the receiving side is always done once all SSTables under a CF have been pulled. Also, the source nodes of a stream should not be stressed by the streaming itself.
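
As a rough illustration of the FD behavior described above, here is a minimal sketch of phi-style conviction. It assumes heartbeat intervals are exponentially distributed, so phi grows linearly with the time since the last heartbeat. The function names, the 1-second mean interval, and the base threshold of 8 are assumptions chosen for illustration and are not taken from the comment; the only detail taken from the comment is that streaming uses twice the phi threshold.

```python
import math

# Converts the natural-log scale to log10, so phi reads as
# "orders of magnitude of suspicion".
PHI_FACTOR = 1.0 / math.log(10.0)

# Assumed values for illustration only.
BASE_PHI_THRESHOLD = 8.0
STREAMING_PHI_THRESHOLD = 2 * BASE_PHI_THRESHOLD  # "2x phi threshold"


def phi(silence_s: float, mean_interval_s: float) -> float:
    """Suspicion level after silence_s seconds without a heartbeat.

    Assumes exponentially distributed heartbeat intervals with the
    given mean, which makes phi linear in the silence duration.
    """
    return PHI_FACTOR * silence_s / mean_interval_s


def is_convicted(silence_s: float, mean_interval_s: float, threshold: float) -> bool:
    """True when the node would be marked down at the given threshold."""
    return phi(silence_s, mean_interval_s) > threshold


if __name__ == "__main__":
    mean = 1.0  # assumed mean heartbeat interval in seconds
    for pause in (20.0, 40.0):  # hypothetical GC pause lengths
        print(
            pause,
            round(phi(pause, mean), 2),
            is_convicted(pause, mean, BASE_PHI_THRESHOLD),
            is_convicted(pause, mean, STREAMING_PHI_THRESHOLD),
        )
```

Under these assumptions, a pause of about 20 seconds crosses the base threshold but not the doubled one, while a pause of about 40 seconds crosses both. That fits the picture of a long GC pause getting a node marked down and its streaming session terminated.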

> Large Dataset with Secondary Index
> ----------------------------------
>
>                 Key: CASSANDRA-5503
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5503
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Brooke Bryan
>
> We have a cluster with 1 CF and 1 secondary index. Currently, there are around 12 billion keys across 10 nodes, and we need to grow the cluster to support new data. (This is only a small % of our total data at the moment.)
>
> The problem we face is that when a new node joins, the system will often sit there joining and then fail a stream stage, failing the whole process. This has been the result of another node running a compaction and building up its heap too high, or of other issues. However, I think this problem could be massively reduced, and the join process made more stable, if the joining node pulled in all the data from the other nodes and built its secondary indexes only after the other nodes have done everything they need to do for the node to complete its join.

--
This message was sent by Atlassian JIRA
(v6.1#6144)