Date: Wed, 11 Feb 2015 22:18:13 +0000 (UTC)
From: "mck (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-6091) Better Vnode support in hadoop/pig

    [ https://issues.apache.org/jira/browse/CASSANDRA-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14317090#comment-14317090 ]

mck commented on CASSANDRA-6091:
--------------------------------

The approach in the patch is to allow multiple token ranges per split. We do this with our custom input formats, and it is (very) effective in that it means splitSize is honoured. Handling multiple token ranges per split requires, for example, the code change found in CqlRecordReader, whereby the reader must iterate over both rows and tokenRanges.
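
To illustrate what "iterate over both rows and tokenRanges" could look like, here is a minimal sketch. It is not the code from the patch or from CqlRecordReader: TokenRange, MultiRangeRowIterator and the fetchRows callback are hypothetical names standing in for the real per-range query.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

/** Hypothetical stand-in for a token range; not Cassandra's own type. */
final class TokenRange {
    final long start;
    final long end;

    TokenRange(long start, long end) {
        this.start = start;
        this.end = end;
    }
}

/**
 * Walks the rows of several token ranges as one continuous stream.
 * fetchRows is an assumed callback that returns the rows of one range.
 */
final class MultiRangeRowIterator<R> implements Iterator<R> {
    private final Iterator<TokenRange> ranges;
    private final Function<TokenRange, Iterator<R>> fetchRows;
    private Iterator<R> currentRows = Collections.emptyIterator();

    MultiRangeRowIterator(List<TokenRange> ranges,
                          Function<TokenRange, Iterator<R>> fetchRows) {
        this.ranges = ranges.iterator();
        this.fetchRows = fetchRows;
    }

    @Override
    public boolean hasNext() {
        // When the current range is exhausted, move on to the next one.
        // The loop also skips over ranges that contain no rows.
        while (!currentRows.hasNext() && ranges.hasNext()) {
            currentRows = fetchRows.apply(ranges.next());
        }
        return currentRows.hasNext();
    }

    @Override
    public R next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return currentRows.next();
    }
}
```

The outer iteration is over token ranges and the inner one over rows, so a split holding non-adjacent ranges still looks like a single row stream to the caller.
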

The grouping of token ranges by common location sets, so that splits again honour the splitSize, happens in AbstractColumnFamilyInputFormat.collectSplits(..). Token ranges do not need to be adjacent. Everything in this patch is done client-side.

> Better Vnode support in hadoop/pig
> ----------------------------------
>
>                 Key: CASSANDRA-6091
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6091
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hadoop
>            Reporter: Alex Liu
>            Assignee: Alex Liu
>
> CASSANDRA-6084 shows there are some issues when running hadoop/pig jobs if vnodes are enabled. Also, the hadoop performance of vnode-enabled nodes is bad because there are so many splits.
> The idea is to combine vnode splits into one big pseudo split so that it works as if vnodes were disabled for the hadoop/pig job.
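
The grouping of token ranges by common location sets, as described in the comment, might be sketched roughly as follows. This is not the collectSplits(..) implementation from the patch: SubSplit, SplitGrouper, the estimatedRows field and the greedy packing rule are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical description of one token range and where its replicas live. */
final class SubSplit {
    final String range;          // label only, e.g. "(0,10]"
    final long estimatedRows;    // assumed per-range size estimate
    final Set<String> locations; // replica hosts for this range

    SubSplit(String range, long estimatedRows, Collection<String> locations) {
        this.range = range;
        this.estimatedRows = estimatedRows;
        this.locations = new HashSet<>(locations);
    }
}

final class SplitGrouper {
    /**
     * Buckets sub-splits by identical location set, then greedily packs each
     * bucket into groups whose estimated row count stays within splitSize.
     * A single sub-split larger than splitSize still gets a group of its own.
     */
    static List<List<SubSplit>> group(List<SubSplit> subSplits, long splitSize) {
        // LinkedHashMap keeps buckets in first-seen order for stable output.
        Map<Set<String>, List<SubSplit>> byLocations = new LinkedHashMap<>();
        for (SubSplit s : subSplits) {
            byLocations.computeIfAbsent(s.locations, k -> new ArrayList<>()).add(s);
        }

        List<List<SubSplit>> result = new ArrayList<>();
        for (List<SubSplit> bucket : byLocations.values()) {
            List<SubSplit> current = new ArrayList<>();
            long rows = 0;
            for (SubSplit s : bucket) {
                // Close the current group if adding this range would overflow it.
                if (!current.isEmpty() && rows + s.estimatedRows > splitSize) {
                    result.add(current);
                    current = new ArrayList<>();
                    rows = 0;
                }
                current.add(s);
                rows += s.estimatedRows;
            }
            if (!current.isEmpty()) {
                result.add(current);
            }
        }
        return result;
    }
}
```

Because ranges are bucketed by location set rather than by position on the ring, the ranges that end up in one group need not be adjacent, and the whole computation runs on the client.
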