Date: Fri, 18 Oct 2013 10:14:43 +0000 (UTC)
From: "Jeremy Hanna (JIRA)"
To: commits@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-6091) Better Vnode support in hadoop/pig

[ https://issues.apache.org/jira/browse/CASSANDRA-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798976#comment-13798976 ]

Jeremy Hanna commented on CASSANDRA-6091:
-----------------------------------------

I think a factor that we've overlooked is data locality. With smaller token ranges and the same input split size, there's a higher chance that a split will not fall within a single virtual token range. I have observed in the job counters with vnodes enabled that only about a third of the tasks are data local. That would probably need some testing; the user was already doing some tests with the input split size.
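To make the locality argument concrete, here is a toy Monte Carlo model. It is not how Cassandra's Hadoop input format computes splits; it assumes tokens are placed uniformly at random on the ring, that each split is a contiguous token interval of fixed width placed independently of vnode boundaries, and that a split is fully data local only if it stays inside one token range. The names vnode_boundaries and fraction_within_one_range are hypothetical.

```python
import bisect
import random

RING = 2**64  # width of the token space (assumed; Murmur3 tokens span the same width)


def vnode_boundaries(num_nodes, tokens_per_node, seed=0):
    """Sorted random tokens; each token starts a range owned by some node."""
    rng = random.Random(seed)
    return sorted(rng.randrange(RING) for _ in range(num_nodes * tokens_per_node))


def fraction_within_one_range(boundaries, split_width, samples=10_000, seed=1):
    """Fraction of randomly placed splits [start, start + split_width) that
    cross no range boundary, i.e. stay inside a single token range."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        start = rng.randrange(RING)
        idx = bisect.bisect_right(boundaries, start)
        # First boundary strictly after start, wrapping around the ring.
        if idx < len(boundaries):
            next_boundary = boundaries[idx]
        else:
            next_boundary = boundaries[0] + RING
        if start + split_width <= next_boundary:
            inside += 1
    return inside / samples


if __name__ == "__main__":
    width = RING // 1000  # each split covers 0.1% of the ring (assumed)
    print("1 token/node:   ", fraction_within_one_range(vnode_boundaries(6, 1), width))
    print("256 tokens/node:", fraction_within_one_range(vnode_boundaries(6, 256), width))
```

With the split width held fixed, raising the number of tokens per node shrinks every range, so far fewer splits fit inside a single range; under this model that is the mechanism behind the drop in data-local tasks.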
In any case, if this is borne out in testing, it is the bigger problem.

> Better Vnode support in hadoop/pig
> ----------------------------------
>
> Key: CASSANDRA-6091
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6091
> Project: Cassandra
> Issue Type: Bug
> Components: Hadoop
> Reporter: Alex Liu
> Assignee: Alex Liu
>
> CASSANDRA-6084 shows there are some issues when running hadoop/pig jobs if vnodes are enabled. Also, the hadoop performance of vnode-enabled nodes is bad because there are so many splits.
> The idea is to combine vnode splits into big pseudo splits so that hadoop/pig jobs behave as if vnodes were disabled.
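The idea of combining vnode splits into larger pseudo splits can be sketched as follows, keeping the locality concern from the comment in mind. This is a minimal illustration under stated assumptions, not the patch for this ticket: each vnode range is assumed to come with a replica list and a row estimate, ranges are grouped by their first replica so every pseudo split stays local to that host, and each host's ranges are packed greedily up to a target row count. VnodeSplit, CombinedSplit and combine_splits are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class VnodeSplit:
    start: int                  # token range start (assumed exclusive)
    end: int                    # token range end (assumed inclusive)
    replicas: Tuple[str, ...]   # hosts holding this range
    est_rows: int               # hypothetical row estimate for the range


@dataclass
class CombinedSplit:
    host: str                                   # host every range in this split is local to
    ranges: List[VnodeSplit] = field(default_factory=list)

    @property
    def est_rows(self) -> int:
        return sum(r.est_rows for r in self.ranges)


def combine_splits(splits, target_rows):
    """Group vnode splits by first replica, then greedily pack each host's
    ranges into pseudo splits of at most target_rows estimated rows
    (a single oversized range still gets its own split)."""
    by_host = defaultdict(list)
    for s in splits:
        by_host[s.replicas[0]].append(s)

    combined = []
    for host in sorted(by_host):
        current = CombinedSplit(host)
        for s in sorted(by_host[host], key=lambda v: v.start):
            # Start a new pseudo split when adding this range would overshoot.
            if current.ranges and current.est_rows + s.est_rows > target_rows:
                combined.append(current)
                current = CombinedSplit(host)
            current.ranges.append(s)
        if current.ranges:
            combined.append(current)
    return combined


if __name__ == "__main__":
    # Two hosts, eight 100-row vnode ranges alternating between them.
    splits = [VnodeSplit(i * 10, i * 10 + 10,
                         ("node%d" % (i % 2), "node%d" % ((i + 1) % 2)), 100)
              for i in range(8)]
    for c in combine_splits(splits, target_rows=200):
        print(c.host, [(r.start, r.end) for r in c.ranges], c.est_rows)
```

Grouping by host before packing is what keeps the pseudo splits data local: a combined split never mixes ranges whose chosen replica differs, so the task can be scheduled on that one host. Packing by token order alone would reduce the split count but, with vnodes, would mix ranges owned by different nodes.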