cassandra-commits mailing list archives

From "Jeremy Hanna (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-6091) Better Vnode support in hadoop/pig
Date Fri, 18 Oct 2013 10:14:43 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798976#comment-13798976 ]

Jeremy Hanna commented on CASSANDRA-6091:
-----------------------------------------

I think a factor that we've overlooked is data locality.  With smaller token ranges and the
same input split size, there's a higher chance that a split will extend beyond a single
virtual token range.  In the job counters for a job run with vnodes enabled, I have observed
that only about a third of the tasks are data local.  That would need some testing to
confirm.  The user was running some tests on input split size.

In any case, if this is borne out in testing, it is the bigger problem.
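
To make the locality concern concrete, here is a toy model only.  It assumes splits are cut
at fixed offsets around a small integer ring and counts how many of them stay inside one
token range.  It is not how ColumnFamilyInputFormat actually computes splits, and the ring
size, split size, and the function name single_range_fraction are made up for illustration.

```python
import bisect
import random


def single_range_fraction(boundaries, ring_size, split_size):
    """Fraction of fixed-size splits that sit inside a single token range.

    Toy model (an assumption, not Cassandra's real split logic): the ring is
    the integers [0, ring_size), `boundaries` are the tokens where a new range
    starts, and splits are cut every `split_size` tokens starting at 0.
    """
    boundaries = sorted(boundaries)
    starts = range(0, ring_size, split_size)
    inside = 0
    for start in starts:
        end = start + split_size
        # Index of the first boundary strictly greater than `start`.
        i = bisect.bisect_right(boundaries, start)
        # The split stays in one range if no boundary lies strictly
        # between its start and its end.
        if i == len(boundaries) or boundaries[i] >= end:
            inside += 1
    return inside / len(starts)


rng = random.Random(0)           # fixed seed so runs are repeatable
RING, SPLIT = 1024, 8            # illustrative sizes only

few = rng.sample(range(RING), 4)         # 4 nodes, one token each
many = rng.sample(range(RING), 4 * 64)   # 4 nodes, 64 vnodes each

frac_few = single_range_fraction(few, RING, SPLIT)
frac_many = single_range_fraction(many, RING, SPLIT)
print("single token per node:", frac_few)
print("64 vnodes per node:   ", frac_many)
```

In this model the fraction of splits confined to one range drops as the number of ranges
grows while the split size stays fixed.  Whether that explains the job counters would still
have to be checked on a real cluster.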

> Better Vnode support in hadoop/pig
> ----------------------------------
>
>                 Key: CASSANDRA-6091
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6091
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hadoop
>            Reporter: Alex Liu
>            Assignee: Alex Liu
>
> CASSANDRA-6084 shows that there are some issues when running hadoop/pig jobs if vnodes
> are enabled. Also, hadoop performance on vnode-enabled nodes is bad because there are so
> many splits.
> The idea is to combine vnode splits into big pseudo splits so that it works as if vnodes
> were disabled for hadoop/pig jobs.
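
One way to picture the "combine vnode splits" idea is a minimal sketch that groups token
ranges by the host that owns them, so each host gets one combined pseudo split.  This is an
assumption for illustration, not the approach taken in any patch on this ticket.  The
function name combine_by_host and the (start, end, host) tuples are hypothetical, and
replication is ignored.

```python
from collections import defaultdict


def combine_by_host(vnode_ranges):
    """Group (start_token, end_token, host) ranges into one pseudo split per host.

    Hypothetical helper: each returned entry lists every token range a single
    task would scan on that host, instead of one task per vnode range.
    """
    grouped = defaultdict(list)
    for start, end, host in vnode_ranges:
        grouped[host].append((start, end))
    # Sort each host's ranges so the combined split is scanned in token order.
    return {host: sorted(ranges) for host, ranges in grouped.items()}


ranges = [
    (0, 10, "node1"),
    (10, 20, "node2"),
    (20, 30, "node1"),
    (30, 40, "node2"),
]
pseudo_splits = combine_by_host(ranges)
for host, token_ranges in sorted(pseudo_splits.items()):
    print(host, token_ranges)
```

Here four vnode ranges collapse into two pseudo splits, one per host, and each task still
reads only ranges owned by its own host.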



--
This message was sent by Atlassian JIRA
(v6.1#6144)
