cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-6268) Poor performance of Hadoop if any DC is using VNodes
Date Tue, 29 Oct 2013 19:53:25 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-6268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808368#comment-13808368 ]

Jonathan Ellis commented on CASSANDRA-6268:
-------------------------------------------

We closed CASSANDRA-6124 in favor of adding LOCAL_ONE; I can't think of a case where you'd
want to span more than one DC but fewer than all.

> Poor performance of Hadoop if any DC is using VNodes
> ----------------------------------------------------
>
>                 Key: CASSANDRA-6268
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6268
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Hadoop
>            Reporter: Piotr Kołaczkowski
>            Assignee: Piotr Kołaczkowski
>         Attachments: 0001-DSP-2572-Adds-ability-to-set-target-DCs-where-a-Hado.patch
>
>
> Some customers are complaining about the huge number of splits that vnodes cause in
> Hadoop jobs. Disabling vnodes only in the Hadoop DC does not fix this, because splits are
> generated from the results of describe_ring, which returns a huge number of ranges.
> The proposed fix:
> - allows specifying the DCs in which the Hadoop job should run
> - merges consecutive ranges before generating Hadoop splits, so there is no artificial
>   range splitting caused by vnodes in the other DCs
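
The range merging described above can be sketched roughly as follows. This is an
illustration only, not the attached patch: TokenRange, merge_consecutive_ranges and the
endpoint_dc mapping are hypothetical names, and the merge rule (two adjacent ranges are
joined when they have the same replicas inside the target DCs) is an assumption about how
"consecutive ranges" would be merged. Input ranges are assumed to be sorted by start token,
and the wrap-around between the last and first range is ignored.

```python
from typing import Dict, FrozenSet, List, NamedTuple, Set


class TokenRange(NamedTuple):
    # Hypothetical stand-in for one range entry returned by describe_ring.
    start: int
    end: int
    endpoints: FrozenSet[str]


def merge_consecutive_ranges(
    ranges: List[TokenRange],
    endpoint_dc: Dict[str, str],
    target_dcs: Set[str],
) -> List[TokenRange]:
    """Drop replicas outside target_dcs, then merge adjacent ranges.

    Assumption: ranges are sorted by start token and do not wrap around.
    """
    merged: List[TokenRange] = []
    for r in ranges:
        # Keep only the replicas that live in the DCs the job should run in.
        local = frozenset(e for e in r.endpoints if endpoint_dc.get(e) in target_dcs)
        prev = merged[-1] if merged else None
        if prev is not None and prev.end == r.start and prev.endpoints == local:
            # Same local replicas and contiguous tokens: extend the previous range.
            merged[-1] = TokenRange(prev.start, r.end, local)
        else:
            merged.append(TokenRange(r.start, r.end, local))
    return merged
```

With this rule, boundaries that exist only because of vnodes in the other DCs disappear
once the replicas from those DCs are filtered out, so the number of ranges handed to split
generation depends only on the token layout of the target DCs.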



--
This message was sent by Atlassian JIRA
(v6.1#6144)
