cassandra-commits mailing list archives

From Piotr Kołaczkowski (JIRA) <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-6268) Poor performance of Hadoop if any DC is using VNodes
Date Tue, 29 Oct 2013 20:39:26 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-6268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808412#comment-13808412 ]

Piotr Kołaczkowski edited comment on CASSANDRA-6268 at 10/29/13 8:37 PM:
-------------------------------------------------------------------------

Right, but where would I get the DC name from? In order to merge ranges, I need to know in which
DC to merge them.

Or did you mean not splitting / merging ranges at all, but generating a proper set of ranges right
from the start? That would require creating a version of describe_ring that takes the DC name
as a parameter (or a version that simply describes the ring of the current DC). I wanted my patch
to be as minimally invasive as possible, so I didn't consider this approach, but it would probably
be a cleaner solution...


> Poor performance of Hadoop if any DC is using VNodes
> ----------------------------------------------------
>
>                 Key: CASSANDRA-6268
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6268
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Hadoop
>            Reporter: Piotr Kołaczkowski
>            Assignee: Piotr Kołaczkowski
>         Attachments: 0001-DSP-2572-Adds-ability-to-set-target-DCs-where-a-Hado.patch
>
>
> Some customers are complaining about the huge number of splits in Hadoop caused by vnodes.
> Disabling vnodes only in the Hadoop DC does not fix it. Splits are generated from the results
> of describe_ring, which returns a huge number of ranges anyway, and doesn't take into account
> that there will be a huge number of consecutive ranges residing on the nodes where we'd like
> the M/R job to run.
> The proposed fix:
> 1. allows for specifying the DC(s) the Hadoop job should be run in (in DSE, this defaults
>    to all Hadoop DCs)
> 2. merges consecutive ranges before generating Hadoop splits, so we don't have artificial
>    range splitting caused by vnodes in the other DCs
> For non-DSE users this feature is turned off by default and doesn't change the old behaviour.
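
To illustrate step 2 of the proposed fix, here is a minimal sketch of merging consecutive
ranges. It is not taken from the attached patch. The TokenRange type, the merge_consecutive
function and the node names are hypothetical, and the sketch assumes each range has already
been reduced to its replicas in the target DC and that no range wraps around the ring.

```python
from typing import FrozenSet, List, NamedTuple


class TokenRange(NamedTuple):
    """Hypothetical model of one describe_ring entry, not Cassandra's real type."""
    start: int                # exclusive start token
    end: int                  # inclusive end token
    replicas: FrozenSet[str]  # replica nodes inside the target DC only


def merge_consecutive(ranges: List[TokenRange]) -> List[TokenRange]:
    """Merge adjacent ranges that have the same replica set in the target DC.

    Assumes no range wraps around the ring (start < end for every range).
    """
    merged: List[TokenRange] = []
    for r in sorted(ranges, key=lambda t: t.start):
        last = merged[-1] if merged else None
        if last is not None and last.end == r.start and last.replicas == r.replicas:
            # Same target-DC replicas and contiguous tokens: extend the previous range.
            merged[-1] = TokenRange(last.start, r.end, last.replicas)
        else:
            merged.append(r)
    return merged


# Five small ranges, as vnodes in another DC might produce them.
a = frozenset({"hadoop1"})
b = frozenset({"hadoop2"})
ring = [
    TokenRange(0, 10, a),
    TokenRange(10, 20, a),
    TokenRange(20, 30, a),
    TokenRange(30, 40, b),
    TokenRange(40, 50, b),
]
splits = merge_consecutive(ring)
```

With this input the five ranges collapse into two, one per Hadoop node, which is the effect
the fix aims for: the number of splits follows the layout of the target DC rather than the
vnode layout of the other DCs.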



--
This message was sent by Atlassian JIRA
(v6.1#6144)
