cassandra-commits mailing list archives

From "Brandon Williams (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-6268) Poor performance of Hadoop if any DC is using VNodes
Date Fri, 08 Nov 2013 23:06:17 GMT


Brandon Williams commented on CASSANDRA-6268:

bq. Isn't it confusing to have 1.2.x a higher version than 2.0.y?

What? 1.2 would be 39, and 2.0 would be 41.  1.2 is unlikely to ever get a new feature that
2.0 wouldn't, so that's fairly safe.

bq. So why not just 19.36.2 and 19.39.0

Because technically, API-wise, this isn't a bugfix. We bent the rule slightly on CASSANDRA-6202
to avoid this conflict, but I like Aleksey's idea better.

> Poor performance of Hadoop if any DC is using VNodes
> ----------------------------------------------------
>                 Key: CASSANDRA-6268
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Hadoop
>            Reporter: Piotr Kołaczkowski
>            Assignee: Piotr Kołaczkowski
>         Attachments: 6268-src-1.2.txt, 6268-src-2.0.txt, 6268-thrift-1.2.txt, 6268-thrift-2.0.txt
> Some customers are complaining about a huge number of splits in Hadoop caused by VNodes.
> Disabling vnodes only in the Hadoop DC does not fix it. Splits are generated from the results
> of describe_ring, which returns a huge number of ranges anyway, and does not take into account
> that a huge number of consecutive ranges will reside on the nodes where we'd like the M/R
> job to run.
> The proposed fix:
> 1. allows for specifying the DC(s) the Hadoop job should be run in (in DSE, defaults
>    to all Hadoop DCs)
> 2. merges consecutive ranges before generating Hadoop splits, so we don't have artificial
>    range splitting caused by vnodes in the other DCs
> For non-DSE users this feature is turned off by default and doesn't change the old behaviour.
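
The two steps of the proposed fix can be illustrated with a minimal sketch. This is not the code from the attached patches; it is written in Python for brevity, and TokenRange, restrict_to_dcs, merge_consecutive_ranges and the endpoint-to-DC mapping are all hypothetical names. It assumes integer tokens, that each range carries the set of replica endpoints, and it ignores the wrap-around range for simplicity.

```python
from typing import Dict, FrozenSet, List, NamedTuple, Set


class TokenRange(NamedTuple):
    """Hypothetical stand-in for one entry of a describe_ring result."""
    start: int
    end: int
    endpoints: FrozenSet[str]


def restrict_to_dcs(ranges: List[TokenRange],
                    endpoint_dc: Dict[str, str],
                    dcs: Set[str]) -> List[TokenRange]:
    """Keep only the replicas that live in the DC(s) chosen for the job."""
    return [
        TokenRange(r.start, r.end,
                   frozenset(e for e in r.endpoints if endpoint_dc.get(e) in dcs))
        for r in ranges
    ]


def merge_consecutive_ranges(ranges: List[TokenRange]) -> List[TokenRange]:
    """Merge adjacent ranges that share exactly the same replica set."""
    merged: List[TokenRange] = []
    for r in sorted(ranges, key=lambda x: x.start):
        last = merged[-1] if merged else None
        if last is not None and last.end == r.start and last.endpoints == r.endpoints:
            # Same replicas and no gap: extend the previous range.
            merged[-1] = TokenRange(last.start, r.end, last.endpoints)
        else:
            merged.append(r)
    return merged


if __name__ == "__main__":
    # Two ranges that differ only because of vnodes in the "cassandra" DC.
    ring = [
        TokenRange(0, 10, frozenset({"h1", "c1"})),
        TokenRange(10, 20, frozenset({"h1", "c2"})),
        TokenRange(20, 30, frozenset({"h2", "c2"})),
    ]
    dc_of = {"h1": "hadoop", "h2": "hadoop", "c1": "cassandra", "c2": "cassandra"}
    for r in merge_consecutive_ranges(restrict_to_dcs(ring, dc_of, {"hadoop"})):
        print(r.start, r.end, sorted(r.endpoints))
```

In this sketch the first two ranges become identical from the Hadoop DC's point of view once the other DC's replicas are filtered out, so they collapse into a single range and would yield one split instead of two.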

This message was sent by Atlassian JIRA