cassandra-commits mailing list archives

From Piotr Kołaczkowski (JIRA) <>
Subject [jira] [Created] (CASSANDRA-6268) Poor performance of Hadoop if any DC is using VNodes
Date Tue, 29 Oct 2013 19:49:25 GMT
Piotr Kołaczkowski created CASSANDRA-6268:

             Summary: Poor performance of Hadoop if any DC is using VNodes
                 Key: CASSANDRA-6268
             Project: Cassandra
          Issue Type: Improvement
          Components: Hadoop
            Reporter: Piotr Kołaczkowski
            Assignee: Piotr Kołaczkowski
         Attachments: 0001-DSP-2572-Adds-ability-to-set-target-DCs-where-a-Hado.patch

Some customers are complaining about the huge number of splits in Hadoop caused by vnodes. Disabling
vnodes only in the Hadoop DC does not fix it, because splits are generated from the results of
describe_ring, which returns the ranges of the whole ring. The vnode tokens of the other DCs therefore
still cut it into a huge number of ranges.
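To illustrate why disabling vnodes in one DC is not enough: the ring has one range per distinct token, and tokens from every DC share the same ring. The sketch below uses a hypothetical cluster (3 single-token nodes in a "hadoop" DC, 3 nodes with num_tokens = 256 in another DC) and made-up token values chosen only to be distinct. The class and method names are illustrative, not Cassandra API.

```java
import java.util.TreeSet;

public class RingRangeCount {

    /**
     * Builds a hypothetical ring: single-token nodes get negative tokens,
     * vnode nodes get numTokens non-negative tokens each, so all are distinct.
     */
    static TreeSet<Long> buildRing(int hadoopNodes, int vnodeNodes, int numTokens) {
        TreeSet<Long> ring = new TreeSet<>();
        for (long n = 1; n <= hadoopNodes; n++) {
            ring.add(-n * 1_000_000_000_000L); // one token per node, vnodes disabled
        }
        for (long node = 0; node < vnodeNodes; node++) {
            for (long t = 0; t < numTokens; t++) {
                ring.add((t * vnodeNodes + node) * 1_000L); // interleaved vnode tokens
            }
        }
        return ring;
    }

    public static void main(String[] args) {
        TreeSet<Long> ring = buildRing(3, 3, 256);
        // One range per distinct token: 3 + 3 * 256 ranges in total.
        System.out.println(ring.size());
    }
}
```

Running it prints 771. Assuming at least one split is generated per range, a job reading only the single-token DC would still get at least 771 splits instead of 3.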

The proposed fix:
- allows specifying the DCs in which the Hadoop job should run
- merges consecutive ranges before generating Hadoop splits, so that vnodes in the other DCs
no longer cause artificial range splitting
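A minimal sketch of the merging step, assuming one plausible rule (the attached patch is not reproduced here, so this is not its code): walk the ranges in ring order, keep only the replicas located in the target DCs, and join a range to its predecessor when the predecessor ends where it starts and both have the same target-DC replicas. `Range`, `Split` and `merge` are hypothetical types standing in for the describe_ring results, and merging across the ring's wrap-around point is deliberately not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class RangeMerger {

    /** Hypothetical ring entry: range (start, end] plus endpoint -> datacenter. */
    public record Range(long start, long end, Map<String, String> endpointDc) {}

    /** A merged range and its replicas in the target DCs. */
    public record Split(long start, long end, Set<String> replicas) {}

    /** Keeps only the endpoints located in one of the target DCs. */
    static Set<String> replicasIn(Range r, Set<String> targetDcs) {
        Set<String> out = new TreeSet<>();
        for (Map.Entry<String, String> e : r.endpointDc().entrySet()) {
            if (targetDcs.contains(e.getValue())) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    /** Merges ring-ordered, contiguous ranges that share target-DC replicas. */
    public static List<Split> merge(List<Range> ringOrdered, Set<String> targetDcs) {
        List<Split> result = new ArrayList<>();
        for (Range r : ringOrdered) {
            Set<String> replicas = replicasIn(r, targetDcs);
            if (!result.isEmpty()) {
                Split last = result.get(result.size() - 1);
                if (last.end() == r.start() && last.replicas().equals(replicas)) {
                    // Same target-DC owners: extend the previous split.
                    result.set(result.size() - 1, new Split(last.start(), r.end(), last.replicas()));
                    continue;
                }
            }
            result.add(new Split(r.start(), r.end(), replicas));
        }
        return result;
    }

    /** Hypothetical ring: ranges split by vnode DC tokens, owned by h1 or h2 in "hadoop". */
    static List<Range> sampleRing() {
        return List.of(
            new Range(0, 10, Map.of("h1", "hadoop", "v1", "vnodes")),
            new Range(10, 20, Map.of("h1", "hadoop", "v2", "vnodes")),
            new Range(20, 30, Map.of("h1", "hadoop", "v1", "vnodes")),
            new Range(30, 40, Map.of("h2", "hadoop", "v2", "vnodes")),
            new Range(40, 50, Map.of("h2", "hadoop", "v1", "vnodes")));
    }

    public static void main(String[] args) {
        for (Split s : merge(sampleRing(), Set.of("hadoop"))) {
            System.out.println(s.start() + ".." + s.end() + " " + s.replicas());
        }
    }
}
```

With "hadoop" as the only target DC, the five input ranges collapse into two splits, `0..30 [h1]` and `30..50 [h2]`; with both DCs as targets, no two neighbours share replicas and all five ranges remain.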

This message was sent by Atlassian JIRA
