incubator-cassandra-user mailing list archives

From Paulo Motta <pauloricard...@gmail.com>
Subject Virtual node support for Hadoop workloads
Date Thu, 17 Oct 2013 20:49:32 GMT
Hello,

According to DSE3.1 documentation [1], "DataStax recommends using virtual
nodes only on data centers running purely Cassandra workloads. You should
disable virtual nodes on data centers running either Hadoop or Solr
workloads by setting num_tokens to 1."

A thread on this mailing list earlier this year [2] suggested a workaround
for the problem of needing at least one map task per token (infeasible with
vnodes). The suggestion involved implementing a new Hadoop InputSplitFormat
that could combine many tokens from a single node, thus reducing the
overhead of having too many tasks per node.
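
To make the combining idea concrete, here is a rough sketch in plain Python
rather than against the real Hadoop or Cassandra APIs. The function name
combine_ranges_by_node, the (node, token_range) tuples, and the
max_ranges_per_split parameter are all hypothetical and chosen only for
illustration; an actual implementation would live in a Hadoop InputFormat
and would need to account for replicas and data locality.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical type for illustration; not a Cassandra or Hadoop class.
TokenRange = Tuple[int, int]  # (start_token, end_token)


def combine_ranges_by_node(
    ranges: List[Tuple[str, TokenRange]],
    max_ranges_per_split: int,
) -> List[Tuple[str, List[TokenRange]]]:
    """Group token ranges by owning node, then chunk each node's ranges
    so that one map task covers up to max_ranges_per_split ranges."""
    if max_ranges_per_split < 1:
        raise ValueError("max_ranges_per_split must be at least 1")

    # Collect every token range under the node that owns it.
    by_node: Dict[str, List[TokenRange]] = defaultdict(list)
    for node, token_range in ranges:
        by_node[node].append(token_range)

    # Emit one combined split per chunk of ranges on the same node.
    combined: List[Tuple[str, List[TokenRange]]] = []
    for node, node_ranges in by_node.items():
        for i in range(0, len(node_ranges), max_ranges_per_split):
            combined.append((node, node_ranges[i:i + max_ranges_per_split]))
    return combined
```

Under these assumptions, a node owning 256 token ranges with
max_ranges_per_split set to 64 would yield 4 map tasks instead of 256.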

Is there a JIRA ticket for this issue yet, or any work in progress to
support vnodes for Hadoop workloads? Or does the recommendation remain to
avoid vnodes for analytics workloads (Hadoop, Solr)?

Thanks,

-- 
Paulo

[1]
http://www.datastax.com/docs/datastax_enterprise3.1/deploy/configuring_replication
[2]
http://mail-archives.apache.org/mod_mbox/cassandra-user/201302.mbox/%3CCAJV_UYdqYmfStn5OetWrozQqbi+-yP3X-Ew9xtW=QY=2zGYDMA@mail.gmail.com%3E
