From: Paulo Motta
Date: Thu, 17 Oct 2013 17:49:32 -0300
Subject: Virtual node support for Hadoop workloads
To: user@cassandra.apache.org

Hello,

According to the DSE 3.1 documentation [1], "DataStax recommends using virtual nodes only on data centers running purely Cassandra workloads. You should disable virtual nodes on data centers running either Hadoop or Solr workloads by setting num_tokens to 1."

A thread on this mailing list earlier this year [2] suggested a workaround for the problem of needing a minimum of one map task per token, which is infeasible with vnodes. The suggestion was to implement a new Hadoop InputSplitFormat that combines many tokens from a single node, thus reducing the overhead of having too many tasks per node.

Is there a JIRA ticket for this issue yet, or any work in progress to support vnodes for Hadoop workloads? Or does the recommendation remain to avoid vnodes for analytics workloads (Hadoop, Solr)?

Thanks,

-- 
Paulo

[1] http://www.datastax.com/docs/datastax_enterprise3.1/deploy/configuring_replication
[2] http://mail-archives.apache.org/mod_mbox/cassandra-user/201302.mbox/%3CCAJV_UYdqYmfStn5OetWrozQqbi+-yP3X-Ew9xtW=QY=2zGYDMA@mail.gmail.com%3E
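To make the split-combining workaround described in this message concrete, here is a minimal Python sketch of the grouping logic only. It is not Hadoop or Cassandra API code: `TokenRange`, `combine_ranges`, `max_ranges_per_split`, the node addresses, and the figure of 256 vnodes per node are all hypothetical names and values chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class TokenRange(NamedTuple):
    """Hypothetical stand-in for one vnode's token range."""
    start: int  # start token of the range
    end: int    # end token of the range
    node: str   # address of the node that owns the range


def combine_ranges(ranges: List[TokenRange],
                   max_ranges_per_split: int) -> List[List[TokenRange]]:
    """Group token ranges by owning node, then pack them into splits.

    Each returned split holds ranges from exactly one node, so a task
    processing it could still be scheduled local to that node.
    """
    if max_ranges_per_split < 1:
        raise ValueError("max_ranges_per_split must be at least 1")

    # Bucket every range under the node that owns it.
    by_node: Dict[str, List[TokenRange]] = defaultdict(list)
    for token_range in ranges:
        by_node[token_range.node].append(token_range)

    # Cut each node's ranges into chunks of at most max_ranges_per_split.
    splits: List[List[TokenRange]] = []
    for node in sorted(by_node):
        owned = sorted(by_node[node])
        for i in range(0, len(owned), max_ranges_per_split):
            splits.append(owned[i:i + max_ranges_per_split])
    return splits


# Two nodes with 256 token ranges each (illustrative values only).
ranges = [TokenRange(t, t + 1, f"10.0.0.{t % 2 + 1}") for t in range(512)]
splits = combine_ranges(ranges, max_ranges_per_split=64)
print(len(ranges), "token ranges ->", len(splits), "splits")
```

With 64 ranges per split, the 512 ranges in this example collapse into 8 splits (4 per node) rather than one map task per token range.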