Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@cassandra.apache.org
Date: Mon, 22 Jun 2015 16:36:00 +0000 (UTC)
From: "T Jake Luciani (JIRA)" <jira@apache.org>
To: commits@cassandra.apache.org
Message-ID: <JIRA.12838179.1434464418000.137099.1434990960420@Atlassian.JIRA>
In-Reply-To: <JIRA.12838179.1434464418000@Atlassian.JIRA>
References: <JIRA.12838179.1434464418000@Atlassian.JIRA>
 <JIRA.12838179.1434464418384@arcas>
Subject: [jira] [Updated] (CASSANDRA-9603) Expose private listen_address in
 system.local
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable


     [ https://issues.apache.org/jira/browse/CASSANDRA-9603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

T Jake Luciani updated CASSANDRA-9603:
--------------------------------------
    Fix Version/s:     (was: 2.1.7)
                   2.1.x

> Expose private listen_address in system.local
> ---------------------------------------------
>
>                 Key: CASSANDRA-9603
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9603
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Piotr Kołaczkowski
>             Fix For: 2.1.x
>
>
> We had hoped CASSANDRA-9436 would add it, but it added only rpc_address rather than both rpc_address *and* listen_address. We really need listen_address here, because we need to know the private IP that C* binds to. Knowing this, we could better match Spark nodes to C* nodes and process data locally in environments where rpc_address != listen_address, such as EC2.
> Spark does not know rpc addresses, nor does it have a concept of a broadcast address. It only knows the hostname / IP its workers bind to. In cloud environments, these are private IPs. If we give Spark a set of C* nodes identified by their rpc_addresses, Spark doesn't recognize them as belonging to the same cluster. It treats them as "remote" nodes and has no idea where to send tasks optimally.
> Current situation (example):
> Spark worker nodes: [10.0.0.1, 10.0.0.2, 10.0.0.3]
> C* nodes: [10.0.0.1 / node1.blah.ec2.com, 10.0.0.2 / node2.blah.ec2.com, 10.0.0.3 / node3.blah.ec2.com]
> What the application knows about the cluster: [node1.blah.ec2.com, node2.blah.ec2.com, node3.blah.ec2.com]
> What the application sends to Spark for execution:
>  Task1 - please execute on node1.blah.ec2.com
>  Task2 - please execute on node2.blah.ec2.com
>  Task3 - please execute on node3.blah.ec2.com
> How Spark understands it: "I have no idea what node1.blah.ec2.com is, let's assign Task1 to a *random* node" :(
> Expected:
> Spark worker nodes: [10.0.0.1, 10.0.0.2, 10.0.0.3]
> C* nodes: [10.0.0.1 / node1.blah.ec2.com, 10.0.0.2 / node2.blah.ec2.com, 10.0.0.3 / node3.blah.ec2.com]
> What the application knows about the cluster: [10.0.0.1 / node1.blah.ec2.com, 10.0.0.2 / node2.blah.ec2.com, 10.0.0.3 / node3.blah.ec2.com]
> What the application sends to Spark for execution:
>  Task1 - please execute on node1.blah.ec2.com or 10.0.0.1
>  Task2 - please execute on node2.blah.ec2.com or 10.0.0.2
>  Task3 - please execute on node3.blah.ec2.com or 10.0.0.3
> How Spark understands it: "10.0.0.1? - I have a worker on that node, let's put Task1 there"
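
The matching problem described in the two scenarios can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Spark or any connector actually schedules tasks: the `CassandraNode` record, the `preferred_locations` and `place_task` helpers, and the `"any"` fallback value are all hypothetical names made up for this example. The addresses are the ones from the example in the description.

```python
from typing import List, NamedTuple, Set


class CassandraNode(NamedTuple):
    # Hypothetical record; the field names are illustrative only.
    rpc_host: str        # the address the application learns today
    listen_address: str  # the private IP this ticket asks to expose


def preferred_locations(node: CassandraNode, expose_listen_address: bool) -> List[str]:
    """Build the placement hints the application would hand to the scheduler."""
    hints = [node.rpc_host]
    if expose_listen_address:
        hints.append(node.listen_address)
    return hints


def place_task(hints: List[str], workers: Set[str]) -> str:
    """Return the first hint that names a known worker, else the 'any' fallback."""
    for hint in hints:
        if hint in workers:
            return hint
    return "any"  # no hint matched: the scheduler is free to pick any worker


# Data taken from the example in the issue description.
workers = {"10.0.0.1", "10.0.0.2", "10.0.0.3"}
nodes = [
    CassandraNode("node1.blah.ec2.com", "10.0.0.1"),
    CassandraNode("node2.blah.ec2.com", "10.0.0.2"),
    CassandraNode("node3.blah.ec2.com", "10.0.0.3"),
]

# Current situation: only rpc hostnames are known, so nothing matches a worker.
current = [place_task(preferred_locations(n, False), workers) for n in nodes]

# Expected situation: listen_address is also known, so every task finds its worker.
expected = [place_task(preferred_locations(n, True), workers) for n in nodes]
```

With only the rpc hostnames available, every task falls through to the fallback; once the private address is included in the hints, each task resolves to the worker co-located with its C* node.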


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)