cassandra-commits mailing list archives

From "Carl Yeksigian (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-9603) Expose private listen_address in system.local
Date Thu, 02 Jul 2015 18:10:04 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612309#comment-14612309
] 

Carl Yeksigian commented on CASSANDRA-9603:
-------------------------------------------

Just pushed an update to use {{FBUtilities.getLocalAddress()}}.

> Expose private listen_address in system.local
> ---------------------------------------------
>
>                 Key: CASSANDRA-9603
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9603
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Piotr Kołaczkowski
>            Assignee: Carl Yeksigian
>             Fix For: 2.0.x
>
>
> We had some hopes CASSANDRA-9436 would add it, yet it added rpc_address instead of both
rpc_address *and* listen_address. We really need listen_address here, because we need to get
information on the private IP C* binds to. Knowing this we could better match Spark nodes
to C* nodes and process data locally in environments where rpc_address != listen_address like
EC2. 
> See, Spark does not know rpc addresses, nor does it have a concept of a broadcast address. It
only knows the hostname / IP its workers bind to. In cloud environments, these are
private IPs. So if we give Spark a set of C* nodes identified by rpc_addresses, Spark doesn't
recognize them as belonging to the same cluster. It treats them as "remote" nodes and has
no idea where to send tasks optimally. 
> Current situation (example):
> Spark worker nodes: [10.0.0.1, 10.0.0.2, 10.0.0.3]
> C* nodes: [10.0.0.1 / node1.blah.ec2.com, 10.0.0.2 / node2.blah.ec2.com, 10.0.0.3 / node3.blah.ec2.com]
> What the application knows about the cluster: [node1.blah.ec2.com, node2.blah.ec2.com,
node3.blah.ec2.com]
> What the application sends to Spark for execution:
>  Task1 - please execute on node1.blah.ec2.com
>  Task2 - please execute on node2.blah.ec2.com
>  Task3 - please execute on node3.blah.ec2.com
> How Spark understands it: "I have no idea what node1.blah.ec2.com is, let's assign Task1
to a *random* node" :(
> Expected:
> Spark worker nodes: [10.0.0.1, 10.0.0.2, 10.0.0.3]
> C* nodes: [10.0.0.1 / node1.blah.ec2.com, 10.0.0.2 / node2.blah.ec2.com, 10.0.0.3 / node3.blah.ec2.com]
> What the application knows about the cluster: [10.0.0.1 / node1.blah.ec2.com, 10.0.0.2
/ node2.blah.ec2.com, 10.0.0.3 / node3.blah.ec2.com]
> What the application sends to Spark for execution:
>  Task1 - please execute on node1.blah.ec2.com or 10.0.0.1
>  Task2 - please execute on node2.blah.ec2.com or 10.0.0.2
>  Task3 - please execute on node3.blah.ec2.com or 10.0.0.3
> How Spark understands it: "10.0.0.1? - I have a worker on that node, let's put Task1
there"
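> The matching described above can be sketched as follows. This is a hypothetical illustration of the locality logic, not Spark's scheduler code: the function, names, and data are made up for the example. It shows why a node identified only by an rpc hostname never matches a worker's private bind IP, while a listen_address mapping makes the match trivial.

```python
# Hypothetical sketch of why exposing listen_address helps task locality.
# All names and addresses are illustrative, taken from the scenario above.

spark_workers = {"10.0.0.1", "10.0.0.2", "10.0.0.3"}

# Without listen_address: the application only knows public rpc hostnames.
nodes_rpc_only = ["node1.blah.ec2.com", "node2.blah.ec2.com", "node3.blah.ec2.com"]

# With listen_address in system.local: rpc hostname -> private bind IP.
nodes_with_listen = {
    "node1.blah.ec2.com": "10.0.0.1",
    "node2.blah.ec2.com": "10.0.0.2",
    "node3.blah.ec2.com": "10.0.0.3",
}

def preferred_worker(node, workers, listen_map=None):
    """Return the worker local to `node`, or None (Spark then picks a random one)."""
    if node in workers:                        # node already named by a worker IP
        return node
    if listen_map and listen_map.get(node) in workers:
        return listen_map[node]                # matched via private listen_address
    return None                                # no match: task loses locality

# rpc hostnames alone never match a worker IP -> every task goes to a random node:
assert all(preferred_worker(n, spark_workers) is None for n in nodes_rpc_only)

# with listen_address, each task lands on its local worker:
assert preferred_worker("node1.blah.ec2.com", spark_workers, nodes_with_listen) == "10.0.0.1"
```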



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
