Date: Mon, 22 Jun 2015 16:36:00 +0000 (UTC)
From: "T Jake Luciani (JIRA)"
To: commits@cassandra.apache.org
Subject: [jira] [Updated] (CASSANDRA-9603) Expose private listen_address in system.local

     [ https://issues.apache.org/jira/browse/CASSANDRA-9603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

T Jake Luciani updated CASSANDRA-9603:
--------------------------------------
    Fix Version/s:     (was: 2.1.7)
                   2.1.x

> Expose private listen_address in system.local
> ---------------------------------------------
>
>                 Key: CASSANDRA-9603
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9603
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Piotr Kołaczkowski
>             Fix For: 2.1.x
>
>
> We had some hopes CASSANDRA-9436 would add it, yet it added rpc_address instead of both rpc_address *and* listen_address. We really need listen_address here, because we need to get information on the private IP C* binds to. Knowing this, we could better match Spark nodes to C* nodes and process data locally in environments where rpc_address != listen_address, like EC2.
> See, Spark does not know rpc addresses, nor does it have a concept of a broadcast address. It only knows the hostname / IP its workers bind to. In cloud environments, these are private IPs. Now if we give Spark a set of C* nodes identified by rpc_addresses, Spark doesn't recognize them as belonging to the same cluster as its workers. It treats them as "remote" nodes and has no idea where to send tasks optimally.
> Current situation (example):
> Spark worker nodes: [10.0.0.1, 10.0.0.2, 10.0.0.3]
> C* nodes: [10.0.0.1 / node1.blah.ec2.com, 10.0.0.2 / node2.blah.ec2.com, 10.0.0.3 / node3.blah.ec2.com]
> What the application knows about the cluster: [node1.blah.ec2.com, node2.blah.ec2.com, node3.blah.ec2.com]
> What the application sends to Spark for execution:
> Task1 - please execute on node1.blah.ec2.com
> Task2 - please execute on node2.blah.ec2.com
> Task3 - please execute on node3.blah.ec2.com
> How Spark understands it: "I have no idea what node1.blah.ec2.com is, let's assign Task1 to a *random* node" :(
> Expected:
> Spark worker nodes: [10.0.0.1, 10.0.0.2, 10.0.0.3]
> C* nodes: [10.0.0.1 / node1.blah.ec2.com, 10.0.0.2 / node2.blah.ec2.com, 10.0.0.3 / node3.blah.ec2.com]
> What the application knows about the cluster: [10.0.0.1 / node1.blah.ec2.com, 10.0.0.2 / node2.blah.ec2.com, 10.0.0.3 / node3.blah.ec2.com]
> What the application sends to Spark for execution:
> Task1 - please execute on node1.blah.ec2.com or 10.0.0.1
> Task2 - please execute on node2.blah.ec2.com or 10.0.0.2
> Task3 - please execute on node3.blah.ec2.com or 10.0.0.3
> How Spark understands it: "10.0.0.1? I have a worker on that node, let's put Task1 there"
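The "current" vs. "expected" scenarios above can be modeled with a toy Python sketch. It is not Spark's scheduler and not the Spark Cassandra Connector API: `CassandraNode`, `preferred_locations` and `assign` are hypothetical names, and locality matching is simplified to exact string equality between a location hint and a worker's bind IP. It shows only why reporting listen_address alongside rpc_address lets every task land on a local worker.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class CassandraNode:
    # Hypothetical model of one C* node's addresses.
    listen_address: str  # private IP C* binds to, e.g. 10.0.0.1
    rpc_address: str     # address clients connect to, e.g. node1.blah.ec2.com


def preferred_locations(node: CassandraNode, know_listen_address: bool) -> List[str]:
    """Location hints the application attaches to a task reading data from `node`."""
    if know_listen_address:
        return [node.rpc_address, node.listen_address]
    return [node.rpc_address]


def assign(hints: List[str], workers: List[str], rng: random.Random) -> Tuple[str, bool]:
    """Toy placement: the first worker whose bind IP equals a hint, else a random worker.

    Returns (chosen_worker, is_local).
    """
    for hint in hints:
        if hint in workers:
            return hint, True
    return rng.choice(workers), False


if __name__ == "__main__":
    workers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
    nodes = [CassandraNode(f"10.0.0.{i}", f"node{i}.blah.ec2.com") for i in (1, 2, 3)]
    rng = random.Random(0)
    for know in (False, True):
        results = [assign(preferred_locations(n, know), workers, rng) for n in nodes]
        print(know, [local for _, local in results])
    # prints: False [False, False, False]
    # prints: True [True, True, True]
```

With only rpc_address known, no hint matches a worker's bind IP, so every task falls back to a random worker. Adding listen_address as a second hint gives each task a matching local worker.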