cassandra-commits mailing list archives

From "Olivier Michallat (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-7431) Hadoop integration does not perform reverse DNS lookup correctly on EC2
Date Wed, 26 Nov 2014 17:00:14 GMT


Olivier Michallat commented on CASSANDRA-7431:

Just wanted to mention that there is a third option coming soon: Netty 4.1 will ship with
a built-in DNS client, which also allows reverse lookups (I've tested with a nightly build).

In the driver, I'm using the JNDI approach for now, but will switch to Netty when we upgrade
to 4.1.
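
For reference, a minimal sketch of the JNDI approach mentioned above. It assumes the JDK's built-in DNS provider (com.sun.jndi.dns.DnsContextFactory), the platform's default resolvers, and IPv4 addresses only. The class and method names (JndiReverseLookup, reverseName, reverseLookup) are illustrative; they are not taken from the driver or from the attached patch.

```java
import java.util.Hashtable;
import javax.naming.NamingException;
import javax.naming.directory.Attribute;
import javax.naming.directory.Attributes;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

public class JndiReverseLookup {

    // Builds the PTR query name for a dotted-quad IPv4 address,
    // e.g. 192.0.2.10 becomes 10.2.0.192.in-addr.arpa.
    static String reverseName(String ipv4) {
        String[] octets = ipv4.split("\\.");
        if (octets.length != 4) {
            throw new IllegalArgumentException("Only dotted-quad IPv4 is handled in this sketch: " + ipv4);
        }
        return octets[3] + "." + octets[2] + "." + octets[1] + "." + octets[0] + ".in-addr.arpa";
    }

    // Asks DNS for the PTR record through the JNDI DNS provider.
    // Returns null if the answer carries no PTR attribute; a missing name
    // surfaces as a NamingException thrown by the provider.
    static String reverseLookup(String ipv4) throws NamingException {
        Hashtable<String, String> env = new Hashtable<>();
        env.put("java.naming.factory.initial", "com.sun.jndi.dns.DnsContextFactory");
        DirContext ctx = new InitialDirContext(env);
        try {
            Attributes attrs = ctx.getAttributes(reverseName(ipv4), new String[] {"PTR"});
            Attribute ptr = attrs.get("PTR");
            if (ptr == null) {
                return null;
            }
            String name = (String) ptr.get();
            // DNS answers are fully qualified; drop the trailing dot if present.
            return name.endsWith(".") ? name.substring(0, name.length() - 1) : name;
        } finally {
            ctx.close();
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the IPv4 address to resolve.
        System.out.println("PTR: " + reverseLookup(args[0]));
    }
}
```

Unlike InetAddress.getHostName(), this queries the PTR record directly, which is closer to what "host" or "dig -x" do.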

> Hadoop integration does not perform reverse DNS lookup correctly on EC2
> -----------------------------------------------------------------------
>                 Key: CASSANDRA-7431
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hadoop
>            Reporter: Paulo Motta
>            Assignee: Paulo Motta
>         Attachments: 2.0-CASSANDRA-7431.txt
> The split assignment on AbstractColumnFamilyInputFormat:247 performs a reverse DNS lookup
of Cassandra IPs in order to preserve locality in Hadoop (task trackers are identified by
> However, the reverse lookup of an EC2 IP does not yield the EC2 hostname of that endpoint
when running from an EC2 instance due to the use of InetAddress.getHostName().
> In order to show this, consider the following piece of code:
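> {code:borderStyle=solid}
> import java.net.InetAddress;
>
> // Prints the textual address and the host name Java resolves for args[0].
> public class DnsResolver {
>     public static void main(String[] args) throws Exception {
>         InetAddress namenodePublicAddress = InetAddress.getByName(args[0]);
>         System.out.println("getHostAddress: " + namenodePublicAddress.getHostAddress());
>         // getHostName() performs the reverse lookup when given an IP literal.
>         System.out.println("getHostName: " + namenodePublicAddress.getHostName());
>     }
> }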
> {code}
> When this code is run from my machine to perform reverse lookup of an EC2 IP, the output is:
> {code:none}
> ➜  java DnsResolver
> getHostAddress:
> getHostName:
> {code}
> When this code is executed from inside an EC2 machine, the output is:
> {code:none}
> ➜  java DnsResolver
> getHostAddress:
> getHostName:
> {code}
> However, when using Linux tools such as "host" or "dig", the EC2 hostname is properly
resolved from the EC2 instance, so there's some problem with Java's InetAddress.getHostName()
and EC2.
> Two consequences of this bug during AbstractColumnFamilyInputFormat split definition:
> 1) If the Hadoop cluster is configured to use EC2 public DNS, the locality will be lost,
because Hadoop will try to match the CFIF split location (public IP) with the task tracker
location (public DNS), so no matches will be found.
> 2) If the Cassandra nodes' broadcast_address is set to public IPs, all Hadoop communication
will be done via the public IP, which will incur additional data transfer charges. If the public
IP is mapped to the EC2 DNS during split definition, when the task is executed, ColumnFamilyRecordReader
will resolve the public DNS to the private IP of the instance, so there will be no additional
> A similar bug was filed in the WHIRR project: 
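
To illustrate the first consequence described in the issue, here is a small sketch of why locality is lost. It assumes, as a simplification, that a split is considered local only when one of its location strings equals a task tracker host name exactly; Hadoop's real scheduling logic is more involved. The class name, method name, address, and host name below are hypothetical placeholders.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LocalityMatchSketch {

    // Returns true if any split location equals a known task tracker host name.
    // Exact string comparison is an assumption made for this illustration.
    static boolean hasLocalTracker(List<String> splitLocations, Set<String> trackerHosts) {
        for (String location : splitLocations) {
            if (trackerHosts.contains(location)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Task trackers registered under their EC2 public DNS name (placeholder value).
        Set<String> trackers = new HashSet<>(
                Arrays.asList("ec2-203-0-113-10.compute-1.amazonaws.com"));

        // Split location left as a raw public IP because the reverse lookup failed.
        System.out.println(hasLocalTracker(Arrays.asList("203.0.113.10"), trackers)); // false

        // Split location mapped to the public DNS name: the match succeeds.
        System.out.println(hasLocalTracker(
                Arrays.asList("ec2-203-0-113-10.compute-1.amazonaws.com"), trackers)); // true
    }
}
```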

This message was sent by Atlassian JIRA
