cassandra-commits mailing list archives

From "Brandon Williams (Resolved) (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (CASSANDRA-3839) BulkOutputFormat binds to wrong client address when client is Dual-stack and server is IPv6
Date Thu, 02 Feb 2012 23:20:53 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-3839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams resolved CASSANDRA-3839.
-----------------------------------------

       Resolution: Fixed
    Fix Version/s: 1.1

Committed.
                
> BulkOutputFormat binds to wrong client address when client is Dual-stack and server is IPv6
> -------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3839
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3839
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hadoop
>    Affects Versions: 1.1
>         Environment: Linux 2.6.32-5-amd64, Java 1.6.0_26-b03
>            Reporter: Erik Forsberg
>            Assignee: Brandon Williams
>              Labels: bulkloader
>             Fix For: 1.1
>
>         Attachments: 0005-Allow-using-any-interface-for-outgoing-connections.txt
>
>
> When running a map/reduce job with BulkOutputFormat in an environment where the Hadoop nodes are dual-stack (IPv4+IPv6) and the Cassandra servers are IPv6-only, the TCP connection setup for streaming appears to explicitly set the source address to the IPv4 address of the Hadoop node, even though the destination address is IPv6.
> I'm seeing connection attempts where the source address is an IPv4-mapped IPv6 address and the destination is the IPv6 address of the Cassandra node.
> In the log output from the Hadoop M/R job, I see:
> {noformat}
> 2012-02-01 16:49:19,909 WARN org.apache.cassandra.streaming.FileStreamTask: Failed attempt 1 to connect to /2001:4c28:a030:30:72f3:95ff:fe02:2936 to stream /var/lib/hadoop/mapred/local/taskTracker/forsberg/jobcache/job_201201120812_0204/attempt_201201120812_0204_m_000000_0/test/Histograms/test-Histograms-hc-1-Data.db sections=1 progress=0/749048 - 0%. Retrying in 4000 ms. (java.net.ConnectException: Connection timed out)
> {noformat}
> Digging into the code, I see that org.apache.cassandra.hadoop.BulkRecordWriter successfully creates a Thrift connection to my Cassandra cluster over IPv6, and successfully retrieves token range information.
> Later on, in org.apache.cassandra.streaming.FileStreamTask, it fails to connect to the destination Cassandra node. It seems to me that the problem is that org.apache.cassandra.net.OutboundTcpConnectionPool is asking FBUtilities.getLocalAddress for the address to bind to, and getLocalAddress returns an IPv4 address when DatabaseDescriptor has not been initialized. And DatabaseDescriptor has not been initialized, because in BulkOutputFormat we're not reading cassandra.yaml.
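
The mismatch described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not Cassandra's code: `pickBindAddress`, `sameFamily`, and `literal` are hypothetical helpers, the `configured` parameter stands in for a listen address that would come from cassandra.yaml, and the addresses in the usage note are documentation-range examples rather than values from the report.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.UnknownHostException;

final class BindAddressSketch {
    /**
     * Hypothetical stand-in for the behaviour the report attributes to
     * getLocalAddress: use the configured listen address when there is one,
     * otherwise fall back to whatever address the host resolves to.
     */
    static InetAddress pickBindAddress(InetAddress configured, InetAddress hostFallback) {
        return configured != null ? configured : hostFallback;
    }

    /** True when both addresses are IPv4, or both are IPv6. */
    static boolean sameFamily(InetAddress a, InetAddress b) {
        return (a instanceof Inet4Address) == (b instanceof Inet4Address);
    }

    /** Parses an IP literal; no DNS lookup is performed for literals. */
    static InetAddress literal(String ip) {
        try {
            return InetAddress.getByName(ip);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not an IP literal: " + ip, e);
        }
    }
}
```

With no configured address and an IPv4 fallback such as `literal("192.0.2.10")`, `pickBindAddress(null, fallback)` yields a source address whose family differs from an IPv6 destination such as `literal("2001:db8::2")`, which is the situation the report describes. Passing an IPv6 `configured` address corresponds to the listen_address workaround.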
> I actually have a workaround for this, which involves not applying the patch that removes the need to read cassandra.yaml, and then pointing to a cassandra.yaml generated specifically for this purpose on each Hadoop node, with listen_address set to the IPv6 address of the node.
> This is with net.ipv6.bindv6only=0 in Linux sysctl - something you must have for Hadoop to run.
> I also tried -D mapred.child.java.opts="-Djava.net.preferIPv4Stack=false -Djava.net.preferIPv6Addresses=true", i.e. setting properties so that the M/R job prefers the IPv6 stack, but that didn't help.
> In this case, we would probably be better off not explicitly binding to any address - the OS would do that for us. I understand binding explicitly makes sense when this code is running inside the Cassandra server.
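
The suggestion in the last paragraph, binding only when a local address is actually wanted, might look roughly like the sketch below. It uses only standard `java.net` calls and is not the committed patch; `OutboundConnectSketch`, `connect`, and the `localOrNull` parameter are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;

final class OutboundConnectSketch {
    /**
     * Opens a TCP connection to remote. When localOrNull is null the socket
     * is left unbound, so connect() lets the OS pick a source address that
     * can reach the destination. When it is non-null the socket is bound to
     * that address (with an ephemeral port) first, as a server process with
     * a configured listen address might want.
     */
    static Socket connect(InetSocketAddress remote, InetAddress localOrNull, int timeoutMillis)
            throws IOException {
        Socket socket = new Socket();
        try {
            if (localOrNull != null) {
                socket.bind(new InetSocketAddress(localOrNull, 0));
            }
            socket.connect(remote, timeoutMillis);
            return socket;
        } catch (IOException e) {
            socket.close(); // do not leak the descriptor on failure
            throw e;
        }
    }
}
```

A client-side caller such as a Hadoop task would pass `null` for `localOrNull`, while code running inside a server with a known listen address could pass that address instead.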

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
