directory-dev mailing list archives

From "Kiran Ayyagari (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (DIRSERVER-1894) Multi-Master replicated startup does not complete
Date Thu, 29 Aug 2013 14:22:52 GMT

     [ https://issues.apache.org/jira/browse/DIRSERVER-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kiran Ayyagari resolved DIRSERVER-1894.
---------------------------------------

       Resolution: Fixed
    Fix Version/s: 2.0.0-M16
         Assignee: Kiran Ayyagari

This was introduced when we were trying to fix the connectivity issue in replication tests.
Fixed here http://svn.apache.org/r1518658

                
> Multi-Master replicated startup does not complete
> -------------------------------------------------
>
>                 Key: DIRSERVER-1894
>                 URL: https://issues.apache.org/jira/browse/DIRSERVER-1894
>             Project: Directory ApacheDS
>          Issue Type: Bug
>          Components: ldap
>    Affects Versions: 2.0.0-M15
>            Reporter: Paul Bayliss
>            Assignee: Kiran Ayyagari
>            Priority: Blocker
>             Fix For: 2.0.0-M16
>
>         Attachments: config-1.ldif, config-2.ldif
>
>
> On startup of a directory instance configured as a replication consumer, the instance
> is unable to bind to its local port until a connection can be made to the replication provider.
> In a 2-node multi-master setup this creates a chicken-and-egg problem: neither node is able
> to start its LDAP port, and the following errors are repeated in the logs indefinitely.
> Instance 1:
> [12:58:26] ERROR [org.apache.directory.server.CONSUMER_LOG] - Failed to connect to the server localhost:11389, cause : Cannot connect on the server: Connection refused
> [12:58:26] ERROR [org.apache.directory.server.ldap.replication.consumer.ReplicationConsumerImpl] - Failed to connect to the server localhost:11389, cause : Cannot connect on the server: Connection refused
> Instance 2:
> [12:58:14] ERROR [org.apache.directory.server.CONSUMER_LOG] - Failed to connect to the server localhost:10389, cause : Cannot connect on the server: Connection refused
> [12:58:14] ERROR [org.apache.directory.server.ldap.replication.consumer.ReplicationConsumerImpl] - Failed to connect to the server localhost:10389, cause : Cannot connect on the server: Connection refused
> netstat shows that the LDAP ports are not bound.
> > netstat -a | egrep "10389|11389"
> It is possible to trick the instances into starting up by first starting instance 1 without
> making it a replication consumer, then starting instance 2. I then stop instance 1, change it to
> be a consumer, and restart it. Both instances are then running, and netstat shows the replication
> connections and the listening LDAP ports. Replication now works in both directions.
> > netstat -a | egrep "10389|11389"
> tcp4       0      0  localhost.10389        localhost.51051        ESTABLISHED
> tcp4       0      0  localhost.51051        localhost.10389        ESTABLISHED
> tcp46      0      0  *.10389                *.*                    LISTEN     
> tcp4       0      0  localhost.11389        localhost.51050        ESTABLISHED
> tcp4       0      0  localhost.51050        localhost.11389        ESTABLISHED
> tcp46      0      0  *.11389                *.*                    LISTEN 
> I will attach the configuration files of the two instances, which can be used to reproduce
> this problem.
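The startup deadlock described in the report can be illustrated with a small, self-contained simulation. This is a hypothetical Python sketch, not ApacheDS code, and it does not claim to reproduce what r1518658 actually changed. Assumptions: each node is modelled as a plain TCP socket on 127.0.0.1 with an ephemeral port (instead of 10389/11389); the sockets are bound up front only to reserve port numbers, so connections to them are refused until `listen()` is called, which stands in for "the LDAP port is not bound". The names `simulate`, `run_node`, `connect_with_retry` and the `listen_first` flag are invented for illustration.

```python
import socket
import threading
import time


def make_bound_socket():
    """Reserve a free localhost port; connections are refused until listen() is called."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("127.0.0.1", 0))
    return s


def connect_with_retry(port, deadline):
    """Keep retrying the peer (like a consumer reconnect loop) until the deadline passes."""
    while time.monotonic() < deadline:
        try:
            return socket.create_connection(("127.0.0.1", port), timeout=0.2)
        except OSError:  # e.g. ConnectionRefusedError while the peer is not listening
            time.sleep(0.05)
    return None


def run_node(server, peer_port, listen_first, deadline, results, index):
    """Start one hypothetical node: open its own port and connect to its peer."""
    if listen_first:
        server.listen()  # open the local port before replication starts
        conn = connect_with_retry(peer_port, deadline)
    else:
        # Reported behaviour: the local port is opened only after the provider answers.
        conn = connect_with_retry(peer_port, deadline)
        if conn is not None:
            server.listen()
    results[index] = conn is not None
    if conn is not None:
        conn.close()


def simulate(listen_first, timeout=1.0):
    """Return True if both nodes reached each other before the timeout."""
    servers = [make_bound_socket(), make_bound_socket()]
    ports = [s.getsockname()[1] for s in servers]
    deadline = time.monotonic() + timeout
    results = [False, False]
    threads = [
        threading.Thread(
            target=run_node,
            args=(servers[i], ports[1 - i], listen_first, deadline, results, i),
        )
        for i in range(2)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    for s in servers:
        s.close()
    return results[0] and results[1]


if __name__ == "__main__":
    print("replicate before listening:", simulate(listen_first=False))
    print("listen before replicating:", simulate(listen_first=True))
```

With `listen_first=False` each node keeps retrying a peer that never opens its port, mirroring the repeated "Connection refused" log lines, so the simulation should report False once the timeout expires. With `listen_first=True` both ports are open before either node starts its connection attempts, so both connections should succeed. Opening the local port before starting the consumer's connection attempts is one way to break the cycle; the real fix may be structured differently.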

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
