zookeeper-user mailing list archives

From Chris Nauroth <cnaur...@hortonworks.com>
Subject Re: ZooKeeper Class Will Not Connect -- Fixed, but ZK Bug
Date Thu, 06 Aug 2015 17:44:05 GMT
I got an out-of-office auto-reply from Chris, so I went ahead and filed
the jira on his behalf.

https://issues.apache.org/jira/browse/ZOOKEEPER-2242


--Chris Nauroth




On 8/6/15, 10:10 AM, "Chris Nauroth" <cnauroth@hortonworks.com> wrote:

>Hi Chris,
>
>I'm glad to hear this worked out!
>
>Regarding the remaining bug related to OSGi, would you please file an
>Apache jira to track that?  Even better, if you're available to code a
>patch for it, the community would appreciate the contribution.  It sounds
>like you're a heavy user of OSGi, so you likely have a good test
>environment for validating a patch.
>
>The OSGi manifest headers are coded in the build.xml.  You can find them
>by searching for "Import-Package".
>
>Thanks again!
>
>--Chris Nauroth
>
>
>
>
>On 7/31/15, 5:18 PM, "Chris Barlock" <barlock@us.ibm.com> wrote:
>
>>I found the problem -- there were several missing package imports in the
>>MANIFEST.MF that I created for the OSGi bundle I made that wrapped all
>>the Kafka/ZooKeeper JAR files.  It took enabling the log4j logging for
>>
>>log4j.logger.org.apache.zookeeper=DEBUG
>>
>>in order to see the ClassNotFoundExceptions that are typical with this
>>problem.  It was odd to me because my application's logging has always
>>uncovered these problems, but...
>>
>>However, I did uncover a ZooKeeper bug.  After fixing this, I tried
>>"cleaning up" my OSGi bundle by pulling out all the JARs that are already
>>OSGi bundles on their own.  zookeeper-3.4.6 was one of these.  However,
>>it is missing a package import for
>>
>>org.ietf.jgss
>>
>>which is part of the Java SDK.  I added this to my ZK JAR and it resolved
>>the problem.
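
One way to spot a gap like this before deploying is to read the bundle's
manifest and check its Import-Package header.  What follows is a minimal
sketch using only java.util.jar.Manifest from the JDK.  The class name
ImportPackageCheck, its helper methods, and the sample manifest text are
hypothetical and are not taken from the ZooKeeper build.  The header is split
naively on commas, which is enough to find package names but is not a full
OSGi header parser.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Manifest;

public class ImportPackageCheck {

    /** Parses manifest text; every line, including the last, must end with a newline. */
    static Manifest parse(String manifestText) {
        try {
            return new Manifest(
                    new ByteArrayInputStream(manifestText.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns true if the Import-Package header names the given package. */
    static boolean importsPackage(Manifest manifest, String pkg) {
        String header = manifest.getMainAttributes().getValue("Import-Package");
        if (header == null) {
            return false;
        }
        // Naive split: a quoted range such as version="[3.2,4)" is cut in two,
        // but the stray fragment never equals a package name, so lookups still work.
        for (String clause : header.split(",")) {
            String name = clause.split(";")[0].trim();
            if (name.equals(pkg)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical manifest listing some imports but not org.ietf.jgss.
        Manifest sample = parse(
                "Manifest-Version: 1.0\n"
                + "Import-Package: org.slf4j,javax.security.auth.login,javax.security.sasl\n");
        System.out.println("imports org.ietf.jgss: " + importsPackage(sample, "org.ietf.jgss"));
        System.out.println("imports org.slf4j: " + importsPackage(sample, "org.slf4j"));
    }
}
```

For a real JAR, the Manifest could instead be obtained from
java.util.jar.JarFile.getManifest().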
>>
>>Chris
>>
>>
>>
>>
>>From:   Chris Barlock/Raleigh/IBM@IBMUS
>>To:     user@zookeeper.apache.org
>>Date:   07/27/2015 05:09 PM
>>Subject:        Re: ZooKeeper Class Will Not Connect
>>
>>
>>
>>OK, I'm convinced that ZK is not broken.  I stripped my code down to a
>>simple stand-alone test case and it works just fine, even with the sleep
>>loop.  But, when I run it normally with ZK 3.4.6, I don't see the ZK server
>>logging anything at all.  With 3.4.2, it is fine, as previously noted.
>>Full disclosure:  we are running the client code in the WebSphere Liberty
>>Profile app server and I have packaged the Kafka and ZK code into an OSGi
>>bundle. 
>>
>>I'd like to see what the client is doing, so I took the default
>>log4j.properties file that ships with ZK 3.4.6 and added this to the end:
>>
>>log4j.logger.org.apache.zookeeper=DEBUG, CONSOLE
>>
>>I see this in the console log that Liberty manages:
>>
>>[err] log4j:WARN No appenders could be found for logger
>>(org.apache.zookeeper.ZooKeeper).
>>
>>I'm not that familiar with log4j configuration.  Where have I gone wrong
>>here?
>>
>>Thanks!
>>
>>Chris
>>
>>
>>
>>
>>
>>From:   Chris Barlock/Raleigh/IBM@IBMUS
>>To:     user@zookeeper.apache.org
>>Date:   07/27/2015 03:13 PM
>>Subject:        Re: ZooKeeper Class Will Not Connect
>>
>>
>>
>>OK, ignore this...embarrassing!  I only got State once and tested it over
>>and over again...
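
The slip described here, reading the state once and then testing the same
value on every pass, can be avoided by evaluating the condition inside the
loop.  The following is a minimal sketch of such a polling helper with a
deadline.  It does not use the ZooKeeper API at all: the class PollUntil and
the method pollUntil are hypothetical names, and the AtomicBoolean merely
stands in for the client's connection state.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

public class PollUntil {

    /**
     * Re-evaluates the condition on every pass until it holds or the timeout
     * elapses.  Returns false on timeout or if the thread is interrupted.
     */
    static boolean pollUntil(BooleanSupplier condition, long timeoutMillis, long intervalMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for callers
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for a connection that completes on another thread.
        AtomicBoolean connected = new AtomicBoolean(false);
        Thread connector = new Thread(() -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            connected.set(true);
        });
        connector.start();

        boolean ok = pollUntil(connected::get, 2000, 50);
        System.out.println("Connected: " + ok);
        connector.join();
    }
}
```

With a real client, the condition passed in would be something like
() -> zk.getState().isConnected(), so that getState() is called again on each
iteration.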
>>
>>Chris
>>
>>
>>
>>
>>
>>From:   Chris Barlock/Raleigh/IBM@IBMUS
>>To:     user@zookeeper.apache.org
>>Date:   07/27/2015 02:25 PM
>>Subject:        Re: ZooKeeper Class Will Not Connect
>>
>>
>>
>>Chris:
>>
>>Your sample code works for me and I should be able to adapt it, but I
>>still think there is a bug.  I made the following change to your sample
>>to test my method:
>>
>>    private Main(String hostPort) throws Exception {
>>        this.connectLatch = new CountDownLatch(1);
>>        waitTime = System.currentTimeMillis();
>>        this.zk = new ZooKeeper(hostPort, 3000, this);
>> 
>>        States state = zk.getState();
>>        while (state != States.CONNECTED) {
>>            System.out.println("State " + state);
>>            try {
>>                Thread.sleep(100);
>>            } catch (InterruptedException e) {
>>                System.out.println("Interrupted!");
>>            }
>>        }
>>    }
>>
>>This outputs:
>>
>>State CONNECTING
>>Received event: WatchedEvent state:SyncConnected type:None path:null
>>State CONNECTING
>>State CONNECTING
>>...
>>
>>and zk.getState never returns anything but CONNECTING.  It seems that
>>this started with ZK 3.4.3, as 3.4.2 works for me, but 3.4.3, 3.4.4,
>>3.4.5 and 3.4.6 all have this behavior in which the state is always
>>CONNECTING.
>>
>>Chris
>>
>>
>>
>>
>>From:   Chris Nauroth <cnauroth@hortonworks.com>
>>To:     "user@zookeeper.apache.org" <user@zookeeper.apache.org>
>>Date:   07/24/2015 07:28 PM
>>Subject:        Re: ZooKeeper Class Will Not Connect
>>
>>
>>
>>ZooKeeper writes INFO-level logging about connection and session
>>establishment.  On the client side, these messages would come from the
>>ZooKeeper and ClientCnxn classes.  On the server side, these messages
>>would come from the ZooKeeperServer, NIOServerCnxn and NettyServerCnxn
>>classes.
>>
>>It's possible that you could get more detail by turning up to DEBUG level
>>logging by adding these lines to log4j.properties for the client and
>>server respectively:
>>
>>log4j.logger.org.apache.zookeeper=DEBUG
>>log4j.logger.org.apache.zookeeper.server=DEBUG
>>
>>
>>
>>--Chris Nauroth
>>
>>
>>
>>
>>On 7/24/15, 3:42 PM, "Chris Barlock" <barlock@us.ibm.com> wrote:
>>
>>>Chris:
>>>
>>>I have defined:
>>>
>>>    private static final int MAX_ZK_CONNECT_ATTEMPTS = 400;
>>> 
>>>    private static final long ZK_CONNECT_WAIT = 5; // Milliseconds
>>>
>>>so, two seconds, but I have also tried with ZK_CONNECT_WAIT = 500 (two
>>>hundred seconds).  getState always returned CONNECTING.  I can play with
>>>the async notification, but it really doesn't fit my application very
>>>well.  Is there any additional server or client tracing that can be
>>>enabled to get a better sense of what is going on?
>>>
>>>Chris
>>>
>>>IBM Tivoli Systems
>>>Research Triangle Park, NC
>>>(919) 224-2240
>>>Internet:  barlock@us.ibm.com
>>>
>>>
>>>
>>>From:   Chris Nauroth <cnauroth@hortonworks.com>
>>>To:     "user@zookeeper.apache.org" <user@zookeeper.apache.org>
>>>Date:   07/24/2015 06:19 PM
>>>Subject:        Re: ZooKeeper Class Will Not Connect
>>>
>>>
>>>
>>>Hello Chris,
>>>
>>>Thank you for the detailed repro steps describing which versions work
>>>and which versions don't work.  I tested your code sample against a
>>>3.4.6 build, and it worked consistently for me.  My only thought is
>>>that perhaps the MAX_ZK_CONNECT_ATTEMPTS and ZK_CONNECT_WAIT constants
>>>are set such that the polling loop exits before the connection
>>>completes.  Perhaps a subtle timing difference in the newer versions is
>>>just now exposing this.
>>>
>>>The typical pattern for connection establishment is to rely on the
>>>ZooKeeper client's asynchronous event notification instead of a polling
>>>loop.  See below for a code sample that initiates the connection and
>>>then waits for the SyncConnected event.  The ZooKeeper programmer's
>>>guide and example program docs have a more detailed discussion of this.
>>>
>>>http://zookeeper.apache.org/doc/r3.4.6/zookeeperProgrammers.html
>>>
>>>
>>>http://zookeeper.apache.org/doc/r3.4.6/javaExample.html
>>>
>>>
>>>Could you please try running this against your 3.4.6 cluster?  I'd be
>>>curious to see if the connection completes for you.  This would also
>>>give you a sense for how long connection establishment is taking and
>>>whether or not that's in line with your definitions of
>>>MAX_ZK_CONNECT_ATTEMPTS and ZK_CONNECT_WAIT.
>>>
>>>I hope this helps.
>>>
>>>
>>>
>>>class Main implements Watcher {
>>>
>>>    private final CountDownLatch connectLatch;
>>>    private final ZooKeeper zk;
>>>
>>>    public static void main(final String[] args) throws Exception {
>>>        String hostPort = args[0];
>>>        Main main = new Main(hostPort);
>>>        main.awaitConnection();
>>>        System.out.println("Exiting.");
>>>    }
>>>
>>>    private Main(String hostPort) throws Exception {
>>>        this.connectLatch = new CountDownLatch(1);
>>>        this.zk = new ZooKeeper(hostPort, 3000, this);
>>>    }
>>>
>>>    private void awaitConnection() throws InterruptedException {
>>>        this.connectLatch.await();
>>>        System.out.println("Connection has completed.");
>>>    }
>>>
>>>    @Override
>>>    public void process(WatchedEvent event) {
>>>        System.out.println("Received event: " + event);
>>>        if (event.getType() == Event.EventType.None) {
>>>            switch (event.getState()) {
>>>            case SyncConnected:
>>>                this.connectLatch.countDown();
>>>                break;
>>>            }
>>>        }
>>>    }
>>>}
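
The sample above waits on the latch indefinitely.  Where an upper bound on
the wait is wanted, CountDownLatch.await(timeout, unit) returns false if the
count has not reached zero in time.  The following is a minimal sketch of
that variant.  It is self-contained and does not touch the ZooKeeper API: the
class BoundedAwait is a hypothetical name, and the background thread only
simulates the moment at which a Watcher would receive SyncConnected and count
the latch down.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class BoundedAwait {

    /**
     * Waits for the latch for at most timeoutMillis.  Returns true if the
     * latch reached zero, false on timeout or interruption.
     */
    static boolean awaitConnection(CountDownLatch latch, long timeoutMillis) {
        try {
            return latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            return false;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch connectLatch = new CountDownLatch(1);

        // Stand-in for the client's event thread delivering SyncConnected.
        Thread eventThread = new Thread(() -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            connectLatch.countDown();
        });
        eventThread.start();

        boolean connected = awaitConnection(connectLatch, 2000);
        System.out.println(connected ? "Connection has completed." : "Timed out waiting.");
        eventThread.join();
    }
}
```

A caller that gets false back can then decide whether to retry, close the
client, or report the failure, rather than blocking forever.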
>>>
>>>
>>>--Chris Nauroth
>>>
>>>
>>>
>>>
>>>On 7/24/15, 9:16 AM, "Chris Barlock" <barlock@us.ibm.com> wrote:
>>>
>>>>Ran some more tests.  My code works fine up through ZK 3.4.2, but then
>>>>fails with 3.4.3.  I did have to add the following to the
>>>>Import-Package list in the ZK MANIFEST.MF:
>>>>
>>>>org.slf4j,javax.security.auth.login,javax.security.sasl
>>>>
>>>>I could really use some help here, ZK folks!  Is my code incorrect with
>>>>newer versions of ZK, or is ZK broken?
>>>>
>>>>Chris
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>From:   Chris Barlock/Raleigh/IBM@IBMUS
>>>>To:     user@zookeeper.apache.org
>>>>Date:   07/23/2015 09:34 PM
>>>>Subject:        Re: ZooKeeper Class Will Not Connect
>>>>
>>>>
>>>>
>>>>I tried the 3.5.0 alpha build to see if it made any difference.  It did
>>>>not.
>>>>
>>>>But, I had to hack the MANIFEST.MF file in the JAR because the
>>>>"3.5.0-alpha" version fails tests in OSGi bundles that import the ZK
>>>>classes which have something like:
>>>>
>>>>version="[3.2,4)"
>>>>
>>>>I suggest that if you want to name the JAR "3.5.0-alpha" that all the
>>>>internals just use a version of 3.5.0.
>>>>
>>>>Chris
>>>>
>>>>IBM Tivoli Systems
>>>>Research Triangle Park, NC
>>>>(919) 224-2240
>>>>Internet:  barlock@us.ibm.com
>>>>
>>>>
>>>>
>>>>From:   Chris Barlock/Raleigh/IBM@IBMUS
>>>>To:     user@zookeeper.apache.org
>>>>Date:   07/23/2015 01:37 PM
>>>>Subject:        ZooKeeper Class Will Not Connect
>>>>
>>>>
>>>>
>>>>We are attempting to upgrade from Kafka 0.8.0, which includes ZK 3.3.4,
>>>>to Kafka 0.8.2.1 with ZK 3.4.6.  My code which attempts to connect to
>>>>ZK is pretty straightforward:
>>>>
>>>>            try {
>>>>                ZooKeeper zk = new ZooKeeper(connectString,
>>>>                        sessionTimeout, this);
>>>>                int connectAttempts = 0;
>>>>
>>>>                while (!zk.getState().isConnected()
>>>>                        && connectAttempts < MAX_ZK_CONNECT_ATTEMPTS) {
>>>>                    try {
>>>>                        Thread.sleep(ZK_CONNECT_WAIT);
>>>>                    } catch (InterruptedException e) {
>>>>                        // Ignore
>>>>                    }
>>>>                    connectAttempts++;
>>>>                }
>>>>            } catch (IOException e) {
>>>>                trace.exception(CLASS_NAME, methodName, e);
>>>>            }
>>>>
>>>>With some additional tracing, States is always CONNECTING.  Has
>>>>something changed with 3.4.6 about how I should connect to the server?
>>>>I can connect just fine with the zookeeper-shell.sh that Kafka ships.
>>>>This code always runs on the same system as ZK, so the connectString is
>>>>always "localhost:2181".
>>>>
>>>>Chris
>>>>
>>>
>>>
>>
>>
>>
>>
>>
>
>

