hadoop-zookeeper-dev mailing list archives

From "bryan thompson (JIRA)" <j...@apache.org>
Subject [jira] Commented: (ZOOKEEPER-344) doIO in NioServerCnxn: Exception causing close of session : cause is "read error"
Date Wed, 06 May 2009 12:52:30 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12706402#action_12706402 ]

bryan thompson commented on ZOOKEEPER-344:
------------------------------------------

Update: 

This issue is clearly linked to heavy utilization or swapping on the clients.  If I keep
the clients from swapping, the error appears relatively infrequently, and when it does
appear it is linked to a sudden increase in load.  For example, the concurrent start of
100 clients on 14 machines will sometimes trigger it.  I believe the issue can be closed
at this point with the note that swapping will cause expired connections.  I also observe
similar problems with jini / river, including cases where DGC (distributed garbage
collection) appears to fail.  All in all, my sense is that Java processes must avoid
swapping if they are to behave reliably as well as promptly.

Thanks,

-bryan


> doIO in NioServerCnxn: Exception causing close of session : cause is "read error"
> ---------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-344
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-344
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: java client, server
>    Affects Versions: 3.1.0
>         Environment: jdk1.6.0_07
> Linux blade2 2.6.27.7-134.fc10.x86_64 #1 SMP Mon Dec 1 22:21:35 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
>            Reporter: bryan thompson
>             Fix For: 3.2.0
>
>
> I have been having a problem with zookeeper 3.0.1 and now with 3.1.0 where I see a lot of expired sessions.  I am using a 16 node cluster which is all on the same local network.  There is a single zookeeper instance (these are benchmarking runs).
> The problem appears to be correlated with either run time or system load.
> Personally I think that it is system load, because I also see session expired events on a Windows platform running both zookeeper and the application (i.e., everything is local) when the application load suddenly spikes.  To me this suggests that the client is not able to renew (ping) the zookeeper service in a timely manner and is expired.  But the log messages below with the "read error" suggest that maybe there is something else going on?
> Zookeeper Configuration
> #Wed Mar 18 12:41:05 GMT-05:00 2009
> clientPort=2181
> dataDir=/var/bigdata/benchmark/zookeeper/1
> syncLimit=2
> dataLogDir=/var/bigdata/benchmark/zookeeper/1
> tickTime=2000
> Some representative log messages are below.
> Client side messages (from our app)
> ERROR [main-EventThread] com.bigdata.zookeeper.ZLockImpl$ZLockWatcher.process(ZLockImpl.java:400) 2009-03-18 13:35:40,335 - Session expired: WatchedEvent: Server state change. New state: Expired : zpath=/benchmark/jobs/com.bigdata.service.jini.benchmark.ThroughputMaster/test_1/client1160/locknode
> ERROR [main-EventThread] com.bigdata.zookeeper.ZLockImpl$ZLockWatcher.process(ZLockImpl.java:400) 2009-03-18 13:35:40,335 - Session expired: WatchedEvent: Server state change. New state: Expired : zpath=/benchmark/jobs/com.bigdata.service.jini.benchmark.ThroughputMaster/test_1/client1356/locknode
> Server side messages:
>  WARN [NIOServerCxn.Factory:2181] org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:417) 2009-03-18 13:06:57,252 - Exception causing close of session 0x1201aac14300022 due to java.io.IOException: Read error
>  WARN [NIOServerCxn.Factory:2181] org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:417) 2009-03-18 13:06:58,198 - Exception causing close of session 0x1201aac1430000f due to java.io.IOException: Read error
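
The quoted configuration sets tickTime=2000, which is relevant to the expiry behaviour described above. As a rough sketch, and assuming the bounds described in the ZooKeeper documentation (a session timeout of at least 2 and at most 20 times tickTime) apply to the version in use, the window a client has to get a ping through can be estimated as below. The class and method names are hypothetical and are not part of the ZooKeeper API; the thread does not say what timeout the clients actually requested.

```java
// Illustrative only: SessionTimeoutBounds is a made-up helper, not ZooKeeper code.
public final class SessionTimeoutBounds {
    // Assumed multipliers, taken from the documented 2x..20x tickTime range.
    public static final int MIN_TICKS = 2;
    public static final int MAX_TICKS = 20;

    // Smallest session timeout the server would accept, in milliseconds.
    public static int minTimeoutMs(int tickTimeMs) {
        return MIN_TICKS * tickTimeMs;
    }

    // Largest session timeout the server would accept, in milliseconds.
    public static int maxTimeoutMs(int tickTimeMs) {
        return MAX_TICKS * tickTimeMs;
    }

    // Clamp the timeout requested by a client into the accepted range.
    public static int negotiate(int requestedMs, int tickTimeMs) {
        int capped = Math.min(requestedMs, maxTimeoutMs(tickTimeMs));
        return Math.max(minTimeoutMs(tickTimeMs), capped);
    }

    public static void main(String[] args) {
        int tickTimeMs = 2000; // tickTime from the quoted configuration
        System.out.println("min timeout ms: " + minTimeoutMs(tickTimeMs));
        System.out.println("max timeout ms: " + maxTimeoutMs(tickTimeMs));
        // A hypothetical client asking for 1 second would be raised to the minimum.
        System.out.println("negotiated for 1000 ms request: " + negotiate(1000, tickTimeMs));
    }
}
```

Under these assumptions, a client that stalls for longer than its negotiated timeout (for example while the JVM is swapped out) would miss its pings and be expired, which is consistent with the comment above.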

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

