hadoop-zookeeper-user mailing list archives

From Patrick Hunt <ph...@apache.org>
Subject Re: znode cversion decreasing?
Date Mon, 12 Apr 2010 23:22:46 GMT

On 04/12/2010 03:58 PM, Kevin Webb wrote:
> On Mon, 12 Apr 2010 15:09:20 -0700
> Patrick Hunt<phunt@apache.org>  wrote:
>
>> We did have a case where the user setup 3 servers, each was
>> standalone. :-) Doesn't look like that's the problem here though
>> given you only specify 1 server in the connect string (although as
>> mahadev mentioned you don't need to worry about that aspect).
>
> They're definitely not standalone.  Here's the server config:
>
> # The number of milliseconds of each tick
> tickTime=2000
> # The number of ticks that the initial
> # synchronization phase can take
> initLimit=5
> # The number of ticks that can pass between
> # sending a request and getting an acknowledgement
> syncLimit=2
> # the directory where the snapshot is stored.
> dataDir=/home/pl_drl/zookeeper-3.2.2/data
> # the port at which the clients will connect
> clientPort=2181
> server.1=<hostname 1>:2888:3888
> server.2=<hostname 2>:2888:3888
> server.3=<hostname 3>:2888:3888
>

What's the ping time between colos? A 2 sec tickTime and esp. the initLimit 
and syncLimit are pretty low. You are allowing only 10 seconds (initLimit) 
to d/l the data repository to a remote server, and only 4 seconds 
(syncLimit) between sending a request and getting an acknowledgement. Even 
in-colo we typically use a higher value... but you may not want to change 
until we can reproduce this. You probably want a 4 sec tickTime and 
60/40sec (so settings of 15/10) for the init/sync limits (something like 
that, depending on latencies/bandwidth you see)
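
To make the arithmetic explicit: both limits are counted in ticks, so the wall-clock allowance is tickTime multiplied by the limit. Below is a minimal Python sketch of that calculation. The helper name quorum_timeouts and the two inline config strings are only illustrative (they are not part of ZooKeeper), and the "suggested" values are the rough numbers mentioned above, not tuned recommendations.

```python
def quorum_timeouts(cfg_text):
    """Return (init_seconds, sync_seconds) implied by zoo.cfg-style text.

    Hypothetical helper for illustration; it only reads the three
    settings discussed in this thread and ignores everything else.
    """
    settings = {}
    for line in cfg_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()

    tick_ms = int(settings["tickTime"])
    # Both limits are expressed in ticks, so multiply by the tick length.
    init_s = tick_ms * int(settings["initLimit"]) / 1000
    sync_s = tick_ms * int(settings["syncLimit"]) / 1000
    return init_s, sync_s


# Values from the config posted in this thread.
posted = "tickTime=2000\ninitLimit=5\nsyncLimit=2\n"
# Rough values suggested above (assumption: adjust for real latencies).
suggested = "tickTime=4000\ninitLimit=15\nsyncLimit=10\n"

print(quorum_timeouts(posted))     # (10.0, 4.0)
print(quorum_timeouts(suggested))  # (60.0, 40.0)
```
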

>
>> After it goes 7->11->9, does it ever go back to 11 or just 9?
>
> It actually does this:
> 7->7->11->9->9->12->14 ... (proceeds normally from here)
>

Hrm, that's very weird.
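
If it helps when you capture this next time, here is a small Python sketch for flagging the exact step where the observed cversion goes backwards, so it can be lined up against timestamps in the server logs. The function name find_regressions is made up for illustration, and the list is just the sequence you reported; how you collect the values from your client is up to you.

```python
def find_regressions(versions):
    """Return (index, previous, current) for each step where the value drops.

    Hypothetical helper: `versions` is the list of cversion values in the
    order the client observed them.
    """
    return [
        (i, prev, cur)
        for i, (prev, cur) in enumerate(zip(versions, versions[1:]), start=1)
        if cur < prev
    ]


# Sequence reported in this thread: 7->7->11->9->9->12->14
observed = [7, 7, 11, 9, 9, 12, 14]
print(find_regressions(observed))  # [(3, 11, 9)]
```
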

>> It would be good to capture the server log files (all 3) when this
>> happens next time. Please provide those as well, would be critical
>> for discovering this. In particular not many users are running
>> cross-colo clusters.
>
> I'll be sure to save these next time.  I thought I had them for this
> run, sorry.
>

NP. As I mentioned, creating a JIRA would be a good idea. Very DRY.

>> If you can provide the config files too that will be useful.
>>
>> What version of java/OS is being used?
>
> I'm running on PlanetLab, which is based on Fedora 8 (very old).
> uname says: Linux 2.6.22.19-vs2.3.0.34.39.planetlab #1 SMP Tue
> Jun 30 09:32:05 UTC 2009 i686 i686 i386 GNU/Linux
>

Hrm...

> java -version says:
> java version "1.7.0"
> IcedTea Runtime Environment (build 1.7.0-b21)
> IcedTea Client VM (build 1.7.0-b21, mixed mode)
>

Well, we don't support 1.7 VMs yet, but that's not to say that would 
cause the issue. Really, once we see the server logs we should get more 
insight.

The only thing I could see with the OS/java would be significant 
differences in thread/networking timing that we don't typically see with 
newer OSes and 1.6 VMs...

>> Might be a good time to create a JIRA, attach all this to the JIRA so
>> that you don't have to repeat. :-)
>
> I'll do that (including server logs) next time I see it happen.

Good.

Patrick
