zookeeper-user mailing list archives

From Raj N <raj.cassan...@gmail.com>
Subject Re: forceSync=no
Date Sat, 16 Jun 2012 16:05:22 GMT
Mahadev, to answer your question: yes, we get significantly better
performance with forceSync=no. In fact, Patrick is probably right: you
probably ran the tests while the bug existed. It was one of my team members
who reported the bug where forceSync=no did not work.

A couple more facts. We use the ext4 filesystem (default options) on RHEL
2.6.18-238.el5 (note that it's not el6, so ext4 is backported; ext3 is the
default on el5). We use 500 GB SAS drives with a BBWC (1 GB, DWC disabled).
But somehow my performance with forceSync=yes is still not the best. I have
been thinking it might be because the default ext4 options enable the
write barrier. The barrier essentially makes the BBWC useless. I
think I can safely disable the barrier since I have a BBWC. I haven't tried
this out yet. But what do you guys think?
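
For anyone who wants to compare numbers before and after changing the mount
options, here is a minimal sketch of a micro-benchmark that times small
appends with and without an fsync after each write. It is not ZooKeeper's
own code path, only an approximation of the sync-per-update pattern that
forceSync=yes implies. The function names, the 512-byte record size, and the
write count are all assumptions chosen for illustration.

```python
import os
import tempfile
import time


def timed_appends(path, count, sync_each_write, payload=b"x" * 512):
    """Append `count` records to `path` and return the elapsed seconds.

    When `sync_each_write` is true, every write is followed by os.fsync(),
    which roughly mimics syncing each update to the media.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, payload)
            if sync_each_write:
                os.fsync(fd)
        return time.perf_counter() - start
    finally:
        os.close(fd)


def compare(directory, count=200):
    """Time the synced and unsynced variants on files created in `directory`."""
    results = {}
    for label, sync in (("sync", True), ("nosync", False)):
        fd, path = tempfile.mkstemp(dir=directory)
        os.close(fd)
        try:
            results[label] = timed_appends(path, count, sync)
        finally:
            os.remove(path)
    return results


if __name__ == "__main__":
    # Hypothetical target: replace with a directory on the transaction log device.
    timings = compare(tempfile.gettempdir())
    for label, seconds in timings.items():
        print(f"{label}: {seconds:.4f} s")
```

Running it against a directory on the same device as the transaction log,
once with the default mount options and once with barriers off, should give
a rough idea of how much of the forceSync=yes cost comes from the barrier.
The absolute numbers will depend on the controller and filesystem, so treat
them only as a relative comparison.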

Thanks
-Raj

On Sat, Jun 16, 2012 at 7:25 AM, Flavio Junqueira <fpj@yahoo-inc.com> wrote:

> There are some corner cases that could lead you to lose data depending on
> your setting, even if forceSync is enabled. For example, if your disk write
> cache is enabled, then there are some sequences of events that could lead
> you to lose updates. With the disk write cache enabled, updates forced to
> disk could be lost locally, and depending on how many copies exist across
> servers, they may not be recovered.
>
> Options I'm aware of to get around this are to use write barriers,
> battery-backed RAID controllers, or another solution that uses some form of
> non-volatile memory. I must also say that I'm not aware of any such case
> happening in production use. We observed it in lab experiments, though.
>
> -Flavio
>
> On Jun 16, 2012, at 2:33 AM, Patrick Hunt wrote:
>
> > On Fri, Jun 15, 2012 at 12:45 PM, Raj N <raj.cassandra@gmail.com> wrote:
> >> Can zookeeper recover from a
> >> corrupt transaction log using existing snapshots and then replaying
> >> messages from its peers?
> >
> > A server will try to recover as best it can (using the snaps/logs it
> > has available), and then talk to the other servers in the quorum to
> > see if anyone else has a more recent committed change. In the case
> > where it doesn't it will download what's necessary to get in sync with
> > the new leader.
> >
> > What might have happened in your case is that you hit a bug, perhaps a
> > type of corruption that we don't handle successfully. e.g. see
> > ZOOKEEPER-1449
> >
> > Patrick
>
>
