hadoop-zookeeper-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: zookeeper on ec2
Date Wed, 02 Sep 2009 00:58:56 GMT
Can you enable verboseGC and look at the tenuring distribution and times for
GC?
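For a rough sense of scale, the GC totals Satish reports in the quoted thread (11.628 s over 389 PS MarkSweep collections, about 5 minutes over 7,636 PS Scavenge collections, after roughly 48 hours of uptime) can be turned into average pause times and an overall GC overhead. This is a back-of-the-envelope sketch that treats "5 minutes" and "48 hours" as exact. Averages cannot reveal the single longest pause, which is what matters against a 30 second session timeout and what a verbose GC log would show.

```python
# Back-of-the-envelope GC figures from the totals reported in this thread.
# Averages only: one long outlier pause would not show up here.

UPTIME_S = 48 * 3600          # "about 48 hours"
SESSION_TIMEOUT_S = 30.0      # client session timeout mentioned in the thread

collectors = {
    # name: (total GC time in seconds, number of collections)
    "PS MarkSweep": (11.628, 389),
    "PS Scavenge": (5 * 60.0, 7636),  # "5 minutes" is approximate
}


def summarize(collectors, uptime_s):
    """Return per-collector average pause (s) and overall GC time fraction."""
    averages = {
        name: total / count for name, (total, count) in collectors.items()
    }
    total_gc_s = sum(total for total, _ in collectors.values())
    return averages, total_gc_s / uptime_s


averages, gc_fraction = summarize(collectors, UPTIME_S)
for name, avg in averages.items():
    print(f"{name}: average pause {avg * 1000:.1f} ms")
print(f"GC overhead: {gc_fraction:.2%} of wall-clock time")
print("All average pauses below session timeout:",
      all(avg < SESSION_TIMEOUT_S for avg in averages.values()))
```

The averages come out in the tens of milliseconds, so if GC is behind the ConnectionLoss events it would have to be through rare long pauses rather than typical ones.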



On Tue, Sep 1, 2009 at 5:54 PM, Satish Bhatti <cthd2001@gmail.com> wrote:

> Parallel/Serial.
> infact@domU-12-31-39-06-3D-D1:/opt/ir/agent/infact-installs/aaa/infact$ iostat
> Linux 2.6.18-xenU-ec2-v1.0 (domU-12-31-39-06-3D-D1)     09/01/2009      _x86_64_
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>          66.11    0.00    1.54    2.96   20.30    9.08
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda2            460.83       410.02     12458.18   40499322 1230554928
> sdc               0.00         0.00         0.00         96          0
> sda1              0.53         5.01         4.89     495338     482592
>
>
>
> On Tue, Sep 1, 2009 at 5:46 PM, Mahadev Konar <mahadev@yahoo-inc.com> wrote:
>
> > Hi Satish,
> >  what GC are you using? Is it ConcurrentMarkSweep or Parallel/Serial?
> >
> >  Also, how is your disk usage on this machine? Can you check your iostat
> > numbers?
> >
> > Thanks
> > mahadev
> >
> >
> > On 9/1/09 5:15 PM, "Satish Bhatti" <cthd2001@gmail.com> wrote:
> >
> > > GC time: 11.628 seconds on PS MarkSweep (389 collections), 5 minutes on
> > > PS Scavenge (7,636 collections).
> > >
> > > It's been running for about 48 hours.
> > >
> > >
> > > On Tue, Sep 1, 2009 at 5:12 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> > >
> > >> Do you have long GC delays?
> > >>
> > >> On Tue, Sep 1, 2009 at 4:51 PM, Satish Bhatti <cthd2001@gmail.com> wrote:
> > >>
> > >>> Session timeout is 30 seconds.
> > >>>
> > >>> On Tue, Sep 1, 2009 at 4:26 PM, Patrick Hunt <phunt@apache.org> wrote:
> > >>>
> > >>>> What is your client timeout? It may be too low.
> > >>>>
> > >>>> also see this section on handling recoverable errors:
> > >>>> http://wiki.apache.org/hadoop/ZooKeeper/ErrorHandling
> > >>>>
> > >>>> connection loss in particular needs special care since:
> > >>>> "When a ZooKeeper client loses a connection to the ZooKeeper server there
> > >>>> may be some requests in flight; we don't know where they were in their
> > >>>> flight at the time of the connection loss."
> > >>>>
> > >>>> Patrick
> > >>>>
> > >>>>
> > >>>> Satish Bhatti wrote:
> > >>>>
> > >>>>> I have recently started running on EC2 and am seeing quite a few
> > >>>>> ConnectionLoss exceptions.  Should I just catch these and retry?  Since
> > >>>>> I assume that eventually, if the shit truly hits the fan, I will get a
> > >>>>> SessionExpired?
> > >>>>> Satish
> > >>>>>
> > >>>>> On Mon, Jul 6, 2009 at 11:35 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
> > >>>>>
> > >>>>>> We have used EC2 quite a bit for ZK.
> > >>>>>>
> > >>>>>> The basic lessons that I have learned include:
> > >>>>>>
> > >>>>>> a) EC2's biggest advantage after scaling and elasticity was conformity
> > >>>>>> of configuration.  Since you are bringing machines up and down all the
> > >>>>>> time, they begin to act more like programs and you wind up with boot
> > >>>>>> scripts that give you a very predictable environment.  Nice.
> > >>>>>>
> > >>>>>> b) EC2 interconnect has a lot more going on than in a dedicated VLAN.
> > >>>>>> That can make the ZK servers appear a bit less connected.  You have to
> > >>>>>> plan for ConnectionLoss events.
> > >>>>>>
> > >>>>>> c) For highest reliability, I switched to large instances.  On
> > >>>>>> reflection, I think that was helpful, but less important than I
> > >>>>>> thought at the time.
> > >>>>>>
> > >>>>>> d) Increasing and decreasing cluster size is nearly painless and is
> > >>>>>> easily scriptable.  To decrease, do a rolling update on the survivors
> > >>>>>> to update their configuration.  Then take down the instance you want
> > >>>>>> to lose.  To increase, do a rolling update starting with the new
> > >>>>>> instances to update the configuration to include all of the machines.
> > >>>>>> The rolling update should bounce each ZK with several seconds between
> > >>>>>> each bounce.  Rescaling the cluster takes less than a minute, which
> > >>>>>> makes it comparable to EC2 instance boot time (about 30 seconds for
> > >>>>>> the Alestic ubuntu instance that we used plus about 20 seconds for
> > >>>>>> additional configuration).
> > >>>>>>
> > >>>>>> On Mon, Jul 6, 2009 at 4:45 AM, David Graf <david.graf@28msec.com> wrote:
> > >>>>>>
> > >>>>>>> Hello
> > >>>>>>>
> > >>>>>>> I wanna set up a zookeeper ensemble on amazon's ec2 service. In my
> > >>>>>>> system, zookeeper is used to run a locking service and to generate
> > >>>>>>> unique ids. Currently, for testing purposes, I am only running one
> > >>>>>>> instance. Now, I need to set up an ensemble to protect my system
> > >>>>>>> against crashes.
> > >>>>>>>
> > >>>>>>> The ec2 service has some differences from a normal server farm. E.g.
> > >>>>>>> the data saved on the file system of an ec2 instance is lost if the
> > >>>>>>> instance crashes. In the documentation of zookeeper, I have read that
> > >>>>>>> zookeeper saves snapshots of the in-memory data in the file system.
> > >>>>>>> Is that needed for recovery? Logically, it would be much easier for
> > >>>>>>> me if this were not the case.
> > >>>>>>>
> > >>>>>>> Additionally, ec2 brings the advantage that servers can be switched
> > >>>>>>> on and off dynamically depending on the load, traffic, etc. Can this
> > >>>>>>> advantage be utilized for a zookeeper ensemble? Is it possible to add
> > >>>>>>> a zookeeper server dynamically to an ensemble? E.g. depending on the
> > >>>>>>> in-memory load?
> > >>>>>>>
> > >>>>>>> David
> > >>>>>>>
> > >>>>>>>
> > >>>>>
> > >>>
> > >>
> > >>
> > >>
> > >> --
> > >> Ted Dunning, CTO
> > >> DeepDyve
> > >>
> >
> >
>
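Satish's question and Patrick's reply point to a common pattern: retry an operation on ConnectionLoss, since the session may still be alive while the client reconnects, but give up on SessionExpired, which means the session (and any ephemeral state tied to it) is gone and a new session is needed. The sketch below is a minimal illustration under stated assumptions: `ConnectionLossError` and `SessionExpiredError` are hypothetical stand-ins for whatever your client library raises, and the backoff constants are arbitrary. It is not the ZooKeeper client API. Because of the in-flight caveat Patrick quotes, blind retries are only safe for idempotent operations such as reads; a retried create, for example, has to check whether the first attempt already succeeded.

```python
import time


class ConnectionLossError(Exception):
    """Hypothetical stand-in for a client library's ConnectionLoss error."""


class SessionExpiredError(Exception):
    """Hypothetical stand-in for a client library's SessionExpired error."""


def retry_on_connection_loss(operation, max_attempts=5, base_delay_s=0.1,
                             sleep=time.sleep):
    """Call operation(), retrying on ConnectionLossError with exponential backoff.

    SessionExpiredError (and any other exception) is not caught, so it
    propagates immediately: the caller must create a new session.
    Only use this for idempotent operations, because a request that was in
    flight when the connection dropped may already have been applied.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionLossError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the ConnectionLoss
            sleep(base_delay_s * (2 ** attempt))


# Usage example with a fake read that loses the connection twice.
calls = {"n": 0}


def flaky_read():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionLossError()
    return b"data"


result = retry_on_connection_loss(flaky_read, sleep=lambda s: None)
print(result, "after", calls["n"], "attempts")
```

Passing `sleep` as a parameter keeps the backoff testable; in real use the default `time.sleep` applies.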
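Ted's item (d) describes resizing the ensemble with rolling restarts of a static configuration. The sketch below turns that procedure into a plan: it renders `server.<id>=<host>:2888:3888` lines for zoo.cfg (2888 and 3888 are the conventional quorum and leader-election ports from the ZooKeeper documentation) and lists the restart order. It is a hypothetical helper: the hostnames are made up, it only prints a plan rather than touching any machine, and it handles a pure grow or a pure shrink, not both at once.

```python
def server_lines(servers, peer_port=2888, election_port=3888):
    """Render zoo.cfg entries of the form server.<id>=<host>:<peer>:<election>.

    `servers` maps each server id (which must match that host's myid file)
    to its hostname; ids stay stable rather than being renumbered.
    """
    return [f"server.{sid}={host}:{peer_port}:{election_port}"
            for sid, host in sorted(servers.items())]


def rolling_plan(current, target):
    """Return (new config lines, ordered steps) for a grow or a shrink.

    Grow: start the new servers first, then restart the existing ones.
    Shrink: restart the survivors with the new config, then stop the
    removed servers. Leave several seconds between consecutive steps.
    """
    removed = sorted(set(current) - set(target))
    added = sorted(set(target) - set(current))
    kept = sorted(set(current) & set(target))
    if removed and added:
        raise ValueError("plan a shrink and a grow as two separate resizes")
    steps = [f"start server.{sid} with new config" for sid in added]
    steps += [f"restart server.{sid} with new config" for sid in kept]
    steps += [f"stop server.{sid}" for sid in removed]
    return server_lines(target), steps


# Hypothetical three-node ensemble grown to five nodes.
current = {1: "zk1.example.com", 2: "zk2.example.com", 3: "zk3.example.com"}
target = {**current, 4: "zk4.example.com", 5: "zk5.example.com"}

lines, steps = rolling_plan(current, target)
print("\n".join(lines))
print("\n".join(steps))
```

Shrinking back is the same call with the arguments swapped, `rolling_plan(target, current)`, which restarts servers 1 to 3 and then stops 4 and 5.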



-- 
Ted Dunning, CTO
DeepDyve
