hadoop-zookeeper-user mailing list archives

From Satish Bhatti <cthd2...@gmail.com>
Subject Re: zookeeper on ec2
Date Wed, 02 Sep 2009 00:54:19 GMT
Parallel/Serial.
infact@domU-12-31-39-06-3D-D1:/opt/ir/agent/infact-installs/aaa/infact$ iostat
Linux 2.6.18-xenU-ec2-v1.0 (domU-12-31-39-06-3D-D1)     09/01/2009  _x86_64_

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          66.11    0.00    1.54    2.96   20.30    9.08

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda2            460.83       410.02     12458.18   40499322 1230554928
sdc               0.00         0.00         0.00         96          0
sda1              0.53         5.01         4.89     495338     482592
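
To put the figures above in more familiar units, here is a small sketch that converts the sda2 rates to MiB/s and totals the CPU columns. It assumes the 512-byte block size that the iostat man page gives for 2.4 and later kernels; the variable names are illustrative only. Because iostat was run without an interval, these are averages since boot, not a snapshot of the current load.

```python
# Values copied from the iostat output above.
BLOCK_BYTES = 512  # assumption: iostat "blocks" are 512-byte sectors on 2.4+ kernels

cpu = {"user": 66.11, "nice": 0.00, "system": 1.54,
       "iowait": 2.96, "steal": 20.30, "idle": 9.08}
sda2_blk_read_per_s = 410.02
sda2_blk_wrtn_per_s = 12458.18

# Convert blocks per second to MiB per second.
read_mib_per_s = sda2_blk_read_per_s * BLOCK_BYTES / (1024 * 1024)
write_mib_per_s = sda2_blk_wrtn_per_s * BLOCK_BYTES / (1024 * 1024)

print(f"sda2 reads:  {read_mib_per_s:.2f} MiB/s")
print(f"sda2 writes: {write_mib_per_s:.2f} MiB/s")
print(f"CPU columns sum to {sum(cpu.values()):.2f}%")
print(f"steal + iowait: {cpu['steal'] + cpu['iowait']:.2f}%")
```

Under that block-size assumption the write rate works out to roughly 6 MiB/s, and about a fifth of CPU time is reported as steal, i.e. time the hypervisor gave to other guests.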


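The GC totals quoted further down this thread (11.628 seconds over 389 PS MarkSweep collections, 5 minutes over 7,636 PS Scavenge collections, about 48 hours of uptime) can be turned into per-collection averages for comparison with the 30 second session timeout. This is only arithmetic on the rounded figures from the thread; an average says nothing about the longest single pause, which is what would matter for a timeout.

```python
# Figures quoted in the thread; "5 minutes" and "about 48 hours" are rounded there.
mark_sweep_s, mark_sweep_n = 11.628, 389
scavenge_s, scavenge_n = 5 * 60, 7636
uptime_s = 48 * 3600
session_timeout_s = 30

avg_mark_sweep_ms = 1000 * mark_sweep_s / mark_sweep_n
avg_scavenge_ms = 1000 * scavenge_s / scavenge_n
gc_fraction = (mark_sweep_s + scavenge_s) / uptime_s

print(f"avg PS MarkSweep pause: {avg_mark_sweep_ms:.1f} ms")
print(f"avg PS Scavenge pause:  {avg_scavenge_ms:.1f} ms")
print(f"share of uptime in GC:  {100 * gc_fraction:.2f}%")
print(f"session timeout:        {session_timeout_s * 1000} ms")
```

Both averages come out at a few tens of milliseconds, several orders of magnitude below the session timeout.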

On Tue, Sep 1, 2009 at 5:46 PM, Mahadev Konar <mahadev@yahoo-inc.com> wrote:

> Hi satish,
>  what GC are you using? Is it ConcurrentMarkSweep or Parallel/Serial?
>
>  Also, how is your disk usage on this machine? Can you check your iostat
> numbers?
>
> Thanks
> mahadev
>
>
> On 9/1/09 5:15 PM, "Satish Bhatti" <cthd2001@gmail.com> wrote:
>
> > GC Time: 11.628 seconds on PS MarkSweep (389 collections); 5 minutes on PS
> > Scavenge (7,636 collections)
> >
> > It's been running for about 48 hours.
> >
> >
> > On Tue, Sep 1, 2009 at 5:12 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> >
> >> Do you have long GC delays?
> >>
> >> On Tue, Sep 1, 2009 at 4:51 PM, Satish Bhatti <cthd2001@gmail.com> wrote:
> >>
> >>> Session timeout is 30 seconds.
> >>>
> >>> On Tue, Sep 1, 2009 at 4:26 PM, Patrick Hunt <phunt@apache.org> wrote:
> >>>
> >>>> What is your client timeout? It may be too low.
> >>>>
> >>>> also see this section on handling recoverable errors:
> >>>> http://wiki.apache.org/hadoop/ZooKeeper/ErrorHandling
> >>>>
> >>>> connection loss in particular needs special care since:
> >>>> "When a ZooKeeper client loses a connection to the ZooKeeper server there
> >>>> may be some requests in flight; we don't know where they were in their
> >>>> flight at the time of the connection loss."
> >>>>
> >>>> Patrick
> >>>>
> >>>>
> >>>> Satish Bhatti wrote:
> >>>>
> >>>>> I have recently started running on EC2 and am seeing quite a few
> >>>>> ConnectionLoss exceptions.  Should I just catch these and retry?  Since I
> >>>>> assume that eventually, if the shit truly hits the fan, I will get a
> >>>>> SessionExpired?
> >>>>> Satish
> >>>>>
> >>>>> On Mon, Jul 6, 2009 at 11:35 AM, Ted Dunning <ted.dunning@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> We have used EC2 quite a bit for ZK.
> >>>>>>
> >>>>>> The basic lessons that I have learned include:
> >>>>>>
> >>>>>> a) EC2's biggest advantage after scaling and elasticity was conformity of
> >>>>>> configuration.  Since you are bringing machines up and down all the time,
> >>>>>> they begin to act more like programs and you wind up with boot scripts that
> >>>>>> give you a very predictable environment.  Nice.
> >>>>>>
> >>>>>> b) EC2 interconnect has a lot more going on than in a dedicated VLAN.  That
> >>>>>> can make the ZK servers appear a bit less connected.  You have to plan for
> >>>>>> ConnectionLoss events.
> >>>>>>
> >>>>>> c) for highest reliability, I switched to large instances.  On reflection, I
> >>>>>> think that was helpful, but less important than I thought at the time.
> >>>>>>
> >>>>>> d) increasing and decreasing cluster size is nearly painless and is easily
> >>>>>> scriptable.  To decrease, do a rolling update on the survivors to update
> >>>>>> their configuration.  Then take down the instance you want to lose.  To
> >>>>>> increase, do a rolling update starting with the new instances to update the
> >>>>>> configuration to include all of the machines.  The rolling update should
> >>>>>> bounce each ZK with several seconds between each bounce.  Rescaling the
> >>>>>> cluster takes less than a minute which makes it comparable to EC2 instance
> >>>>>> boot time (about 30 seconds for the Alestic ubuntu instance that we used
> >>>>>> plus about 20 seconds for additional configuration).
> >>>>>>
> >>>>>> On Mon, Jul 6, 2009 at 4:45 AM, David Graf <david.graf@28msec.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Hello
> >>>>>>>
> >>>>>>> I wanna set up a zookeeper ensemble on amazon's ec2 service. In my system,
> >>>>>>> zookeeper is used to run a locking service and to generate unique IDs.
> >>>>>>> Currently, for testing purposes, I am only running one instance. Now, I need
> >>>>>>> to set up an ensemble to protect my system against crashes.
> >>>>>>> The ec2 service has some differences to a normal server farm. E.g. the
> >>>>>>> data saved on the file system of an ec2 instance is lost if the instance
> >>>>>>> crashes. In the documentation of zookeeper, I have read that zookeeper saves
> >>>>>>> snapshots of the in-memory data in the file system. Is that needed for
> >>>>>>> recovery? Logically, it would be much easier for me if this is not the case.
> >>>>>>> Additionally, ec2 brings the advantage that servers can be switched on and
> >>>>>>> off dynamically dependent on the load, traffic, etc. Can this advantage be
> >>>>>>> utilized for a zookeeper ensemble? Is it possible to add a zookeeper server
> >>>>>>> dynamically to an ensemble? E.g. dependent on the in-memory load?
> >>>>>>>
> >>>>>>> David
> >>>>>>>
> >>>>>>>
> >>>>>
> >>>
> >>
> >>
> >>
> >> --
> >> Ted Dunning, CTO
> >> DeepDyve
> >>
>
>
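
The rescaling procedure Ted describes in point d) of the quoted thread (bounce the survivors when shrinking, bounce the new instances first when growing, with a pause between bounces) could be sketched roughly as follows. This is not his script: the function names, the `bounce` callback (which would push the new configuration to one server and restart it), and the 5 second default pause standing in for "several seconds" are all assumptions made for illustration.

```python
import time
from typing import Callable, List, Sequence


def grow_order(current: Sequence[str], new: Sequence[str]) -> List[str]:
    """Bounce order when adding servers: new instances first, then existing ones."""
    return list(new) + [s for s in current if s not in new]


def shrink_order(current: Sequence[str], leaving: Sequence[str]) -> List[str]:
    """Bounce order when removing servers: survivors only.

    The leaving instances are taken down after the survivors have been updated.
    """
    return [s for s in current if s not in leaving]


def rolling_bounce(order: Sequence[str],
                   bounce: Callable[[str], None],
                   pause_s: float = 5.0,
                   sleep: Callable[[float], None] = time.sleep) -> None:
    """Call the hypothetical bounce(server) one server at a time, pausing in between."""
    for i, server in enumerate(order):
        if i > 0:
            sleep(pause_s)
        bounce(server)


if __name__ == "__main__":
    # Hypothetical host names; print stands in for a real restart.
    plan = grow_order(["zk1", "zk2", "zk3"], ["zk4", "zk5"])
    rolling_bounce(plan, bounce=lambda s: print("bouncing", s), pause_s=0.0)
```

Passing `sleep` as a parameter keeps the sketch easy to exercise without real delays.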

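On the question of catching ConnectionLoss and retrying, a minimal retry loop might look like the sketch below. The two exception classes are hypothetical stand-ins defined here, not the API of any ZooKeeper client, and the attempt count and backoff values are arbitrary. As the wiki passage quoted by Patrick points out, a request in flight at the time of the loss may or may not have been applied, so blind retry is only safe for operations that can be repeated without harm.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ConnectionLossError(Exception):
    """Hypothetical stand-in for a client's recoverable connection-loss error."""


class SessionExpiredError(Exception):
    """Hypothetical stand-in for a client's fatal session-expired error."""


def retry_on_connection_loss(op: Callable[[], T],
                             attempts: int = 5,
                             first_delay_s: float = 0.5,
                             sleep: Callable[[float], None] = time.sleep) -> T:
    """Run op(), retrying with doubling delays on ConnectionLossError only.

    Any other exception, including SessionExpiredError, propagates at once.
    """
    delay = first_delay_s
    for attempt in range(1, attempts + 1):
        try:
            return op()
        except ConnectionLossError:
            if attempt == attempts:
                raise  # out of attempts: let the caller decide what to do
            sleep(delay)
            delay *= 2


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_read() -> str:
        # Simulated operation: fails twice, then succeeds.
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionLossError("simulated")
        return "data"

    print(retry_on_connection_loss(flaky_read, sleep=lambda s: None))
```

Session expiry is deliberately not retried here, since at that point the client has to create a new session and rebuild whatever state depended on the old one.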