asterixdb-dev mailing list archives

From abdullah alamoudi <bamou...@gmail.com>
Subject Re: The solution to the sporadic connection refused exceptions
Date Tue, 25 Aug 2015 11:10:08 GMT
I don't think that is there yet but the intention is to have it at some
point in the future.

Cheers,
Abdullah.

On Tue, Aug 25, 2015 at 12:38 PM, Chris Hillery <chillery@hillery.land>
wrote:

> Very interesting, thank you. Can you point out a couple places in the code
> where some of this logic is kept? Specifically where "CC can update this
> information and notify Managix" sounds interesting...
>
> Ceej
> aka Chris Hillery
>
> On Tue, Aug 25, 2015 at 12:49 AM, Raman Grover <ramangrover29@gmail.com>
> wrote:
>
> > > , and what code is
> > > responsible for keeping it up-to-date?
> > >
> > Apparently, no one is :-)
> >
> > The information for an AsterixDB instance is "lazily" refreshed when a
> > management operation is invoked (using managix set of commands) or an
> > explicit describe command is invoked.
> > Between the time t1 (when state of an AsterixDB instance changes, say due
> > to NC failure) and t2 (when a management operation is invoked), the
> > information about the AsterixDB instance inside Zookeeper remains stale. CC
> > can update this information and notify Managix; this way Managix realizes
> > the changed state as soon as it has occurred. This can be particularly
> > useful when showing on a management console the up-to-date state of an
> > instance in real time or having Managix respond to an event.
> >
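The difference between a lazy refresh and a CC-driven update can be sketched in plain Java. This is only an in-process illustration: `InstanceStateStore`, its `State` values, and the listener API are hypothetical names, and a real version would go through Zookeeper znodes rather than a local object.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

/** Hypothetical in-memory stand-in for the instance state kept in Zookeeper. */
final class InstanceStateStore {
    /** Illustrative states only; the real AsterixInstance model may differ. */
    enum State { ACTIVE, INACTIVE, UNUSABLE }

    private volatile State state = State.INACTIVE;
    private final List<Consumer<State>> listeners = new CopyOnWriteArrayList<>();

    /** Lazy path: the reader sees whatever was last written, possibly stale. */
    State read() {
        return state;
    }

    /** Push path: a management tool registers interest in state changes. */
    void subscribe(Consumer<State> listener) {
        listeners.add(listener);
    }

    /** Called by the writer (the CC in the thread) as soon as the state changes. */
    void update(State newState) {
        state = newState;
        for (Consumer<State> listener : listeners) {
            listener.accept(newState);
        }
    }

    public static void main(String[] args) {
        InstanceStateStore store = new InstanceStateStore();
        store.subscribe(s -> System.out.println("management tool notified: " + s));
        store.update(State.ACTIVE);    // e.g. cluster finished starting
        store.update(State.UNUSABLE);  // e.g. an NC failed
    }
}
```

With the subscription in place the reader learns about a change at the moment of the update instead of at the next management command.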
> > Regards,
> > Raman
> >
> > ---------- Forwarded message ----------
> > From: abdullah alamoudi <bamousaa@gmail.com>
> > Date: Tue, Aug 25, 2015 at 12:27 AM
> > Subject: Re: The solution to the sporadic connection refused exceptions
> > To: dev@asterixdb.incubator.apache.org
> >
> >
> > On Tue, Aug 25, 2015 at 3:40 AM, Chris Hillery <chillery@hillery.land>
> > wrote:
> >
> > > Perhaps an aside, but: exactly what is kept in Zookeeper
> >
> >
> > A serialized instance of edu.uci.ics.asterix.event.model.AsterixInstance
> >
> >
> > > , and what code is
> > > responsible for keeping it up-to-date?
> > >
> > Apparently, no one is :-)
> >
> >
> > >
> > > Ceej
> > >
> > > > On Mon, Aug 24, 2015 at 5:28 PM, Raman Grover <ramangrover29@gmail.com>
> > > > wrote:
> > >
> > > > > Well, the state of an instance (and metadata including configuration) is
> > > > > kept in a Zookeeper instance that is accessible to Managix and CC. CC should
> > > > > be able to set the state of the cluster in Zookeeper under the right znode
> > > > > which can be viewed by Managix.
> > > > >
> > > > > There exists a communication channel for CC and Managix to share
> > > > > information on state etc. I am not sure if we need another channel such as
> > > > > RMI between Managix and CC.
> > > >
> > > > Regards,
> > > > Raman
> > > >
> > > >
> > > >
> > > > > On Mon, Aug 24, 2015 at 12:58 PM, abdullah alamoudi <bamousaa@gmail.com>
> > > > > wrote:
> > > >
> > > > > > Well, it depends on your definition of the boundaries of Managix. What I
> > > > > > did is that I added an RMI object in the InstallerDriver which basically
> > > > > > listens for state changes from the cluster controller. This means some
> > > > > > additional logic in the CCApplicationEntryPoint where, after the CC is
> > > > > > ready, it contacts the InstallerDriver using RMI and at that point only,
> > > > > > the InstallerDriver can return to Managix and tell it that the startup is
> > > > > > complete.
> > > > >
> > > > > > Not sure if this is the right way to do it but it definitely is better than
> > > > > > what we currently have.
> > > > > Abdullah.
> > > > >
> > > > > > On Mon, Aug 24, 2015 at 10:00 PM, Chris Hillery <chillery@hillery.land>
> > > > > > wrote:
> > > > >
> > > > > > > Hopefully the solution won't involve additional important logic inside
> > > > > > > Managix itself?
> > > > > >
> > > > > > Ceej
> > > > > > aka Chris Hillery
> > > > > >
> > > > > > > On Mon, Aug 24, 2015 at 7:26 AM, abdullah alamoudi <bamousaa@gmail.com>
> > > > > > > wrote:
> > > > > >
> > > > > > > > That works but it doesn't feel right doing it this way. I am going to fix
> > > > > > > > this one for good.
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Abdullah.
> > > > > > >
> > > > > > > > On Mon, Aug 24, 2015 at 5:11 PM, Ian Maxon <imaxon@uci.edu> wrote:
> > > > > > >
> > > > > > > > > The way I assured liveness for the YARN installer was to try running
> > > > > > > > > "for $x in dataset Metadata.Dataset return $x" via the API. I just polled
> > > > > > > > > for a reasonable amount of time (though honestly, thinking about it now,
> > > > > > > > > the correct parameter to use for the polling interval is the startup wait
> > > > > > > > > time in the parameters file :) ). It's not perfect, but it gives fewer
> > > > > > > > > false positives than just checking ps for processes that look like CCs/NCs.
> > > > > > > >
> > > > > > > > - Ian.
> > > > > > > >
> > > > > > > > > On Mon, Aug 24, 2015 at 5:03 AM, abdullah alamoudi <bamousaa@gmail.com>
> > > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > > Now that I think about it, maybe we should provide multiple ways to do
> > > > > > > > > > this: a polling mechanism to be used at an arbitrary time and a pushing
> > > > > > > > > > mechanism on startup.
> > > > > > > > > > I am going to start implementation of this and will probably use RMI for
> > > > > > > > > > this task both ways (CC to InstallerDriver and InstallerDriver to CC).
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > > Abdullah.
> > > > > > > > >
> > > > > > > > > > On Mon, Aug 24, 2015 at 2:19 PM, abdullah alamoudi <bamousaa@gmail.com>
> > > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > > So after further investigation, it turned out our startup process just
> > > > > > > > > > > starts the CC and NC processes and then makes sure the processes are
> > > > > > > > > > > running, and if the processes were found to be running, it returns the
> > > > > > > > > > > state of the cluster to be active and the subsequent test commands can
> > > > > > > > > > > start immediately.
> > > > > > > > > > >
> > > > > > > > > > > This means that the CC could've started but is not yet ready when we try
> > > > > > > > > > > to process the next command. To address this, we need a better way to
> > > > > > > > > > > tell when the startup procedure has completed. We can do this by pushing
> > > > > > > > > > > (CC informs installer driver when the startup is complete) or polling
> > > > > > > > > > > (the installer driver needs to actually query the CC for the state of the
> > > > > > > > > > > cluster).
> > > > > > > > > > >
> > > > > > > > > > > I can do either way so let's vote. My vote goes to the pushing mechanism.
> > > > > > > > > > > Thoughts?
> > > > > > > > > >
> > > > > > > > > > > On Mon, Aug 24, 2015 at 10:15 AM, abdullah alamoudi <bamousaa@gmail.com>
> > > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > >> This solution turned out to be incorrect. Actually, the test cases when I
> > > > > > > > > > >> build after using the join method never fail but running an actual asterix
> > > > > > > > > > >> instance never succeeds which is quite confusing.
> > > > > > > > > > >>
> > > > > > > > > > >> I also think that the startup script has a major bug where it might
> > > > > > > > > > >> return before the startup is complete. More on this later......
> > > > > > > > > >>
> > > > > > > > > > >> On Mon, Aug 24, 2015 at 7:48 AM, abdullah alamoudi <bamousaa@gmail.com>
> > > > > > > > > > >> wrote:
> > > > > > > > > >>
> > > > > > > > > > >>> It is highly unlikely that it is related.
> > > > > > > > > >>>
> > > > > > > > > >>> Cheers,
> > > > > > > > > >>> Abdullah.
> > > > > > > > > >>>
> > > > > > > > > > >>> On Mon, Aug 24, 2015 at 5:45 AM, Chen Li <chenli@gmail.com> wrote:
> > > > > > > > > >>>
> > > > > > > > > > >>>> @Abdullah: Is this issue related to
> > > > > > > > > > >>>> https://issues.apache.org/jira/browse/ASTERIXDB-1074? Ian and I plan to
> > > > > > > > > > >>>> look into the details on Monday.
> > > > > > > > > >>>>
> > > > > > > > > > >>>> On Sun, Aug 23, 2015 at 10:08 AM, abdullah alamoudi <bamousaa@gmail.com>
> > > > > > > > > > >>>> wrote:
> > > > > > > > > >>>>
> > > > > > > > > > >>>> > About 3-4 days ago, I was working on the addition of the filesystem based
> > > > > > > > > > >>>> > feed adapter and it didn't take any time to complete. However, when I wanted
> > > > > > > > > > >>>> > to build and make sure all tests pass, I kept getting ConnectionRefused
> > > > > > > > > > >>>> > errors which caused the installer tests to fail every now and then.
> > > > > > > > > > >>>> >
> > > > > > > > > > >>>> > I knew the new change had nothing to do with this failure, yet I couldn't
> > > > > > > > > > >>>> > direct my attention away from this bug (it just bothered me so much and I
> > > > > > > > > > >>>> > knew it needed to be resolved ASAP). After wasting countless hours, I was
> > > > > > > > > > >>>> > finally able to figure out what was happening :-)
> > > > > > > > > > >>>> >
> > > > > > > > > > >>>> > In the startup routine, we start three Jetty web servers (Web interface
> > > > > > > > > > >>>> > server, JSON API server, and Feed server). Some time ago, we used to end the
> > > > > > > > > > >>>> > startup call before making sure the server.isStarted() method returns true
> > > > > > > > > > >>>> > on all servers. At that time, I introduced the waitUntilServerStarts method
> > > > > > > > > > >>>> > to make sure we don't return before the servers are ready. Turned out, that
> > > > > > > > > > >>>> > was an incorrect way to handle this (we can blame stackoverflow for this
> > > > > > > > > > >>>> > one!) and it is not enough that the server isStarted() returns true. The
> > > > > > > > > > >>>> > correct way to do this is to call the server.join() method after the
> > > > > > > > > > >>>> > server.start().
> > > > > > > > > >>>> >
> > > > > > > > > >>>> > See:
> > > > > > > > > >>>> >
> > > > > > > > > >>>>
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://stackoverflow.com/questions/15924874/embedded-jetty-why-to-use-join
> > > > > > > > > >>>> >
> > > > > > > > > > >>>> > This was equally satisfying as it was frustrating and you are welcome for
> > > > > > > > > > >>>> > the future time I saved each of you :)
> > > > > > > > > >>>> > --
> > > > > > > > > >>>> > Amoudi, Abdullah.
> > > > > > > > > >>>> >
> > > > > > > > > >>>>
> > > > > > > > > >>>
> > > > > > > > > >>>
> > > > > > > > > >>>
> > > > > > > > > >>> --
> > > > > > > > > >>> Amoudi, Abdullah.
> > > > > > > > > >>>
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >> --
> > > > > > > > > >> Amoudi, Abdullah.
> > > > > > > > > >>
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Amoudi, Abdullah.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Amoudi, Abdullah.
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Amoudi, Abdullah.
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Amoudi, Abdullah.
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Raman
> > > >
> > >
> >
> >
> >
> > --
> > Amoudi, Abdullah.
> >
> >
> >
> > --
> > Raman
> >
>



-- 
Amoudi, Abdullah.
