asterixdb-dev mailing list archives

From Raman Grover <ramangrove...@gmail.com>
Subject Re: The solution to the sporadic connection refused exceptions
Date Tue, 25 Aug 2015 00:28:36 GMT
Well, the state of an instance (and its metadata, including configuration) is
kept in a ZooKeeper instance that is accessible to both Managix and the CC. The
CC should be able to set the state of the cluster in ZooKeeper under the right
znode, which can then be viewed by Managix.

So there already exists a communication channel for the CC and Managix to share
information on state etc. I am not sure we need another channel, such as
RMI, between Managix and the CC.

Regards,
Raman



On Mon, Aug 24, 2015 at 12:58 PM, abdullah alamoudi <bamousaa@gmail.com>
wrote:

> Well, it depends on your definition of the boundaries of Managix. What I
> did was add an RMI object in the InstallerDriver which basically
> listens for state changes from the cluster controller. This means some
> additional logic in the CCApplicationEntryPoint: after the CC is
> ready, it contacts the InstallerDriver using RMI, and only at that point
> can the InstallerDriver return to Managix and tell it that the startup is
> complete.
>
> I am not sure this is the right way to do it, but it is definitely better
> than what we currently have.
> Abdullah.
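
A minimal sketch of the push approach described above, assuming a single JVM stands in for both sides. The names `PushStartupSketch`, `StartupListener`, `clusterReady`, and the binding name `startupListener` are hypothetical and are not taken from the AsterixDB code base; only the standard `java.rmi` classes are real. The installer side exports a listener and blocks on a latch; a thread standing in for the CC looks the listener up and calls it once it is ready.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class PushStartupSketch {
    // Assumed binding name, for illustration only.
    static final String BINDING = "startupListener";

    // Hypothetical remote interface that the CC would call when it is ready.
    public interface StartupListener extends Remote {
        void clusterReady() throws RemoteException;
    }

    // Installer side: counts down a latch when the callback arrives.
    static class Waiter implements StartupListener {
        private final CountDownLatch ready = new CountDownLatch(1);

        @Override
        public void clusterReady() {
            ready.countDown();
        }

        boolean await(long timeoutMs) throws InterruptedException {
            return ready.await(timeoutMs, TimeUnit.MILLISECONDS);
        }
    }

    // Returns true if the callback arrived before the timeout elapsed.
    static boolean run(int port, long timeoutMs) throws Exception {
        // Single-machine demo: make exported stubs point at localhost.
        System.setProperty("java.rmi.server.hostname", "localhost");
        Waiter waiter = new Waiter();
        Registry registry = LocateRegistry.createRegistry(port);
        StartupListener stub =
                (StartupListener) UnicastRemoteObject.exportObject(waiter, 0);
        registry.rebind(BINDING, stub);

        // Stand-in for the CC: in a real deployment this runs in another process.
        Thread cc = new Thread(() -> {
            try {
                Registry remote = LocateRegistry.getRegistry("localhost", port);
                StartupListener listener = (StartupListener) remote.lookup(BINDING);
                listener.clusterReady();
            } catch (Exception e) {
                e.printStackTrace();
            }
        });
        try {
            cc.start();
            return waiter.await(timeoutMs);
        } finally {
            cc.join(timeoutMs);
            registry.unbind(BINDING);
            UnicastRemoteObject.unexportObject(waiter, true);
            UnicastRemoteObject.unexportObject(registry, true);
        }
    }

    public static void main(String[] args) throws Exception {
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 1099;
        System.out.println("ready=" + run(port, 5000));
    }
}
```

The timeout matters here: if the CC never calls back, the installer should report a failed startup rather than block forever.
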
>
> On Mon, Aug 24, 2015 at 10:00 PM, Chris Hillery <chillery@hillery.land>
> wrote:
>
> > Hopefully the solution won't involve additional important logic inside
> > Managix itself?
> >
> > Ceej
> > aka Chris Hillery
> >
> > On Mon, Aug 24, 2015 at 7:26 AM, abdullah alamoudi <bamousaa@gmail.com>
> > wrote:
> >
> > > > That works but it doesn't feel right doing it this way. I am going to
> > > > fix this one for good.
> > >
> > > Cheers,
> > > Abdullah.
> > >
> > > On Mon, Aug 24, 2015 at 5:11 PM, Ian Maxon <imaxon@uci.edu> wrote:
> > >
> > > > > The way I assured liveness for the YARN installer was to try running
> > > > > "for $x in dataset Metadata.Dataset return $x" via the API. I just
> > > > > polled for a reasonable amount of time (though honestly, thinking
> > > > > about it now, the correct parameter to use for the polling interval
> > > > > is the startup wait time in the parameters file :) ). It's not
> > > > > perfect, but it gives fewer false positives than just checking ps
> > > > > for processes that look like CCs/NCs.
> > > >
> > > > - Ian.
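
The polling idea above can be sketched roughly as follows. This is an illustration under assumptions, not the YARN installer's actual code: `ReadinessPoller`, `waitUntilReady`, and `httpOk` are hypothetical names, and the URL to probe is supplied by the caller because the thread does not give the API endpoint. The loop runs a probe repeatedly until it succeeds or a deadline passes.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.function.BooleanSupplier;

class ReadinessPoller {
    // Runs the probe until it returns true or timeoutMs has elapsed.
    static boolean waitUntilReady(BooleanSupplier probe, long timeoutMs, long intervalMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (probe.getAsBoolean()) {
                return true;
            }
            long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMs <= 0) {
                return false;
            }
            try {
                Thread.sleep(Math.min(intervalMs, remainingMs));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt
                return false;
            }
        }
    }

    // Example probe: true if an HTTP GET on the caller-supplied URL returns 200.
    static boolean httpOk(String url) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setConnectTimeout(1000);
            conn.setReadTimeout(5000);
            try {
                return conn.getResponseCode() == HttpURLConnection.HTTP_OK;
            } finally {
                conn.disconnect();
            }
        } catch (IOException | RuntimeException e) {
            return false; // refused, timed out, or not a usable HTTP URL
        }
    }

    public static void main(String[] args) {
        if (args.length < 1) {
            System.err.println("usage: ReadinessPoller <url>");
            return;
        }
        boolean ready = waitUntilReady(() -> httpOk(args[0]), 30_000, 500);
        System.out.println("ready=" + ready);
    }
}
```

Passing the probe in as a `BooleanSupplier` keeps the timing logic separate from the check itself, so the same loop could run a real query through the API instead of a plain GET.
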
> > > >
> > > > > On Mon, Aug 24, 2015 at 5:03 AM, abdullah alamoudi <bamousaa@gmail.com>
> > > > > wrote:
> > > >
> > > > > > Now that I think about it, maybe we should provide multiple ways to
> > > > > > do this: a polling mechanism that can be used at an arbitrary time
> > > > > > and a pushing mechanism on startup.
> > > > > > I am going to start implementing this and will probably use RMI
> > > > > > for this task both ways (CC to InstallerDriver and InstallerDriver
> > > > > > to CC).
> > > > >
> > > > > Cheers,
> > > > > Abdullah.
> > > > >
> > > > > > On Mon, Aug 24, 2015 at 2:19 PM, abdullah alamoudi <bamousaa@gmail.com>
> > > > > > wrote:
> > > > >
> > > > > > > So after further investigation, it turned out that our startup
> > > > > > > process just starts the CC and NC processes and then makes sure
> > > > > > > the processes are running. If the processes are found to be
> > > > > > > running, it returns the state of the cluster as active, and the
> > > > > > > subsequent test commands can start immediately.
> > > > > > >
> > > > > > > This means that the CC could have started but not yet be ready
> > > > > > > when we try to process the next command. To address this, we need
> > > > > > > a better way to tell when the startup procedure has completed. We
> > > > > > > can do this by pushing (the CC informs the installer driver when
> > > > > > > the startup is complete) or polling (the installer driver
> > > > > > > actually queries the CC for the state of the cluster).
> > > > > > >
> > > > > > > I can do it either way, so let's vote. My vote goes to the pushing
> > > > > > > mechanism.
> > > > > > > Thoughts?
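
To illustrate the gap described above between "the process is running" and "the service is reachable", here is a small sketch of a TCP probe. `PortProbe` and `isAcceptingConnections` are hypothetical names, and the host and port are whatever the caller passes in. A refused connection is exactly the symptom the tests were hitting. A successful connect is still only a necessary condition, since the server may accept connections before it can answer queries.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class PortProbe {
    // True if a TCP connection to host:port succeeds within timeoutMs.
    static boolean isAcceptingConnections(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false; // covers "connection refused" and timeouts
        }
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.err.println("usage: PortProbe <host> <port>");
            return;
        }
        int port = Integer.parseInt(args[1]);
        System.out.println("accepting=" + isAcceptingConnections(args[0], port, 1000));
    }
}
```
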
> > > > > >
> > > > > > > On Mon, Aug 24, 2015 at 10:15 AM, abdullah alamoudi <bamousaa@gmail.com>
> > > > > > > wrote:
> > > > > >
> > > > > > >> This solution turned out to be incorrect. Actually, the test
> > > > > > >> cases never fail when I build after using the join method, but
> > > > > > >> running an actual asterix instance never succeeds, which is
> > > > > > >> quite confusing.
> > > > > > >>
> > > > > > >> I also think that the startup script has a major bug where it
> > > > > > >> might return before the startup is complete. More on this later...
> > > > > >>
> > > > > > >> On Mon, Aug 24, 2015 at 7:48 AM, abdullah alamoudi <bamousaa@gmail.com>
> > > > > > >> wrote:
> > > > > >>
> > > > > >>> It is highly unlikely that it is related.
> > > > > >>>
> > > > > >>> Cheers,
> > > > > >>> Abdullah.
> > > > > >>>
> > > > > > >>> On Mon, Aug 24, 2015 at 5:45 AM, Chen Li <chenli@gmail.com> wrote:
> > > > > >>>
> > > > > > >>>> @Abdullah: Is this issue related to
> > > > > > >>>> https://issues.apache.org/jira/browse/ASTERIXDB-1074? Ian and I
> > > > > > >>>> plan to look into the details on Monday.
> > > > > >>>>
> > > > > > >>>> On Sun, Aug 23, 2015 at 10:08 AM, abdullah alamoudi <bamousaa@gmail.com>
> > > > > > >>>> wrote:
> > > > > >>>>
> > > > > > >>>> > About 3-4 days ago, I was working on the addition of the
> > > > > > >>>> > filesystem-based feed adapter and it didn't take any time to
> > > > > > >>>> > complete. However, when I wanted to build and make sure all
> > > > > > >>>> > tests pass, I kept getting ConnectionRefused errors which
> > > > > > >>>> > caused the installer tests to fail every now and then.
> > > > > >>>> >
> > > > > > >>>> > I knew the new change had nothing to do with this failure, yet
> > > > > > >>>> > I couldn't direct my attention away from this bug (it just
> > > > > > >>>> > bothered me so much and I knew it needed to be resolved ASAP).
> > > > > > >>>> > After wasting countless hours, I was finally able to figure
> > > > > > >>>> > out what was happening :-)
> > > > > >>>> >
> > > > > > >>>> > In the startup routine, we start three Jetty web servers (Web
> > > > > > >>>> > interface server, JSON API server, and Feed server). Some time
> > > > > > >>>> > ago, we used to end the startup call before making sure the
> > > > > > >>>> > server.isStarted() method returns true on all servers. At that
> > > > > > >>>> > time, I introduced the waitUntilServerStarts method to make
> > > > > > >>>> > sure we don't return before the servers are ready. It turned
> > > > > > >>>> > out that was an incorrect way to handle this (we can blame
> > > > > > >>>> > stackoverflow for this one!) and it is not enough that the
> > > > > > >>>> > server's isStarted() returns true. The correct way to do this
> > > > > > >>>> > is to call the server.join() method after server.start().
> > > > > >>>> >
> > > > > > >>>> > See:
> > > > > > >>>> > http://stackoverflow.com/questions/15924874/embedded-jetty-why-to-use-join
> > > > > >>>> >
> > > > > > >>>> > This was as satisfying as it was frustrating, and you are
> > > > > > >>>> > welcome for the future time I saved each of you :)
> > > > > >>>> > --
> > > > > >>>> > Amoudi, Abdullah.
> > > > > >>>> >
> > > > > >>>>
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >>> --
> > > > > >>> Amoudi, Abdullah.
> > > > > >>>
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >> --
> > > > > >> Amoudi, Abdullah.
> > > > > >>
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Amoudi, Abdullah.
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Amoudi, Abdullah.
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Amoudi, Abdullah.
> > >
> >
>
>
>
> --
> Amoudi, Abdullah.
>



-- 
Raman
