asterixdb-dev mailing list archives

From abdullah alamoudi <bamou...@gmail.com>
Subject Re: The solution to the sporadic connection refused exceptions
Date Mon, 24 Aug 2015 14:26:07 GMT
That works but it doesn't feel right doing it this way. I am going to fix
this one for good.

Cheers,
Abdullah.

On Mon, Aug 24, 2015 at 5:11 PM, Ian Maxon <imaxon@uci.edu> wrote:

> The way I assured liveness for the YARN installer was to try running "for
> $x in dataset Metadata.Dataset return $x" via the API. I just polled for a
> reasonable amount of time (though honestly, thinking about it now, the
> correct parameter to use for the polling interval is the startup wait time
> in the parameters file :) ). It's not perfect, but it gives fewer false
> positives than just checking ps for processes that look like CCs/NCs.
>
> - Ian.
>
> On Mon, Aug 24, 2015 at 5:03 AM, abdullah alamoudi <bamousaa@gmail.com>
> wrote:
>
> > Now that I think about it, maybe we should provide multiple ways to do
> > this: a polling mechanism that can be used at any time and a push
> > mechanism on startup.
> > I am going to start implementing this and will probably use RMI for
> > this task both ways (CC to InstallerDriver and InstallerDriver to CC).
> >
> > Cheers,
> > Abdullah.
> >
> > On Mon, Aug 24, 2015 at 2:19 PM, abdullah alamoudi <bamousaa@gmail.com>
> > wrote:
> >
> > > So after further investigation, it turned out that our startup process
> > > just starts the CC and NC processes and then makes sure the processes
> > > are running. If the processes are found to be running, it reports the
> > > state of the cluster as active, and the subsequent test commands can
> > > start immediately.
> > >
> > > This means that the CC could've started but is not yet ready when we try
> > > to process the next command. To address this, we need a better way to
> > > tell when the startup procedure has completed. We can do this by pushing
> > > (the CC informs the installer driver when the startup is complete) or
> > > polling (the installer driver queries the CC for the state of the
> > > cluster).
> > >
> > > I can do it either way, so let's vote. My vote goes to the pushing
> > > mechanism.
> > > Thoughts?
> > >
> > > On Mon, Aug 24, 2015 at 10:15 AM, abdullah alamoudi <bamousaa@gmail.com>
> > > wrote:
> > >
> > >> This solution turned out to be incorrect. Actually, the test cases never
> > >> fail when I build after using the join method, but running an actual
> > >> asterix instance never succeeds, which is quite confusing.
> > >>
> > >> I also think that the startup script has a major bug where it might
> > >> return before the startup is complete. More on this later......
> > >>
> > >> On Mon, Aug 24, 2015 at 7:48 AM, abdullah alamoudi <bamousaa@gmail.com>
> > >> wrote:
> > >>
> > >>> It is highly unlikely that it is related.
> > >>>
> > >>> Cheers,
> > >>> Abdullah.
> > >>>
> > >>> On Mon, Aug 24, 2015 at 5:45 AM, Chen Li <chenli@gmail.com> wrote:
> > >>>
> > >>>> @Abdullah: Is this issue related to
> > >>>> https://issues.apache.org/jira/browse/ASTERIXDB-1074? Ian and I plan to
> > >>>> look into the details on Monday.
> > >>>>
> > >>>> On Sun, Aug 23, 2015 at 10:08 AM, abdullah alamoudi <bamousaa@gmail.com>
> > >>>> wrote:
> > >>>>
> > >>>> > About 3-4 days ago, I was working on the addition of the filesystem
> > >>>> > based feed adapter and it didn't take any time to complete. However,
> > >>>> > when I wanted to build and make sure all tests pass, I kept getting
> > >>>> > ConnectionRefused errors, which caused the installer tests to fail
> > >>>> > every now and then.
> > >>>> >
> > >>>> > I knew the new change had nothing to do with this failure, yet I
> > >>>> > couldn't direct my attention away from this bug (it just bothered me
> > >>>> > so much and I knew it needed to be resolved ASAP). After wasting
> > >>>> > countless hours, I was finally able to figure out what was
> > >>>> > happening :-)
> > >>>> >
> > >>>> > In the startup routine, we start three Jetty web servers (Web
> > >>>> > interface server, JSON API server, and Feed server). Some time ago, we
> > >>>> > used to end the startup call before making sure the server.isStarted()
> > >>>> > method returns true on all servers. At that time, I introduced the
> > >>>> > waitUntilServerStarts method to make sure we don't return before the
> > >>>> > servers are ready. It turned out that was an incorrect way to handle
> > >>>> > this (we can blame stackoverflow for this one!) and it is not enough
> > >>>> > that the server's isStarted() returns true. The correct way to do this
> > >>>> > is to call the server.join() method after server.start().
> > >>>> >
> > >>>> > See:
> > >>>> > http://stackoverflow.com/questions/15924874/embedded-jetty-why-to-use-join
> > >>>> >
> > >>>> > This was as satisfying as it was frustrating, and you are welcome for
> > >>>> > the future time I saved each of you :)
> > >>>> > --
> > >>>> > Amoudi, Abdullah.
> > >>>> >
> > >>>>
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>> Amoudi, Abdullah.
> > >>>
> > >>
> > >>
> > >>
> > >> --
> > >> Amoudi, Abdullah.
> > >>
> > >
> > >
> > >
> > > --
> > > Amoudi, Abdullah.
> > >
> >
> >
> >
> > --
> > Amoudi, Abdullah.
> >
>



-- 
Amoudi, Abdullah.
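
For illustration, the polling option discussed in this thread (the installer
driver repeatedly querying the CC until it answers) could look roughly like
the sketch below. This is not code from AsterixDB: the class ReadinessPoller,
the method waitUntilReady, and the probe argument are hypothetical names. The
probe stands in for whatever check the driver would actually run, for example
the Metadata.Dataset query Ian mentions.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of AsterixDB: polls a readiness probe until
// it succeeds or a timeout elapses.
final class ReadinessPoller {
    private ReadinessPoller() {}

    /**
     * Calls probe repeatedly, sleeping for interval between attempts.
     * Returns true as soon as the probe reports ready, or false if the
     * timeout elapses or the calling thread is interrupted first.
     */
    static boolean waitUntilReady(BooleanSupplier probe, Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            try {
                if (probe.getAsBoolean()) {
                    return true;
                }
            } catch (RuntimeException e) {
                // Assumption: a failing probe (e.g. a wrapped "connection
                // refused") just means "not ready yet", so keep polling.
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false; // gave up: the cluster never became ready in time
            }
            // Sleep for the interval, but never past the deadline.
            long remainingMillis = Math.max(1L, remainingNanos / 1_000_000L);
            try {
                Thread.sleep(Math.min(interval.toMillis(), remainingMillis));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }
}
```

The timeout here would presumably come from the startup wait time in the
parameters file, as Ian suggests. A process-level check (is the CC running?)
could not serve as the probe, since that is exactly the check that returns
too early.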
