asterixdb-dev mailing list archives

From Ian Maxon <ima...@uci.edu>
Subject Re: The solution to the sporadic connection refused exceptions
Date Mon, 24 Aug 2015 14:11:51 GMT
The way I ensured liveness for the YARN installer was to try running "for
$x in dataset Metadata.Dataset return $x" via the API. I just polled for a
reasonable amount of time (though honestly, thinking about it now, the
correct parameter to use for the polling interval is the startup wait time
in the parameters file :) ). It's not perfect, but it gives fewer false
positives than just checking ps for processes that look like CCs/NCs.

- Ian.
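
The polling approach described above can be sketched in Java as a generic wait-until-ready loop. This is a minimal sketch, not the YARN installer's actual code. The class and method names (ReadinessPoller, waitUntilReady) are hypothetical. The probe is passed in as a BooleanSupplier; in practice it would wrap a call that runs the Metadata.Dataset query through the API and returns true only when that query succeeds. The overall timeout plays the role of the startup wait time from the parameters file.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical helper: repeatedly runs a readiness probe until it succeeds
// or the overall timeout (e.g. the configured startup wait time) expires.
final class ReadinessPoller {
    private ReadinessPoller() {}

    static boolean waitUntilReady(BooleanSupplier probe, Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            try {
                // The probe should return true only if a real request succeeded,
                // e.g. a trivial metadata query; a running process is not enough.
                if (probe.getAsBoolean()) {
                    return true;
                }
            } catch (RuntimeException e) {
                // Treat probe failures (e.g. connection refused) as "not ready yet".
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false;
            }
            try {
                // Sleep for the polling interval, but never much past the deadline.
                long remainingMillis = remainingNanos / 1_000_000L;
                Thread.sleep(Math.max(1L, Math.min(interval.toMillis(), remainingMillis)));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

Returning false on timeout, rather than throwing, leaves it to the caller to decide whether to report the cluster as failed.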

On Mon, Aug 24, 2015 at 5:03 AM, abdullah alamoudi <bamousaa@gmail.com>
wrote:

> Now that I think about it, maybe we should provide multiple ways to do
> this: a polling mechanism that can be used at any time, and a push
> mechanism on startup.
> I am going to start implementing this and will probably use RMI for
> this task in both directions (CC to InstallerDriver and InstallerDriver to CC).
>
> Cheers,
> Abdullah.
>
> On Mon, Aug 24, 2015 at 2:19 PM, abdullah alamoudi <bamousaa@gmail.com>
> wrote:
>
> > So after further investigation, it turned out that our startup process
> > just starts the CC and NC processes and then makes sure the processes are
> > running. If the processes are found to be running, it reports the state of
> > the cluster as active, and the subsequent test commands can start
> > immediately.
> >
> > This means that the CC could have started but not yet be ready when we try
> > to process the next command. To address this, we need a better way to tell
> > when the startup procedure has completed. We can do this by pushing (the CC
> > informs the installer driver when startup is complete) or by polling (the
> > installer driver actually queries the CC for the state of the cluster).
> >
> > I can do it either way, so let's vote. My vote goes to the push mechanism.
> > Thoughts?
> >
> > On Mon, Aug 24, 2015 at 10:15 AM, abdullah alamoudi <bamousaa@gmail.com>
> > wrote:
> >
> >> This solution turned out to be incorrect. Actually, after switching to the
> >> join method, the test cases never fail when I build, but running an actual
> >> Asterix instance never succeeds, which is quite confusing.
> >>
> >> I also think that the startup script has a major bug where it might
> >> return before the startup is complete. More on this later...
> >>
> >> On Mon, Aug 24, 2015 at 7:48 AM, abdullah alamoudi <bamousaa@gmail.com>
> >> wrote:
> >>
> >>> It is highly unlikely that it is related.
> >>>
> >>> Cheers,
> >>> Abdullah.
> >>>
> >>> On Mon, Aug 24, 2015 at 5:45 AM, Chen Li <chenli@gmail.com> wrote:
> >>>
> >>>> @Abdullah: Is this issue related to
> >>>> https://issues.apache.org/jira/browse/ASTERIXDB-1074? Ian and I plan to
> >>>> look into the details on Monday.
> >>>>
> >>>> On Sun, Aug 23, 2015 at 10:08 AM, abdullah alamoudi <bamousaa@gmail.com>
> >>>> wrote:
> >>>>
> >>>> > About 3-4 days ago, I was working on the addition of the filesystem-based
> >>>> > feed adapter, and it didn't take any time to complete. However, when I
> >>>> > wanted to build and make sure all tests pass, I kept getting
> >>>> > ConnectionRefused errors, which caused the installer tests to fail every
> >>>> > now and then.
> >>>> >
> >>>> > I knew the new change had nothing to do with this failure, yet I
> >>>> > couldn't direct my attention away from this bug (it just bothered me so
> >>>> > much, and I knew it needed to be resolved ASAP). After wasting countless
> >>>> > hours, I was finally able to figure out what was happening :-)
> >>>> >
> >>>> > In the startup routine, we start three Jetty web servers (Web interface
> >>>> > server, JSON API server, and Feed server). Some time ago, we used to
> >>>> > return from the startup call before making sure that the
> >>>> > server.isStarted() method returns true on all servers. At that time, I
> >>>> > introduced the waitUntilServerStarts method to make sure we don't return
> >>>> > before the servers are ready. It turned out that this was an incorrect way
> >>>> > to handle it (we can blame Stack Overflow for this one!): it is not
> >>>> > enough that the server's isStarted() returns true. The correct way to do
> >>>> > this is to call the server.join() method after server.start().
> >>>> >
> >>>> > See:
> >>>> > http://stackoverflow.com/questions/15924874/embedded-jetty-why-to-use-join
> >>>> >
> >>>> > This was as satisfying as it was frustrating, and you are welcome for
> >>>> > the future time I saved each of you :)
> >>>> > --
> >>>> > Amoudi, Abdullah.
> >>>> >
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Amoudi, Abdullah.
> >>>
> >>
> >>
> >>
> >> --
> >> Amoudi, Abdullah.
> >>
> >
> >
> >
> > --
> > Amoudi, Abdullah.
> >
>
>
>
> --
> Amoudi, Abdullah.
>
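
The push alternative discussed in the thread can be illustrated in-process with a CountDownLatch: the CC side signals once startup has really finished, and the installer side waits for that signal with a timeout. This is a sketch under stated assumptions. The thread proposes RMI between the CC and the InstallerDriver, which is not shown here, and the StartupSignal class and its methods are hypothetical stand-ins for that remote callback.

```java
import java.time.Duration;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical in-process stand-in for a "startup complete" push notification.
// In the design discussed in the thread, the notification would travel over RMI instead.
final class StartupSignal {
    private final CountDownLatch started = new CountDownLatch(1);

    // Called by the CC side only after every component it starts is ready to serve.
    void notifyStartupComplete() {
        started.countDown();
    }

    // Called by the installer side; blocks until notified or the timeout expires.
    boolean awaitStartup(Duration timeout) {
        try {
            return started.await(timeout.toMillis(), TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With push, the waiting side learns about readiness as soon as it happens, but it depends on the CC actually sending the signal, so a timeout is still needed for the case where startup fails.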

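The ConnectionRefused errors in the original report are what a client sees when nothing is accepting connections on the target port yet. A probe that simply attempts a TCP connection with a timeout can detect that condition. The helper below (PortProbe.isAccepting) is hypothetical and not part of AsterixDB or Jetty. It is a weaker check than running a real query through the API, since a listening socket does not prove that the CC can answer requests. Separately, in embedded Jetty, Server.join() blocks the calling thread until the server stops, so it keeps a thread alive rather than signaling readiness.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical TCP-level readiness check: returns true if something accepts a
// connection on host:port within timeoutMillis, false on refusal or timeout.
final class PortProbe {
    private PortProbe() {}

    static boolean isAccepting(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // java.net.ConnectException ("Connection refused") and
            // java.net.SocketTimeoutException are both IOExceptions and land here.
            return false;
        }
    }
}
```

Such a probe could serve as a first gate inside a polling loop, while a query-level check like the one Ian describes remains the stricter test of readiness.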