asterixdb-dev mailing list archives

From abdullah alamoudi <bamou...@gmail.com>
Subject Re: The solution to the sporadic connection refused exceptions
Date Tue, 25 Aug 2015 04:47:14 GMT
@Raman,
I will look into doing it with Zookeeper.

Is there a way to notify Managix once the cluster state has been updated in
Zookeeper, or would Managix have to poll and check the state?
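
If it does come down to polling, a minimal sketch of what that could look like
on the Managix side is below. Everything in it is an assumption for
illustration: the class name ClusterStatePoller, the state strings "ACTIVE" and
"INACTIVE", and the Supplier that stands in for whatever actually reads the
cluster state from the znode. It is not existing Managix or Zookeeper client
code.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class ClusterStatePoller {

    /**
     * Polls readState until it returns the expected value or the timeout elapses.
     * readState is a hypothetical stand-in for reading the cluster state znode.
     * Returns true if the expected state was seen, false on timeout or interrupt.
     */
    public static boolean waitForState(Supplier<String> readState, String expected,
            long timeoutMillis, long intervalMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (expected.equals(readState.get())) {
                return true;
            }
            // Overflow-safe comparison against the deadline.
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Fake state source: reports ACTIVE from the third read onwards.
        AtomicInteger reads = new AtomicInteger();
        Supplier<String> fakeState =
                () -> reads.incrementAndGet() >= 3 ? "ACTIVE" : "INACTIVE";
        boolean ready = waitForState(fakeState, "ACTIVE", 5_000, 10);
        System.out.println("cluster active: " + ready);
    }
}
```

The timeout here could presumably come from the startup wait time that is
already in the parameters file.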

Cheers,
Abdullah.

On Tue, Aug 25, 2015 at 3:28 AM, Raman Grover <ramangrover29@gmail.com>
wrote:

> Well, the state of an instance (and metadata including configuration) is
> kept in a Zookeeper instance that is accessible to Managix and CC. CC should
> be able to set the state of the cluster in Zookeeper under the right znode
> which can be viewed by Managix.
>
> There exists a communication channel for CC and Managix to share
> information on state etc. I am not sure if we need another channel such as
> RMI between Managix and CC.
>
> Regards,
> Raman
>
>
>
> On Mon, Aug 24, 2015 at 12:58 PM, abdullah alamoudi <bamousaa@gmail.com>
> wrote:
>
> > Well, it depends on your definition of the boundaries of managix. What I
> > did is that I added an RMI object in the InstallerDriver which basically
> > listens for state changes from the cluster controller. This means some
> > additional logic in the CCApplicationEntryPoint where, after the CC is
> > ready, it contacts the InstallerDriver using RMI, and only at that point
> > can the InstallerDriver return to managix and tell it that the startup is
> > complete.
> >
> > Not sure if this is the right way to do it but it definitely is better than
> > what we currently have.
> > Abdullah.
> >
> > On Mon, Aug 24, 2015 at 10:00 PM, Chris Hillery <chillery@hillery.land>
> > wrote:
> >
> > > Hopefully the solution won't involve additional important logic inside
> > > Managix itself?
> > >
> > > Ceej
> > > aka Chris Hillery
> > >
> > > On Mon, Aug 24, 2015 at 7:26 AM, abdullah alamoudi <bamousaa@gmail.com>
> > > wrote:
> > >
> > > > That works but it doesn't feel right doing it this way. I am going to fix
> > > > this one for good.
> > > >
> > > > Cheers,
> > > > Abdullah.
> > > >
> > > > On Mon, Aug 24, 2015 at 5:11 PM, Ian Maxon <imaxon@uci.edu> wrote:
> > > >
> > > > > The way I assured liveness for the YARN installer was to try running
> > > > > "for $x in dataset Metadata.Dataset return $x" via the API. I just
> > > > > polled for a reasonable amount of time (though honestly, thinking about
> > > > > it now, the correct parameter to use for the polling interval is the
> > > > > startup wait time in the parameters file :) ). It's not perfect, but it
> > > > > gives fewer false positives than just checking ps for processes that
> > > > > look like CCs/NCs.
> > > > >
> > > > > - Ian.
> > > > >
> > > > > On Mon, Aug 24, 2015 at 5:03 AM, abdullah alamoudi <bamousaa@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Now that I think about it, maybe we should provide multiple ways to do
> > > > > > this: a polling mechanism to be used at an arbitrary time and a pushing
> > > > > > mechanism on startup.
> > > > > > I am going to start implementation of this and will probably use RMI for
> > > > > > this task both ways (CC to InstallerDriver and InstallerDriver to CC).
> > > > > >
> > > > > > Cheers,
> > > > > > Abdullah.
> > > > > >
> > > > > > On Mon, Aug 24, 2015 at 2:19 PM, abdullah alamoudi <bamousaa@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > So after further investigation, it turned out that our startup process
> > > > > > > just starts the CC and NC processes and then makes sure the processes
> > > > > > > are running. If the processes are found to be running, it returns the
> > > > > > > state of the cluster as active and the subsequent test commands can
> > > > > > > start immediately.
> > > > > > >
> > > > > > > This means that the CC could've started but is not yet ready when we try
> > > > > > > to process the next command. To address this, we need a better way to
> > > > > > > tell when the startup procedure has completed. We can do this by pushing
> > > > > > > (the CC informs the installer driver when the startup is complete) or
> > > > > > > polling (the installer driver actually queries the CC for the state of
> > > > > > > the cluster).
> > > > > > >
> > > > > > > I can do it either way, so let's vote. My vote goes to the pushing
> > > > > > > mechanism.
> > > > > > > Thoughts?
> > > > > > >
> > > > > > > On Mon, Aug 24, 2015 at 10:15 AM, abdullah alamoudi <bamousaa@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > >> This solution turned out to be incorrect. Actually, the test cases never
> > > > > > > >> fail when I build after using the join method, but running an actual
> > > > > > > >> asterix instance never succeeds, which is quite confusing.
> > > > > > > >>
> > > > > > > >> I also think that the startup script has a major bug where it might
> > > > > > > >> return before the startup is complete. More on this later......
> > > > > > >>
> > > > > > > >> On Mon, Aug 24, 2015 at 7:48 AM, abdullah alamoudi <bamousaa@gmail.com>
> > > > > > > >> wrote:
> > > > > > >>
> > > > > > >>> It is highly unlikely that it is related.
> > > > > > >>>
> > > > > > >>> Cheers,
> > > > > > >>> Abdullah.
> > > > > > >>>
> > > > > > > >>> On Mon, Aug 24, 2015 at 5:45 AM, Chen Li <chenli@gmail.com> wrote:
> > > > > > >>>
> > > > > > > >>>> @Abdullah: Is this issue related to
> > > > > > > >>>> https://issues.apache.org/jira/browse/ASTERIXDB-1074? Ian and I plan to
> > > > > > > >>>> look into the details on Monday.
> > > > > > >>>>
> > > > > > > >>>> On Sun, Aug 23, 2015 at 10:08 AM, abdullah alamoudi <bamousaa@gmail.com>
> > > > > > > >>>> wrote:
> > > > > > >>>>
> > > > > > > >>>> > About 3-4 days ago, I was working on the addition of the
> > > > > > > >>>> > filesystem-based feed adapter and it didn't take any time to complete.
> > > > > > > >>>> > However, when I wanted to build and make sure all tests pass, I kept
> > > > > > > >>>> > getting ConnectionRefused errors which caused the installer tests to
> > > > > > > >>>> > fail every now and then.
> > > > > > >>>> >
> > > > > > > >>>> > I knew the new change had nothing to do with this failure, yet I
> > > > > > > >>>> > couldn't direct my attention away from this bug (it just bothered me so
> > > > > > > >>>> > much and I knew it needed to be resolved ASAP). After wasting countless
> > > > > > > >>>> > hours, I was finally able to figure out what was happening :-)
> > > > > > >>>> >
> > > > > > > >>>> > In the startup routine, we start three Jetty web servers (Web interface
> > > > > > > >>>> > server, JSON API server, and Feed server). Some time ago, we used to end
> > > > > > > >>>> > the startup call before making sure the server.isStarted() method
> > > > > > > >>>> > returns true on all servers. At that time, I introduced the
> > > > > > > >>>> > waitUntilServerStarts method to make sure we don't return before the
> > > > > > > >>>> > servers are ready. It turned out that was an incorrect way to handle
> > > > > > > >>>> > this (we can blame stackoverflow for this one!) and it is not enough
> > > > > > > >>>> > that server.isStarted() returns true. The correct way to do this is to
> > > > > > > >>>> > call the server.join() method after the server.start().
> > > > > > >>>> >
> > > > > > > >>>> > See:
> > > > > > > >>>> > http://stackoverflow.com/questions/15924874/embedded-jetty-why-to-use-join
> > > > > > >>>> >
> > > > > > > >>>> > This was as satisfying as it was frustrating, and you are welcome for
> > > > > > > >>>> > the future time I saved each of you :)
> > > > > > >>>> > --
> > > > > > >>>> > Amoudi, Abdullah.
> > > > > > >>>> >
> > > > > > >>>>
> > > > > > >>>
> > > > > > >>>
> > > > > > >>>
> > > > > > >>> --
> > > > > > >>> Amoudi, Abdullah.
> > > > > > >>>
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >> --
> > > > > > >> Amoudi, Abdullah.
> > > > > > >>
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Amoudi, Abdullah.
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Amoudi, Abdullah.
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Amoudi, Abdullah.
> > > >
> > >
> >
> >
> >
> > --
> > Amoudi, Abdullah.
> >
>
>
>
> --
> Raman
>



-- 
Amoudi, Abdullah.
