asterixdb-dev mailing list archives

From Raman Grover <ramangrove...@gmail.com>
Subject Re: managix on Mac
Date Thu, 12 Nov 2015 01:35:51 GMT
Hi,

*How does Managix verify that a daemon (CC or NC) is alive?*

Each time Managix needs to collect the status of a daemon (CC or NC) on
any node in the cluster, it uses the following mechanism. Status
information is reported in two scenarios:
(a) right after an instance is created
(b) on explicit invocation of the managix describe command.

On each node (listed in the cluster XML), Managix launches a script (verify.sh)
that searches the (physical) node for running processes that are either
an NC or a CC and that belong to the specified AsterixDB instance, identified
by its name. The output of the script is the process ID. Managix collects the
output from each run across the cluster and builds a mapping of which
processes are reported as running on each node.
The cluster layout specified in the cluster XML tells Managix which
processes should be running on each node; call this the expectation.
The expectation is compared with the ground truth, and any delta is reported
as a warning: CC or NC not running on node X.
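
To make the comparison concrete, here is a minimal sketch in Python. It is not the actual Managix code; the function name, the data layout, and the sample node names and process IDs are all made up for illustration.

```python
def find_missing_daemons(expected, reported):
    """Compare the expected layout with what the nodes reported.

    expected: {node: [daemon names]}, as derived from the cluster XML.
    reported: {node: {daemon name: pid}}, as built from the per-node script output.
    Both layouts are assumptions made for this sketch.
    """
    warnings = []
    for node in sorted(expected):
        running = reported.get(node, {})
        for daemon in expected[node]:
            if daemon not in running:
                warnings.append(f"{daemon} not running on node {node}")
    return warnings


# Made-up sample data: the script found no NC process on node2.
expected = {"master": ["CC"], "node1": ["NC"], "node2": ["NC"]}
reported = {"master": {"CC": 4121}, "node1": {"NC": 4188}, "node2": {}}

for warning in find_missing_daemons(expected, reported):
    print("WARNING!:" + warning)
```

Note that a daemon the script failed to detect looks exactly like a daemon that is not running, which is why a detection problem surfaces as this warning.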

*Under what scenarios would the mechanism fail?*
The script verify.sh uses OS tools (ps, grep, cut, etc.) to extract the Java
processes running on a node.
A change in the format of the output of these tools leads to a false negative:
even if a process is running, its information is not captured by
the script.
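
As an illustration of this kind of failure (not a reproduction of what verify.sh actually does), the Python sketch below contrasts two ways of pulling a PID out of a ps-style line. The sample lines, the process name, and both helper functions are hypothetical. The first helper mimics a `cut -d ' ' -f N` style extraction, which breaks as soon as the column padding changes.

```python
def pid_by_fixed_field(ps_line, field_index):
    """Mimic `cut -d ' '`: split on every single space (0-based index here).

    Extra padding between columns produces empty fields, so the PID
    is no longer at the expected position.
    """
    fields = ps_line.split(" ")
    value = fields[field_index] if field_index < len(fields) else ""
    return int(value) if value.isdigit() else None


def pid_by_whitespace(ps_line):
    """Split on runs of whitespace; the PID is taken from the second column."""
    fields = ps_line.split()
    if len(fields) > 1 and fields[1].isdigit():
        return int(fields[1])
    return None


# Hypothetical ps-style lines: same process, different column padding.
compact = "yingyi 4121 1 java -Dapp.name=cc_test"
padded = "yingyi   4121     1 java -Dapp.name=cc_test"

print(pid_by_fixed_field(compact, 1), pid_by_fixed_field(padded, 1))
print(pid_by_whitespace(compact), pid_by_whitespace(padded))
```

With the padded line, the fixed-field extraction finds no PID even though the process is there, which is the false negative described above. Splitting on runs of whitespace tolerates the padding, but it still assumes that the PID is the second column.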

*Possible Solution*
A possible solution is to fix the script verify.sh which is showing signs
of incompatibility with certain environments including vagrant.

An alternative solution is to make use of the existing ZooKeeper instance in
the AsterixDB runtime environment.
Here, we maintain a parent znode under which processes (CC and NCs) create
a child znode named after themselves (CC or the particular NC ID). Managix can
query ZooKeeper to refresh its information on what nodes are alive.
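
A rough sketch of the znode idea in Python follows. Everything in it is an assumption for illustration: the parent path, the function names, and the choice of ephemeral znodes (ZooKeeper deletes an ephemeral znode when the creating session ends, so a dead daemon would disappear from the listing). The functions only need a client object with `ensure_path`, `create` and `get_children` methods; kazoo's `KazooClient` provides these, and the small in-memory stand-in below exists only so the sketch runs without a ZooKeeper server.

```python
PARENT = "/asterix/test/daemons"  # hypothetical parent znode for instance "test"


def register_daemon(zk, parent, daemon_id):
    """Called by a CC or NC at startup: create a child znode named after itself."""
    zk.ensure_path(parent)
    # Ephemeral (an assumption of this sketch): removed when the session ends.
    zk.create(f"{parent}/{daemon_id}", b"", ephemeral=True)


def daemons_not_alive(zk, parent, expected_ids):
    """Called by Managix: return the expected daemons that have no child znode."""
    zk.ensure_path(parent)
    alive = set(zk.get_children(parent))
    return sorted(set(expected_ids) - alive)


class InMemoryZk:
    """Tiny stand-in for a ZooKeeper client, for running this sketch offline."""

    def __init__(self):
        self.children = {}

    def ensure_path(self, path):
        self.children.setdefault(path, set())

    def create(self, path, value=b"", ephemeral=False):
        parent, _, name = path.rpartition("/")
        self.children[parent].add(name)

    def get_children(self, path):
        return list(self.children.get(path, ()))


zk = InMemoryZk()
register_daemon(zk, PARENT, "CC")
register_daemon(zk, PARENT, "nc1")
print(daemons_not_alive(zk, PARENT, ["CC", "nc1", "nc2"]))
```

The expected IDs would still come from the cluster XML, so the comparison step stays the same and only the source of the ground truth changes.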

Or, if there is a better, more robust way of determining cluster-wide
status, please share.

Regards,
Raman


On Mon, Nov 9, 2015 at 7:41 AM, Ian Maxon <imaxon@uci.edu> wrote:

> Hm. This issue has been around for a very long time (years) but I
> hadn't ever seen it happen outside of vagrant.
>
> On Thu, Nov 5, 2015 at 4:22 PM, Yingyi Bu <buyingyi@gmail.com> wrote:
> > Just tried "ssh localhost ls -l" --- it just does the listing.
> >
> > Best,
> > Yingyi
> >
> > On Thu, Nov 5, 2015 at 3:30 PM, Chris Hillery <chillery@hillery.land> wrote:
> >
> >> When you ssh in by hand (to the account that managix is using), is there
> >> any output from commands in your .profile / .login / etc? That can throw
> >> off the scripting. Try running "ssh <host> ls -l" and see if you see
> >> anything other than the directory listing.
> >>
> >> Ceej
> >> aka Chris Hillery
> >>
> >> On Thu, Nov 5, 2015 at 9:50 AM, Yingyi Bu <buyingyi@gmail.com> wrote:
> >>
> >> > Ok, thanks!
> >> > Does anyone have an idea how to fix it?
> >> >
> >> > Best,
> >> > Yingyi
> >> >
> >> > On Thu, Nov 5, 2015 at 9:48 AM, Pouria Pirzadeh <pouria.pirzadeh@gmail.com> wrote:
> >> >
> >> > > Exactly !
> >> > > I have to kill cc manually.
> >> > > I have encountered this problem only with my local instance on Mac.
> >> > > I do not see it on cluster with Cent-OS.
> >> > >
> >> > > Pouria
> >> > >
> >> > > On Thu, Nov 5, 2015 at 9:46 AM, Yingyi Bu <buyingyi@gmail.com> wrote:
> >> > >
> >> > > > Thanks, Pouria!
> >> > > > Also, when I do "managix stop -n test", my CC cannot be killed. I
> >> > > > always have to manually kill it.
> >> > > > Do you have similar issues?
> >> > > >
> >> > > > Best,
> >> > > > Yingyi
> >> > > >
> >> > > >
> >> > > > On Thu, Nov 5, 2015 at 9:38 AM, Pouria Pirzadeh <pouria.pirzadeh@gmail.com> wrote:
> >> > > >
> >> > > > > I have been experiencing it as well for a while.
> >> > > > > It is not clear as to why it thinks the instance is unusable while
> >> > > > > processes are all up and running.
> >> > > > >
> >> > > > > Pouria
> >> > > > >
> >> > > > > On Thu, Nov 5, 2015 at 9:28 AM, Yingyi Bu <buyingyi@gmail.com> wrote:
> >> > > > >
> >> > > > > > I got managix working on my Mac machine (10.10.2), it can start a
> >> > > > > > cluster correctly.
> >> > > > > > But it complains the cluster controller is not running:
> >> > > > > >
> >> > > > > > yingyi-couchbase:managix yingyi$ managix create -n test -c clusters/local/local.xml
> >> > > > > >
> >> > > > > > INFO: Name:test
> >> > > > > >
> >> > > > > > Created:Thu Nov 05 09:24:08 PST 2015
> >> > > > > >
> >> > > > > > Web-Url:http://127.0.0.1:19001
> >> > > > > >
> >> > > > > > State:UNUSABLE
> >> > > > > >
> >> > > > > >
> >> > > > > > WARNING!:Cluster Controller not running at master
> >> > > > > >
> >> > > > > >
> >> > > > > > The instance works fine.
> >> > > > > >
> >> > > > > > Does anyone know how to get rid of this warning message?
> >> > > > > >
> >> > > > > >
> >> > > > > > Best,
> >> > > > > >
> >> > > > > > Yingyi
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
>



-- 
Raman
