asterixdb-dev mailing list archives

From Raman Grover <ramangrove...@gmail.com>
Subject Re: Newbie issue: asterixdb fails to start
Date Fri, 04 Mar 2016 22:24:02 GMT
Managix attempts to gather information on the daemons (CC/NC) to validate
whether the processes started successfully. It does so by extracting the
process IDs. However, there is a bug wherein the extraction of process IDs
(based on grep) fails, which leads to a false alarm.
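
As a hypothetical illustration (a sketch, not Managix's actual code), the
grep-based extraction amounts to something like the following; the
"NCDriver" pattern and the awk column are assumptions. If the ps output
layout shifts, or the pattern matches nothing, the result is a null PID and
hence a false alarm:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    public class PidGrep {
        public static void main(String[] args) throws Exception {
            // Match pattern and awk column are assumptions for illustration.
            Process p = new ProcessBuilder("bash", "-c",
                    "ps -ef | grep NCDriver | grep -v grep | awk '{print $2}'")
                    .start();
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(p.getInputStream()))) {
                String pid = r.readLine(); // null if nothing matched
                System.out.println(pid == null
                        ? "no PID found -> false alarm" : pid);
            }
        }
    }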

Managix would also not be able to shut down the Asterix instance, as it was
unable to extract the process ID (it does a kill -9 to shut down the
daemons associated with an instance).

The mechanism is vulnerable to the layout of the output of the ps command. We
need a more robust way of collecting the process IDs and determining the
status of the remote processes (CC/NC). One option is to have the NCs and CC
follow a cluster-membership protocol by registering themselves as znodes with
the existing ZooKeeper instances. Managix could then query ZooKeeper to
extract the required info for launched processes. Other suggestions are welcome.
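
To make the idea concrete, here is a minimal sketch of such a registration,
assuming the standard Apache ZooKeeper Java client and a hypothetical
/asterix/<instance> namespace (the paths, addresses, and names are
placeholders, not a worked-out design):

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class MembershipSketch {
        public static void main(String[] args) throws Exception {
            // Connect to the existing ZooKeeper ensemble (placeholder address).
            ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 30000, event -> { });
            // Daemon side: register under a per-instance parent (assumed to
            // exist). An EPHEMERAL znode is deleted automatically when the
            // session ends, i.e., when the daemon dies.
            zk.create("/asterix/try1/nc1", "127.0.0.1".getBytes(),
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            // Managix side: the children of the parent are the live daemons.
            System.out.println(zk.getChildren("/asterix/try1", false));
            Thread.sleep(Long.MAX_VALUE); // keep the session (and znode) alive
        }
    }

Since ephemeral znodes are tied to the client session, a crashed CC/NC drops
out of the listing automatically, and no ps parsing is involved.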

Regards,
Raman


On Fri, Mar 4, 2016 at 2:13 PM, Yingyi Bu <buyingyi@gmail.com> wrote:

> I always have this warning on my Mac machine when I do "managix start".
> Everything works except that "managix stop" couldn't kill the CC.
>
> Best,
> Yingyi
>
> On Fri, Mar 4, 2016 at 2:09 PM, Ian Maxon <imaxon@uci.edu> wrote:
>
> > Yes, it might be a false alarm as Wail noted.
> > What is the content of the 3 log files though? Are they just empty?
> > logs/execute.log should show you what exact commands were executed by
> > managix, you can try some of them by hand to see where things may be
> going
> > awry in the startup process.
> >
> >
> > On Fri, Mar 4, 2016 at 5:21 AM, Wail Alkowaileet <wael.y.k@gmail.com>
> > wrote:
> >
> > > The message can be misleading.
> > > Can you open http://127.0.0.1:19001 and try some queries?
> > > On Mar 4, 2016 14:05, "Veeral Shah" <veerals@gmail.com> wrote:
> > >
> > > > A newbie issue. I deployed the GitHub master branch.
> > > > After a successful build, I configured AsterixDB to run a single
> > > > instance (using managix) - as documented at
> > > > https://asterixdb.ics.uci.edu/documentation/install.html#Section1SingleMachineAsterixDBInstallation
> > > >
> > > > But the cluster controller fails to start. The logs don't reveal
> > > > anything about the error, though. It appears to be some obvious
> > > > mistake, but in the absence of error messages I am finding it tough
> > > > to triage. I see 3 log files (I don't understand why it starts 2 NCs
> > > > and a CC when I am running a single-instance AsterixDB).
> > > >
> > > >
> > > > root@ubuntu205:/home/veerals/work/installer# managix configure
> > > > root@ubuntu205:/home/veerals/work/installer# managix validate
> > > > INFO: Environment [OK]
> > > > INFO: Managix Configuration [OK]
> > > >
> > > > root@ubuntu205:/home/veerals/work/installer# $MANAGIX_HOME/bin/managix
> > > > create -n try1 -c /home/veerals/work/installer/clusters/local/local.xml
> > > > INFO: Name:try1
> > > > Created:Fri Mar 04 15:22:22 IST 2016
> > > > Web-Url:http://127.0.0.1:19001
> > > > State:UNUSABLE
> > > >
> > > > WARNING!:Cluster Controller not running at master
> > > >
> > > >
> > > > root@ubuntu205:/home/veerals/work/installer# managix describe -n try1 -admin
> > > > INFO: Name:try1
> > > > Created:Fri Mar 04 15:22:22 IST 2016
> > > > Web-Url:http://127.0.0.1:19001
> > > > State:UNUSABLE
> > > >
> > > > WARNING!:Cluster Controller not running at master
> > > >
> > > > Master node:master:127.0.0.1
> > > > nc1:127.0.0.1
> > > > nc2:127.0.0.1
> > > >
> > > > Asterix version:0.8.8-SNAPSHOT
> > > > Metadata Node:nc1
> > > > Processes
> > > > NC at nc1 [ 16781 ]
> > > > NC at nc2 [ 16777 ]
> > > >
> > > > Asterix Configuration
> > > > nc.java.opts                             :-Xmx3096m
> > > > cc.java.opts                             :-Xmx1024m
> > > > max.wait.active.cluster                  :60
> > > > storage.buffercache.pagesize             :131072
> > > > storage.buffercache.size                 :536870912
> > > > storage.buffercache.maxopenfiles         :214748364
> > > > storage.memorycomponent.pagesize         :131072
> > > > storage.memorycomponent.numpages         :256
> > > > storage.metadata.memorycomponent.numpages:64
> > > > storage.memorycomponent.numcomponents    :2
> > > > storage.memorycomponent.globalbudget     :1073741824
> > > > storage.lsm.bloomfilter.falsepositiverate:0.01
> > > > txn.log.buffer.numpages                  :8
> > > > txn.log.buffer.pagesize                  :524288
> > > > txn.log.partitionsize                    :2147483648
> > > > txn.log.checkpoint.lsnthreshold          :67108864
> > > > txn.log.checkpoint.pollfrequency         :120
> > > > txn.log.checkpoint.history               :0
> > > > txn.lock.escalationthreshold             :1000
> > > > txn.lock.shrinktimer                     :5000
> > > > txn.lock.timeout.waitthreshold           :60000
> > > > txn.lock.timeout.sweepthreshold          :10000
> > > > compiler.sortmemory                      :33554432
> > > > compiler.joinmemory                      :33554432
> > > > compiler.framesize                       :131072
> > > > compiler.pregelix.home                   :~/pregelix
> > > > web.port                                 :19001
> > > > api.port                                 :19002
> > > > log.level                                :INFO
> > > > plot.activate                            :false
> > > >
> > > > The log files don't reveal much:
> > > >
> > > > Thanks and regards
> > > > Veeral Shah
> > > >
> > >
> >
>



-- 
Raman
