ambari-user mailing list archives

From James Tanner <tanner...@gmail.com>
Subject Re: proper return for status() in a service script?
Date Tue, 24 Nov 2015 21:09:29 GMT
It looks like a case of PEBKAC on my part. I was trying to get away with
using one script for all components, and thought I would be clever by
picking the class to run based on the script's filename:

if __name__ == "__main__":

    whoami = getpass.getuser()

    f = open("/tmp/cas_script.log", 'a')
    f.write("user: %s filepath: %s\n" % (whoami, __file__))
    f.close()

    basename = os.path.basename(__file__)
    basename = basename.replace('.py', '')
    basename = basename.lower()

    if basename == "master":
        Master().execute()
    elif basename == "slave":
        Slave().execute()
    elif basename == "client":
        Client().execute()

    else:

        log = open("/tmp/cas_script.log", 'a')
        log.write('Executing no function: %s\n' % basename)
        log.close()


Most calls were logging "Executing no function". After carefully reading
through the log again, I realized it was because I also needed to strip the
'.pyc' extension. Because of that, no class was ever called, and I'm sure
Ambari figured things were okay since no exceptions were raised.
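
A minimal sketch of that failure, for anyone hitting the same thing. It
repeats the string handling from the first version of the script; the
function name and the paths are made up for illustration:

```python
import os

def component_name_v1(path):
    """Replicate the original handling: only '.py' is removed."""
    basename = os.path.basename(path)
    basename = basename.replace('.py', '')
    return basename.lower()

# Hypothetical paths, for illustration only.
print(component_name_v1("/tmp/scripts/master.py"))   # master
print(component_name_v1("/tmp/scripts/master.pyc"))  # masterc
```

When the script runs from its compiled '.pyc' file, the leftover "c" means
none of the name comparisons match, so execution falls through to the else
branch.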

Stupid mistake on my part, but here's the final result:

if __name__ == "__main__":

    whoami = getpass.getuser()

    f = open("/tmp/cas_script.log", 'a')
    f.write("user: %s filepath: %s\n" % (whoami, __file__))
    f.close()

    basename = os.path.basename(__file__)
    basename = basename.replace('.pyc', '')
    basename = basename.replace('.py', '')
    basename = basename.lower()

    if basename == "master":
        Master().execute()
    elif basename == "slave":
        Slave().execute()
    elif basename == "client":
        Client().execute()

    else:

        log = open("/tmp/cas_script.log", 'a')
        log.write('Executing no function: %s\n' % basename)
        log.close()
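
An alternative sketch, not what I actually deployed: os.path.splitext drops
whatever extension is present, and raising on an unknown name keeps the
"no class called" case from passing silently. The helper names
(component_name, dispatch) and the handlers mapping are hypothetical, and
how the agent reports the resulting failure is an assumption on my part:

```python
import os

def component_name(path):
    """Return the lower-cased file name without its extension."""
    stem, _ext = os.path.splitext(os.path.basename(path))
    return stem.lower()

def dispatch(path, handlers):
    """Return the handler registered for this script name.

    Raises RuntimeError instead of quietly doing nothing, so the script
    exits with a non-zero status when no handler matches.
    """
    name = component_name(path)
    if name not in handlers:
        raise RuntimeError("No component class for script name: %r" % name)
    return handlers[name]
```

With the classes from the script above, the main block could then be
reduced to something like
dispatch(__file__, {"master": Master, "slave": Slave, "client": Client})().execute()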


Sorry for the false alarm.


On Tue, Nov 24, 2015 at 4:00 PM, Nate Cole <ncole@hortonworks.com> wrote:

> Hi James,
>
> Would you mind providing your python script so we can take a look?
>
> Thanks,
> Nate
>
> From: James Tanner <tanner.jc@gmail.com>
> Reply-To: "user@ambari.apache.org" <user@ambari.apache.org>
> Date: Tuesday, November 24, 2015 at 3:51 PM
> To: "user@ambari.apache.org" <user@ambari.apache.org>
> Subject: Re: proper return for status() in a service script?
>
> I killed the test service, restarted ambari-server, then tailed the logs
> to see if there were any clues ...
>
> 24 Nov 2015 15:47:35,132  INFO [qtp-ambari-agent-52] HeartBeatHandler:657
> - State of service component TEST_SLAVE of service TEST of cluster TEST01
> has changed from INSTALLED to STARTED at host node2.lab.net
> 24 Nov 2015 15:47:35,134  INFO [qtp-ambari-agent-52] HeartBeatHandler:657
> - State of service component TEST_CLIENT of service TEST of cluster TEST01
> has changed from INSTALLED to STARTED at host node2.lab.net
> 24 Nov 2015 15:47:37,775  INFO [Thread-23]
> AbstractPoolBackedDataSource:462 - Initializing c3p0 pool...
> com.mchange.v2.c3p0.ComboPooledDataSource [ acquireIncrement -> 3,
> acquireRetryAttempts -> 30, acquireRetryDelay -> 1000, autoCommitOnClose ->
> false, automaticTestTable -> null, breakAfterAcquireFailure -> false,
> checkoutTimeout -> 0, connectionCustomizerClassName -> null,
> connectionTesterClassName ->
> com.mchange.v2.c3p0.impl.DefaultConnectionTester, dataSourceName ->
> 2rvxuc9dgfrzar1v9eztj|7320bccc, debugUnreturnedConnectionStackTraces ->
> false, description -> null, driverClass -> org.postgresql.Driver,
> factoryClassLocation -> null, forceIgnoreUnresolvedTransactions -> false,
> identityToken -> 2rvxuc9dgfrzar1v9eztj|7320bccc, idleConnectionTestPeriod
> -> 50, initialPoolSize -> 3, jdbcUrl -> jdbc:postgresql://localhost/ambari,
> lastAcquisitionFailureDefaultUser -> null, maxAdministrativeTaskTime -> 0,
> maxConnectionAge -> 0, maxIdleTime -> 0, maxIdleTimeExcessConnections -> 0,
> maxPoolSize -> 5, maxStatements -> 0, maxStatementsPerConnection -> 120,
> minPoolSize -> 1, numHelperThreads -> 3,
> numThreadsAwaitingCheckoutDefaultUser -> 0, preferredTestQuery -> select 0,
> properties -> {user=******, password=******}, propertyCycle -> 0,
> testConnectionOnCheckin -> true, testConnectionOnCheckout -> false,
> unreturnedConnectionTimeout -> 0, usesTraditionalReflectiveProxies -> false
> ]
> 24 Nov 2015 15:47:37,988  INFO [Thread-23] JobStoreTX:861 - Freed 0
> triggers from 'acquired' / 'blocked' state.
> 24 Nov 2015 15:47:38,014  INFO [Thread-23] JobStoreTX:871 - Recovering 0
> jobs that were in-progress at the time of the last shut-down.
> 24 Nov 2015 15:47:38,014  INFO [Thread-23] JobStoreTX:884 - Recovery
> complete.
> 24 Nov 2015 15:47:38,014  INFO [Thread-23] JobStoreTX:891 - Removed 0
> 'complete' triggers.
> 24 Nov 2015 15:47:38,015  INFO [Thread-23] JobStoreTX:896 - Removed 0
> stale fired job entries.
> 24 Nov 2015 15:47:38,031  INFO [Thread-23] QuartzScheduler:575 - Scheduler
> ExecutionScheduler_$_NON_CLUSTERED started.
> 24 Nov 2015 15:47:38,723  INFO [qtp-ambari-agent-39] HeartBeatHandler:657
> - State of service component TEST_CLIENT of service TEST of cluster TEST01
> has changed from INSTALLED to STARTED at host node1.lab.net
> 24 Nov 2015 15:47:38,729  INFO [qtp-ambari-agent-39] HeartBeatHandler:657
> - State of service component TEST_MASTER of service TEST of cluster CAS01
> has changed from INSTALLED to STARTED at host node1.lab.net
>
>
> Ambari flipped the state from "INSTALLED" to "STARTED", but I can tell
> from my service script's log output that no calls were ever made to it,
> especially not a call to status(). What is ambari actually doing when it
> decides to switch state from installed to started? It seems to be unrelated
> to the service script(s).
>
> On Tue, Nov 24, 2015 at 3:08 PM, James Tanner <tanner.jc@gmail.com> wrote:
>
>> What are the proper return values for "running" and "not running" in an
>> Ambari service script?
>>
>> If I reference the wiki, the status function should return nothing:
>>
>>
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=38571133#Overview%28Ambari1.5.0orlater%29-CreateandAddtheService.1
>>
>> If I reference the GlusterFS YARN service script included in an HDP
>> stack, there is no return value, but a ComponentIsNotRunning exception
>> should be raised if the component is down.
>>
>>
>> Regardless of what I return, it seems that the internal ambari database
>> status gets set to "running".
>>
>> ambari=# select component_name,current_state from ambari.hostcomponentstate;
>>   component_name   | current_state
>> -------------------+---------------
>>  ZOOKEEPER_SERVER  | STARTED
>>  ZOOKEEPER_CLIENT  | INSTALLED
>>  ZOOKEEPER_CLIENT  | INSTALLED
>>  TEST_CLIENT       | STARTED
>>  TEST_CLIENT       | STARTED
>>  METRICS_MONITOR   | STARTED
>>  METRICS_COLLECTOR | STARTED
>>  ZOOKEEPER_SERVER  | STARTED
>>  TEST_SLAVE        | STARTED       # (*)
>>  METRICS_MONITOR   | STARTED
>>  METRICS_COLLECTOR | STARTED
>>  TEST_MASTER       | STARTED       # (*)
>> (12 rows)
>>
>> (*) The service script raised the ComponentIsNotRunning exception for
>> this component when status() was called.
>>
>>
>>
>> I've also noticed via log statements that the status() function for the
>> service is called upon startup of ambari-server or during a manual
>> service state change, but Ambari never polls status at any regular
>> interval. Is that supposed to be the case? If so, how is the displayed
>> service state ever accurate?
>>
>
>
