openwhisk-dev mailing list archives

From Tyson Norris <tnor...@adobe.com.INVALID>
Subject Re: Error when deploying Whisk-Controller in dcos
Date Mon, 05 Mar 2018 23:22:26 GMT
I’m guessing the Kafka service did not start properly. Can you verify that the Kafka service
is usable?
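
One rough way to check this from the controller's side is to confirm that each host in the KAFKA_HOSTS value resolves and accepts TCP connections on its port. The following is only a minimal Python sketch under stated assumptions: it is not part of OpenWhisk or DCOS tooling, the helper names parse_hosts and check_broker are hypothetical, and a successful TCP connect only shows that something is listening on the port, not that the broker is healthy. The "unresolvable" result is the same kind of condition that would likely sit behind the "No resolvable bootstrap urls" error quoted below.

```python
import socket
from typing import List, Tuple


def parse_hosts(value: str) -> List[Tuple[str, int]]:
    """Split a comma-separated 'host:port' list (assumed format) into pairs."""
    pairs = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Split on the last colon so the port is always the final component.
        host, _, port = entry.rpartition(":")
        pairs.append((host, int(port)))
    return pairs


def check_broker(host: str, port: int, timeout: float = 5.0) -> str:
    """Return 'unresolvable', 'unreachable', or 'reachable' for host:port."""
    try:
        addrs = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # The name could not be resolved at all.
        return "unresolvable"
    for family, socktype, proto, _canon, sockaddr in addrs:
        try:
            with socket.socket(family, socktype, proto) as s:
                s.settimeout(timeout)
                s.connect(sockaddr)
                return "reachable"
        except OSError:
            # Try the next resolved address, if any.
            continue
    return "unreachable"
```

It would need to run somewhere on the same DCOS network as the controller, for example with parse_hosts("mykafka.marathon.mesos:9092") feeding check_broker, since the result depends on the DNS and routing that the controller itself sees.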

> On Mar 5, 2018, at 3:00 PM, Kumar Subramanian <kumarsubrama@vmware.com> wrote:
> 
> After I increased the timeout on the health checks I get the following:
> [2018-03-05T22:58:34.042Z] [ERROR] [??] [KafkaMessagingProvider] ensureTopic for completed0 failed due to java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment.
> [2018-03-05T22:58:34.044Z] [ERROR] [??] [Controller] failure during msgProvider.ensureTopic for topic completed0
> [INFO] [03/05/2018 22:58:34.070] [kamon-shutdown-hook-1] [CoordinatedShutdown(akka://kamon)] Starting coordinated shutdown from JVM shutdown hook
> 
> On 3/5/18, 2:58 PM, "Kumar Subramanian" <kumarsubrama@vmware.com> wrote:
> 
>    Hi Carlos/Tyson/Chetan,
>    Any Suggestions?
> 
>    Note: I just tried to increase the timeout on the health checks …no luck.
> 
>    Thanks,
>    Kumar. 
> 
>    On 3/5/18, 12:28 PM, "Kumar Subramanian" <kumarsubrama@vmware.com> wrote:
> 
>        This is the output log
>        [2018-03-05T20:24:10.016Z] [INFO] Initializing Kamon...
>        [INFO] [03/05/2018 20:24:10.301] [main] [StatsDExtension(akka://kamon)] Starting the Kamon(StatsD) extension
>        [2018-03-05T20:24:10.352Z] [INFO] Slf4jLogger started
>        [2018-03-05T20:24:10.706Z] [INFO] [??] [Config] environment set value for db.whisk.actions
>        [2018-03-05T20:24:10.707Z] [INFO] [??] [Config] environment set value for db.protocol
>        [2018-03-05T20:24:10.708Z] [INFO] [??] [Config] environment set value for limits.actions.sequence.maxLength
>        [2018-03-05T20:24:10.708Z] [INFO] [??] [Config] environment set value for limits.triggers.fires.perMinute
>        [2018-03-05T20:24:10.708Z] [INFO] [??] [Config] environment set value for akka.cluster.seed.nodes
>        [2018-03-05T20:24:10.709Z] [INFO] [??] [Config] environment set value for limits.actions.invokes.concurrent
>        [2018-03-05T20:24:10.709Z] [INFO] [??] [Config] environment set value for controller.instances
>        [2018-03-05T20:24:10.710Z] [INFO] [??] [Config] environment set value for controller.localBookkeeping
>        [2018-03-05T20:24:10.710Z] [INFO] [??] [Config] environment set value for whisk.version.date
>        [2018-03-05T20:24:10.710Z] [INFO] [??] [Config] environment set value for db.port
>        [2018-03-05T20:24:10.711Z] [INFO] [??] [Config] environment set value for whisk.version.buildno
>        [2018-03-05T20:24:10.711Z] [INFO] [??] [Config] environment set value for db.whisk.activations
>        [2018-03-05T20:24:10.711Z] [INFO] [??] [Config] environment set value for db.username
>        [2018-03-05T20:24:10.712Z] [INFO] [??] [Config] environment set value for limits.actions.invokes.perMinute
>        [2018-03-05T20:24:10.712Z] [INFO] [??] [Config] environment set value for db.whisk.auths
>        [2018-03-05T20:24:10.712Z] [INFO] [??] [Config] environment set value for limits.actions.invokes.concurrentInSystem
>        [2018-03-05T20:24:10.712Z] [INFO] [??] [Config] environment set value for runtimes.manifest
>        [2018-03-05T20:24:10.713Z] [INFO] [??] [Config] environment set value for kafka.hosts
>        [2018-03-05T20:24:10.713Z] [INFO] [??] [Config] environment set value for db.host
>        [2018-03-05T20:24:10.713Z] [INFO] [??] [Config] environment set value for port
>        [2018-03-05T20:24:10.714Z] [INFO] [??] [Config] environment set value for db.password
>        [2018-03-05T20:24:10.714Z] [INFO] [??] [Config] environment set value for db.provider
>        Received killTask for task whisk-controller.28d25d91-20b3-11e8-8754-3afdc003616b
>        [INFO] [03/05/2018 20:25:38.974] [kamon-shutdown-hook-1] [CoordinatedShutdown(akka://kamon)] Starting coordinated shutdown from JVM shutdown hook
>        [2018-03-05T20:25:38.975Z] [INFO] Starting coordinated shutdown from JVM shutdown hook
>        [2018-03-05T20:25:38.982Z] [INFO] [??] [Controller] Shutting down Kamon with coordinated shutdown
> 
>        ERROR_LOG
>        I0305 20:24:09.220711  1337 exec.cpp:162] Version: 1.2.3
>        I0305 20:24:09.227144  1338 exec.cpp:237] Executor registered on agent 995020e0-5129-44a3-8cf4-65900838b3af-S7
>        W0305 20:24:09.227144  1341 logging.cpp:91] RAW: Received signal SIGTERM from process 10243 of user 0; exiting
> 
>        On 3/5/18, 12:24 PM, "Kumar Subramanian" <kumarsubrama@vmware.com> wrote:
> 
>            I get the following error while in the deploying state (it then kills the task automatically, re-installs, and so on…):
>            Task was killed since health check failed. Reason: ConnectionAttemptFailedException: Connection attempt to <whisk_controller_ip>:8888 failed
> 
> 
> 
>            On 3/5/18, 12:21 PM, "Kumar Subramanian" <kumarsubrama@vmware.com> wrote:
> 
>                Gave the value as mykafka.marathon.mesos:9092. It seems to be going forward now with the deployment; hope it succeeds.
> 
>                On 3/5/18, 12:13 PM, "Kumar Subramanian" <kumarsubrama@vmware.com>
wrote:
> 
>                    Hi Chetan,
>                    I resolved all the settings (not sure about some of the values I set). Now I’m getting the following error:
>                    I0305 20:07:00.341622 21216 exec.cpp:162] Version: 1.2.3
>                    I0305 20:07:00.347102 21224 exec.cpp:237] Executor registered on agent 995020e0-5129-44a3-8cf4-65900838b3af-S4
>                    Exception in thread "main" org.apache.kafka.common.KafkaException: Failed create new KafkaAdminClient
>                    	at org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:322)
>                    	at org.apache.kafka.clients.admin.AdminClient.create(AdminClient.java:50)
>                    	at whisk.connector.kafka.KafkaMessagingProvider$.ensureTopic(KafkaMessagingProvider.scala:70)
>                    	at whisk.core.controller.Controller$.main(Controller.scala:217)
>                    	at whisk.core.controller.Controller.main(Controller.scala)
>                    Caused by: org.apache.kafka.common.config.ConfigException: No resolvable bootstrap urls given in bootstrap.servers
>                    	at org.apache.kafka.clients.ClientUtils.parseAndValidateAddresses(ClientUtils.java:64)
>                    	at org.apache.kafka.clients.admin.KafkaAdminClient.<init>(KafkaAdminClient.java:345)
>                    	at org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:315)
>                    	... 4 more
>                    <<WHISK_CONTROLLER ENVIRONMENT SETTINGS>>
>                    The config environment values are:
>                    "KAFKA_HOST": "broker-0.kafka.mesos"
>                    "KAFKA_HOSTS": "mykafka.docker:9092"
> 
> 
>                    Here is the Kafka service in my dcos services:
>                    <<KAFKA_SERVICE>>	
>                    ID: mykafka.654fa8cd-1e56-11e8-8754-3afdc003616b
>                    Name: mykafka
>                    Address: <<internal_ip>>	
>                    Status: Running
> 
>                    On 3/5/18, 12:01 PM, "Kumar Subramanian" <kumarsubrama@vmware.com>
wrote:
> 
>                        Also, what is DB_WHISK_ACTIVATIONS=local_activations? What does it mean by “local” activations?
>                        This setting is also needed; what should the value be in my dcos env? Should it still be local_activations?
> 
>                        On 3/5/18, 11:48 AM, "Kumar Subramanian" <kumarsubrama@vmware.com>
wrote:
> 
>                            Thanks Chetan, I added that. Now I’m getting:
> 
>                            [2018-03-05T19:43:45.025Z] [ERROR] [??] [Config] required property kafka.hosts still not set
> 
>                            What is kafka.hosts? Is that the Kafka host name (as in mykafka, the name I gave when I installed Kafka)?
>                            Should it be just the name, or the FQDN?
> 
> 
>                            On 3/5/18, 11:32 AM, "Chetan Mehrotra" <chetan.mehrotra@gmail.com>
wrote:
> 
>> required property controller.instances still not set
> 
>                                Looks like some configs are missing. You would need to set this
>                                and a few more props. The configs are generally managed via Ansible
>                                for the default setup. For dcos you may need to configure them explicitly.
>                                You can see the various configs and their values as an example at [1]
>                                (controller.instances becomes CONTROLLER_INSTANCES). They would need
>                                to be tweaked as per your setup though.
> 
>                                Chetan Mehrotra
>                                [1] https://github.com/apache/incubator-openwhisk-devtools/blob/master/docker-compose/docker-whisk-controller.env
> 
> 
>                                On Tue, Mar 6, 2018 at 12:09 AM, Kumar Subramanian
>                                <kumarsubrama@vmware.com> wrote:
>> Hi,
>> I was able to successfully do the following
>> 1) Build the Controller image
>> 2) Push the image
>> 
>> However when I installed the controller package it gives me the following error in the output; then it shuts down and retries the installation (goes on…)
>> 
>> Registered docker executor on 10.0.6.13
>> Starting task whisk-controller.a46ae612-20a2-11e8-8754-3afdc003616b
>> [2018-03-05T18:25:55.853Z] [INFO] Initializing Kamon...
>> [INFO] [03/05/2018 18:25:56.151] [main] [StatsDExtension(akka://kamon)] Starting the Kamon(StatsD) extension
>> [2018-03-05T18:25:56.193Z] [INFO] Slf4jLogger started
>> [2018-03-05T18:25:56.552Z] [INFO] [??] [Config] environment set value for db.whisk.actions
>> [2018-03-05T18:25:56.554Z] [INFO] [??] [Config] environment set value for db.protocol
>> [2018-03-05T18:25:56.554Z] [INFO] [??] [Config] environment set value for limits.triggers.fires.perMinute
>> [2018-03-05T18:25:56.554Z] [INFO] [??] [Config] environment set value for limits.actions.invokes.concurrent
>> [2018-03-05T18:25:56.555Z] [INFO] [??] [Config] environment set value for whisk.version.date
>> [2018-03-05T18:25:56.555Z] [INFO] [??] [Config] environment set value for db.port
>> [2018-03-05T18:25:56.555Z] [INFO] [??] [Config] environment set value for whisk.version.buildno
>> [2018-03-05T18:25:56.556Z] [INFO] [??] [Config] environment set value for db.username
>> [2018-03-05T18:25:56.556Z] [INFO] [??] [Config] environment set value for limits.actions.invokes.perMinute
>> [2018-03-05T18:25:56.556Z] [INFO] [??] [Config] environment set value for db.whisk.auths
>> [2018-03-05T18:25:56.556Z] [INFO] [??] [Config] environment set value for limits.actions.invokes.concurrentInSystem
>> [2018-03-05T18:25:56.557Z] [INFO] [??] [Config] environment set value for runtimes.manifest
>> [2018-03-05T18:25:56.557Z] [INFO] [??] [Config] environment set value for db.host
>> [2018-03-05T18:25:56.558Z] [INFO] [??] [Config] environment set value for port
>> [2018-03-05T18:25:56.558Z] [INFO] [??] [Config] environment set value for db.password
>> [2018-03-05T18:25:56.558Z] [INFO] [??] [Config] environment set value for db.provider
>> [2018-03-05T18:25:56.561Z] [ERROR] [??] [Config] required property controller.instances still not set
>> [2018-03-05T18:25:56.561Z] [ERROR] [??] [Controller] Bad configuration, cannot start.
>> 
>> Any suggestions?
>> 
>> 
>> On 3/2/18, 4:05 PM, "Kumar Subramanian" <kumarsubrama@vmware.com> wrote:
>> 
>>    This is the error I get when I did docker build (for Controller)
>> 
>>    Step 1/7 : FROM scala
>>    repository scala not found: does not exist or no pull access
>> 
>> 
>>    Any Suggestions?
>> 
>>    On 3/2/18, 3:34 PM, "Carlos Santana" <csantana23@gmail.com> wrote:
>> 
>>        No, it’s still in a PR
>> 
>>        Just pull the changes locally and build
>> 
>>        - Carlos Santana
>>        @csantanapr
>> 
>>> On Mar 2, 2018, at 6:20 PM, Kumar Subramanian <kumarsubrama@vmware.com>
wrote:
>>> 
>>> Is that change at https://github.com/chetanmeh/incubator-openwhisk/blob/fa302249f4f9b4e6b3084956f18bda987674f46f/core/controller/Dockerfile
>>> not merged? I don’t see the change in master: https://github.com/apache/incubator-openwhisk/blob/master/core/controller/Dockerfile
>>> 
>>> Thanks,
>>> Kumar.
>>> 
>>> On 3/2/18, 2:59 PM, "Kumar Subramanian" <kumarsubrama@vmware.com> wrote:
>>> 
>>>   Ok, I will try to build the controller image and see. Will keep you posted.
>>> 
>>>   On 3/2/18, 2:39 PM, "Tyson Norris" <tnorris@adobe.com.INVALID> wrote:
>>> 
>>>       Thanks Carlos - I think you’re right.
>>> 
>>> 
>>> 
>>>       Kumar, you can either build the controller image with that PR, or you should
>>>       be able to set the docker cmd manually on the dcos service for the controller, e.g.
>>>       /bin/sh -c \"exec /init.sh 0 >> /dev/stdout\"
>>> 
>>>       I think you will have a similar issue with the invoker, mostly because this universe
>>>       is far out of date from current openwhisk images.
>>> 
>>>       For the invoker, can you use this as the docker cmd:
>>>       /bin/sh -c \"exec /init.sh --name $LIBPROCESS_IP >> /dev/stdout\"
>>> 
>>>       Additionally, the env vars (both invoker and controller) have changed substantially,
>>>       so I would expect a few hiccups there as well.
>>> 
>>>       We are working on getting updates to the universe so that our internal
>>>       deployment details are not included, and it will actually work with recent openwhisk
>>>       images (and stay working), but we haven’t gotten everything set just yet.
>>> 
>>> 
>>> 
>>>       Hope that helps
>>> 
>>>       Tyson
>>> 
>>> 
>>> 
>>>>> On Mar 2, 2018, at 2:21 PM, Carlos Santana <csantana23@gmail.com>
wrote:
>>>> 
>>>> 
>>> 
>>>> Maybe for the init.sh this PR is related
>>> 
>>>> https://github.com/apache/incubator-openwhisk/pull/3374/files#diff-8f445fbdf6253dd176975ff6c629def4R18
>>> 
>>>> 
>>> 
>>>> 
>>> 
>>>> On Fri, Mar 2, 2018 at 5:09 PM Tyson Norris <tnorris@adobe.com.invalid>
>>> 
>>>> wrote:
>>> 
>>>> 
>>> 
>>>>> Check your marathon/dcos service config to verify what image is used, and
>>>>> that you have the latest image pulled?
>>>>> 
>>>>> The default should be openwhisk/controller - but I see that universe
>>>>> package marathon config is not set to force pull, so if you are using that
>>>>> image, make sure you have pulled the latest manually (or change the config
>>>>> to force pull in dcos/marathon ui).
>>> 
>>>>> 
>>> 
>>>>> Tyson
>>> 
>>>>> 
>>> 
>>>>>> On Mar 2, 2018, at 12:45 PM, Kumar Subramanian <kumarsubrama@vmware.com>
>>> 
>>>>> wrote:
>>> 
>>>>>> 
>>> 
>>>>>> Hi,
>>> 
>>>>>> I have installed the following in DCOS successfully:
>>> 
>>>>>> 1. Apigateway
>>> 
>>>>>> 2. Exhibitor-dcos
>>> 
>>>>>> 3. Kafka (name given is mykafka at the time of installation)
>>> 
>>>>>> 4. Whisk-couchdb
>>> 
>>>>>> 5. Consul
>>> 
>>>>>> 6. Registrator
>>> 
>>>>>> 
>>> 
>>>>>> Error when deploying Whisk-Controller in DCOS:
>>> 
>>>>>> When I try to deploy whisk-controller with the default settings, the
>>>>>> service fails to deploy (it is killed and redeployed continuously
>>>>>> on its own while deploying).
>>> 
>>>>>> 
>>> 
>>>>>> Here is the content in the Error and Output
>>> 
>>>>>> 
>>> 
>>>>>> STDERR:
>>> 
>>>>>> (AT BEGINNING OF FILE)
>>> 
>>>>>> I0302 20:38:35.176177 19822 exec.cpp:162] Version: 1.2.3
>>> 
>>>>>> I0302 20:38:35.180703 19824 exec.cpp:237] Executor registered on agent 995020e0-5129-44a3-8cf4-65900838b3af-S6
>>> 
>>>>>> docker: Error response from daemon: Container command 'init.sh' not found or does not exist..
>>> 
>>>>>> 
>>> 
>>>>>> OUTPUT:
>>> 
>>>>>> (AT BEGINNING OF FILE)
>>> 
>>>>>> Registered docker executor on 10.0.6.55
>>> 
>>>>>> Starting task whisk-controller.adb62c44-1e59-11e8-8754-3afdc003616b
>>> 
>>>>>> 
>>> 
>>>>>> Can you please provide your valuable inputs on how to get whisk-controller deployed in dcos?
>>> 
>>>>>> 
>>> 
>>>>>> Thanks,
>>> 
>>>>>> Kumar.
>>> 
>>>>>> 
>>> 
>>>>> 
>>> 
>>>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
>> 
>> 
>> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
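
On the missing-property errors in the quoted thread: the naming rule Chetan mentions (controller.instances becomes CONTROLLER_INSTANCES) can be used to pre-check a service's environment before deploying. The sketch below is only an illustration in Python, assuming the rule is "replace dots with underscores and uppercase", which matches the examples in this thread (KAFKA_HOSTS, DB_WHISK_ACTIVATIONS); the function names env_name and missing_properties are hypothetical and not part of OpenWhisk.

```python
import os
from typing import Iterable, List, Mapping


def env_name(prop: str) -> str:
    """Map a property such as 'controller.instances' to its env var name.

    Assumed rule (from the examples in this thread): dots become
    underscores and the whole name is uppercased.
    """
    return prop.replace(".", "_").upper()


def missing_properties(required: Iterable[str],
                       env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required properties whose env var is unset or empty."""
    return [p for p in required if not env.get(env_name(p))]
```

For example, with only KAFKA_HOSTS set, missing_properties(["controller.instances", "kafka.hosts"], {"KAFKA_HOSTS": "mykafka.marathon.mesos:9092"}) reports controller.instances as still missing, mirroring the "required property ... still not set" log lines.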
