flink-user mailing list archives

From Till Rohrmann <trohrm...@apache.org>
Subject Re: Flink on Kubernetes (Minikube)
Date Wed, 19 Dec 2018 15:55:57 GMT
Hi Alexandru,

minikube ssh 'sudo ip link set docker0 promisc on' is not supposed to solve
the problem you are seeing. It only resolves the problem if the JobMaster
wants to reach itself through the jobmanager-service name. Your problem
seems to be something else. Could you check if jobmanager-service resolves
on a pod by sshing into it and pinging this address?
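
A quick way to do that check (a sketch; the pod name is hypothetical, substitute a real one from `kubectl get pods`):

```shell
# Sketch: resolve the Service name from inside a pod. The pod name
# below is hypothetical -- substitute one from `kubectl get pods`.
# `getent hosts` queries the same resolver the JVM uses, so it is a
# closer match to the UnknownHostException path than ping.
POD=flink-taskmanager-54b679f8bb-22b4r
kubectl exec "$POD" -- getent hosts jobmanager-service \
  || echo "jobmanager-service did not resolve (or kubectl is unavailable)"
```

If the name does not resolve from inside the pod, the problem is cluster DNS or the Service object itself, not the promisc workaround.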

Cheers,
Till

On Wed, Dec 19, 2018 at 4:08 PM Alexandru Gutan <alex.gutan.m@gmail.com>
wrote:

> Got it working on the Google Cloud Platform Kubernetes service...
> More support for Minikube is needed.
>
> On Wed, 19 Dec 2018 at 13:44, Alexandru Gutan <alex.gutan.m@gmail.com>
> wrote:
>
>> I've found this in the archives:
>> http://mail-archives.apache.org/mod_mbox/flink-dev/201804.mbox/%3CCALbFKXr=rp9TYpD_JA8vmuWbcjY0+Lp2mbr4Y=0FNh316HZABQ@mail.gmail.com%3E
>>
>> And as suggested I tried a different startup order, but without success:
>>
>> kubectl create -f jobmanager-deployment.yaml
>> kubectl create -f jobmanager-service.yaml
>> kubectl create -f taskmanager-deployment.yaml
>>
>> I get the same error: *java.net.UnknownHostException: flink-jobmanager: Temporary failure in name resolution*
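
One more thing worth checking at this point (a sketch, assuming the Service is named flink-jobmanager as in the manifests): whether the Service object exists and actually has the JobManager pod behind it.

```shell
# Sketch: an empty ENDPOINTS column means the Service's label selector
# matches no running pod, in which case the name may not resolve or
# connect. Falls back to a message when kubectl is unavailable.
OUT=$(kubectl get endpoints flink-jobmanager 2>&1) \
  || OUT="endpoints not found (or kubectl is unavailable)"
echo "$OUT"
```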
>>
>>
>> On Wed, 19 Dec 2018 at 13:27, Dawid Wysakowicz <dwysakowicz@apache.org>
>> wrote:
>>
>>> Hi Alexandru,
>>>
>>> It sounds plausible that this is caused by that minikube command
>>> failing, but I am not a Kubernetes expert. I have cc'd Till, who knows more about this.
>>>
>>> Best,
>>>
>>> Dawid
>>> On 19/12/2018 14:16, Alexandru Gutan wrote:
>>>
>>> Thanks!
>>> I'm now using the *flink:1.7.0-hadoop24-scala_2.12* image.
>>> The Hadoop-related error is gone, but I have a new error:
>>>
>>> Starting Task Manager
>>> config file:
>>> jobmanager.rpc.address: flink-jobmanager
>>> jobmanager.rpc.port: 6123
>>> jobmanager.heap.size: 1024m
>>> taskmanager.heap.size: 1024m
>>> taskmanager.numberOfTaskSlots: 2
>>> parallelism.default: 1
>>> rest.port: 8081
>>> blob.server.port: 6124
>>> query.server.port: 6125
>>> Starting taskexecutor as a console application on host
>>> flink-taskmanager-54b679f8bb-22b4r.
>>> 2018-12-19 13:09:38,469 INFO
>>> org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -
>>> --------------------------------------------------------------------------------
>>> 2018-12-19 13:09:38,470 INFO
>>> org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  Starting
>>> TaskManager (Version: 1.7.0, Rev:49da9f9, Date:28.11.2018 @ 17:59:06 UTC)
>>> 2018-12-19 13:09:38,470 INFO
>>> org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  OS current
>>> user: flink
>>> 2018-12-19 13:09:38,921 WARN
>>> org.apache.hadoop.util.NativeCodeLoader                       - Unable to
>>> load native-hadoop library for your platform... using builtin-java classes
>>> where applicable
>>> 2018-12-19 13:09:39,307 INFO
>>> org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  Current
>>> Hadoop/Kerberos user: flink
>>> 2018-12-19 13:09:39,307 INFO
>>> org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  JVM:
>>> OpenJDK 64-Bit Server VM - Oracle Corporation - 1.8/25.181-b13
>>> 2018-12-19 13:09:39,307 INFO
>>> org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  Maximum
>>> heap size: 922 MiBytes
>>> 2018-12-19 13:09:39,307 INFO
>>> org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  JAVA_HOME:
>>> /docker-java-home/jre
>>> 2018-12-19 13:09:39,318 INFO
>>> org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  Hadoop
>>> version: 2.4.1
>>> 2018-12-19 13:09:39,318 INFO
>>> org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  JVM
>>> Options:
>>> 2018-12-19 13:09:39,319 INFO
>>> org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -
>>> -XX:+UseG1GC
>>> 2018-12-19 13:09:39,319 INFO
>>> org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     -Xms922M
>>> 2018-12-19 13:09:39,320 INFO
>>> org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     -Xmx922M
>>> 2018-12-19 13:09:39,320 INFO
>>> org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -
>>> -XX:MaxDirectMemorySize=8388607T
>>> 2018-12-19 13:09:39,320 INFO
>>> org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -
>>> -Dlog4j.configuration=file:/opt/flink/conf/log4j-console.properties
>>> 2018-12-19 13:09:39,320 INFO
>>> org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -
>>> -Dlogback.configurationFile=file:/opt/flink/conf/logback-console.xml
>>> 2018-12-19 13:09:39,320 INFO
>>> org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  Program
>>> Arguments:
>>> 2018-12-19 13:09:39,321 INFO
>>> org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -
>>> --configDir
>>> 2018-12-19 13:09:39,321 INFO
>>> org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -
>>> /opt/flink/conf
>>> 2018-12-19 13:09:39,321 INFO
>>> org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  Classpath:
>>> /opt/flink/lib/flink-python_2.12-1.7.0.jar:/opt/flink/lib/flink-shaded-hadoop2-uber-1.7.0.jar:/opt/flink/lib/log4j-1.2.17.jar:/opt/flink/lib/slf4j-log4j12-1.7.15.jar:/opt/flink/lib/flink-dist_2.12-1.7.0.jar:::
>>> 2018-12-19 13:09:39,321 INFO
>>> org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -
>>> --------------------------------------------------------------------------------
>>> 2018-12-19 13:09:39,323 INFO
>>> org.apache.flink.runtime.taskexecutor.TaskManagerRunner       - Registered
>>> UNIX signal handlers for [TERM, HUP, INT]
>>> 2018-12-19 13:09:39,329 INFO
>>> org.apache.flink.runtime.taskexecutor.TaskManagerRunner       - Maximum
>>> number of open file descriptors is 1048576.
>>> 2018-12-19 13:09:39,366 INFO
>>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>>> configuration property: jobmanager.rpc.address, flink-jobmanager
>>> 2018-12-19 13:09:39,367 INFO
>>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>>> configuration property: jobmanager.rpc.port, 6123
>>> 2018-12-19 13:09:39,367 INFO
>>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>>> configuration property: jobmanager.heap.size, 1024m
>>> 2018-12-19 13:09:39,367 INFO
>>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>>> configuration property: taskmanager.heap.size, 1024m
>>> 2018-12-19 13:09:39,368 INFO
>>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>>> configuration property: taskmanager.numberOfTaskSlots, 2
>>> 2018-12-19 13:09:39,369 INFO
>>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>>> configuration property: parallelism.default, 1
>>> 2018-12-19 13:09:39,370 INFO
>>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>>> configuration property: rest.port, 8081
>>> 2018-12-19 13:09:39,372 INFO
>>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>>> configuration property: blob.server.port, 6124
>>> 2018-12-19 13:09:39,374 INFO
>>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>>> configuration property: query.server.port, 6125
>>> 2018-12-19 13:09:39,511 INFO
>>> org.apache.flink.runtime.security.modules.HadoopModule        - Hadoop user
>>> set to flink (auth:SIMPLE)
>>> 2018-12-19 13:10:00,708 ERROR
>>> org.apache.flink.runtime.taskexecutor.TaskManagerRunner       - TaskManager
>>> initialization failed.
>>>
>>> java.net.UnknownHostException: flink-jobmanager: Temporary failure in name resolution
>>>     at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
>>>     at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
>>>     at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
>>>     at java.net.InetAddress.getAllByName0(InetAddress.java:1277)
>>>     at java.net.InetAddress.getAllByName(InetAddress.java:1193)
>>>     at java.net.InetAddress.getAllByName(InetAddress.java:1127)
>>>     at java.net.InetAddress.getByName(InetAddress.java:1077)
>>>     at org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils.getRpcUrl(AkkaRpcServiceUtils.java:167)
>>>     at org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils.getRpcUrl(AkkaRpcServiceUtils.java:133)
>>>     at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createHighAvailabilityServices(HighAvailabilityServicesUtils.java:89)
>>>     at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.<init>(TaskManagerRunner.java:127)
>>>     at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.runTaskManager(TaskManagerRunner.java:330)
>>>     at org.apache.flink.runtime.taskexecutor.TaskManagerRunner$1.call(TaskManagerRunner.java:301)
>>>     at org.apache.flink.runtime.taskexecutor.TaskManagerRunner$1.call(TaskManagerRunner.java:298)
>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>     at javax.security.auth.Subject.doAs(Subject.java:422)
>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
>>>     at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>>>     at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.main(TaskManagerRunner.java:298)
>>>
>>> Is the *minikube ssh 'sudo ip link set docker0 promisc on'* command
>>> supposed to mitigate this?
>>> Maybe that command didn't get executed properly? Is there a way to
>>> check whether it ran correctly?
>>> Or is it another type of issue?
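
One way to check this (a sketch, assuming the flag is reported in the interface's link output):

```shell
# Sketch: if the promisc command took effect, the PROMISC flag should
# appear in docker0's link flags inside the minikube VM. Degrades to a
# message when minikube is not on PATH.
FLAGS=$(minikube ssh 'ip link show docker0' 2>/dev/null) || FLAGS=""
case "$FLAGS" in
  *PROMISC*) echo "docker0 is in promiscuous mode" ;;
  *)         echo "PROMISC flag not found (or minikube unavailable)" ;;
esac
```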
>>>
>>>
>>> *Thank you!*
>>>
>>> On Wed, 19 Dec 2018 at 12:12, Dawid Wysakowicz <dwysakowicz@apache.org>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> You used a Hadoop-less Docker image, so it cannot find the Hadoop
>>>> dependencies. That is fine if you don't need any; the bolded messages
>>>> are just INFO entries, not errors.
>>>>
>>>> Best,
>>>>
>>>> Dawid
>>>> On 19/12/2018 12:58, Alexandru Gutan wrote:
>>>>
>>>> Dear all,
>>>>
>>>> I followed the instructions found here:
>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.7/ops/deployment/kubernetes.html
>>>> Minikube version 0.31-01
>>>> Kubernetes version 1.10
>>>> Flink Docker image: flink:latest (1.7.0-scala_2.12)
>>>>
>>>> I ran the following commands:
>>>>
>>>> minikube start
>>>> minikube ssh 'sudo ip link set docker0 promisc on'
>>>> kubectl create -f jobmanager-deployment.yaml
>>>> kubectl create -f taskmanager-deployment.yaml
>>>> kubectl create -f jobmanager-service.yaml
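
For reference, a jobmanager-service.yaml along the lines of the one in the Flink 1.7 Kubernetes docs looks roughly like this (a sketch; the service name must match jobmanager.rpc.address, and the selector labels must match the jobmanager deployment):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: flink-jobmanager
spec:
  ports:
  - name: rpc
    port: 6123
  - name: blob
    port: 6124
  - name: query
    port: 6125
  - name: ui
    port: 8081
  selector:
    app: flink
    component: jobmanager
```

If the `metadata.name` here and the `jobmanager.rpc.address` in flink-conf.yaml disagree, the taskmanagers will fail name resolution exactly as in the logs below.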
>>>>
>>>> The 2 taskmanagers fail.
>>>> Output:
>>>>
>>>> Starting Task Manager
>>>> config file:
>>>> jobmanager.rpc.address: flink-jobmanager
>>>> jobmanager.rpc.port: 6123
>>>> jobmanager.heap.size: 1024m
>>>> taskmanager.heap.size: 1024m
>>>> taskmanager.numberOfTaskSlots: 2
>>>> parallelism.default: 1
>>>> rest.port: 8081
>>>> blob.server.port: 6124
>>>> query.server.port: 6125
>>>> Starting taskexecutor as a console application on host
>>>> flink-taskmanager-7679c9d55d-n2trk.
>>>> 2018-12-19 11:42:45,216 INFO
>>>> org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -
>>>> --------------------------------------------------------------------------------
>>>> 2018-12-19 11:42:45,218 INFO
>>>> org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  Starting
>>>> TaskManager (Version: 1.7.0, Rev:49da9f9, Date:28.11.2018 @ 17:59:06 UTC)
>>>> 2018-12-19 11:42:45,218 INFO
>>>> org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  OS current
>>>> user: flink
>>>> 2018-12-19 11:42:45,219 INFO
>>>> org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  *Current
>>>> Hadoop/Kerberos user: <no hadoop dependency found>*
>>>> 2018-12-19 11:42:45,219 INFO
>>>> org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  JVM:
>>>> OpenJDK 64-Bit Server VM - Oracle Corporation - 1.8/25.181-b13
>>>> 2018-12-19 11:42:45,219 INFO
>>>> org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  Maximum
>>>> heap size: 922 MiBytes
>>>> 2018-12-19 11:42:45,220 INFO
>>>> org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  JAVA_HOME:
>>>> /docker-java-home/jre
>>>> 2018-12-19 11:42:45,220 INFO
>>>> org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  No Hadoop
>>>> Dependency available
>>>> 2018-12-19 11:42:45,221 INFO
>>>> org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  JVM
>>>> Options:
>>>> 2018-12-19 11:42:45,221 INFO
>>>> org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -
>>>> -XX:+UseG1GC
>>>> 2018-12-19 11:42:45,221 INFO
>>>> org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     -Xms922M
>>>> 2018-12-19 11:42:45,221 INFO
>>>> org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     -Xmx922M
>>>> 2018-12-19 11:42:45,221 INFO
>>>> org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -
>>>> -XX:MaxDirectMemorySize=8388607T
>>>> 2018-12-19 11:42:45,223 INFO
>>>> org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -
>>>> -Dlog4j.configuration=file:/opt/flink/conf/log4j-console.properties
>>>> 2018-12-19 11:42:45,223 INFO
>>>> org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -
>>>> -Dlogback.configurationFile=file:/opt/flink/conf/logback-console.xml
>>>> 2018-12-19 11:42:45,223 INFO
>>>> org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  Program
>>>> Arguments:
>>>> 2018-12-19 11:42:45,223 INFO
>>>> org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -
>>>> --configDir
>>>> 2018-12-19 11:42:45,224 INFO
>>>> org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -
>>>> /opt/flink/conf
>>>> 2018-12-19 11:42:45,224 INFO
>>>> org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  Classpath:
>>>> /opt/flink/lib/flink-python_2.12-1.7.0.jar:/opt/flink/lib/log4j-1.2.17.jar:/opt/flink/lib/slf4j-log4j12-1.7.15.jar:/opt/flink/lib/flink-dist_2.12-1.7.0.jar:::
>>>> 2018-12-19 11:42:45,224 INFO
>>>> org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -
>>>> --------------------------------------------------------------------------------
>>>> 2018-12-19 11:42:45,228 INFO
>>>> org.apache.flink.runtime.taskexecutor.TaskManagerRunner       - Registered
>>>> UNIX signal handlers for [TERM, HUP, INT]
>>>> 2018-12-19 11:42:45,233 INFO
>>>> org.apache.flink.runtime.taskexecutor.TaskManagerRunner       - Maximum
>>>> number of open file descriptors is 1048576.
>>>> 2018-12-19 11:42:45,249 INFO
>>>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>>>> configuration property: jobmanager.rpc.address, flink-jobmanager
>>>> 2018-12-19 11:42:45,250 INFO
>>>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>>>> configuration property: jobmanager.rpc.port, 6123
>>>> 2018-12-19 11:42:45,251 INFO
>>>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>>>> configuration property: jobmanager.heap.size, 1024m
>>>> 2018-12-19 11:42:45,251 INFO
>>>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>>>> configuration property: taskmanager.heap.size, 1024m
>>>> 2018-12-19 11:42:45,251 INFO
>>>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>>>> configuration property: taskmanager.numberOfTaskSlots, 2
>>>> 2018-12-19 11:42:45,252 INFO
>>>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>>>> configuration property: parallelism.default, 1
>>>> 2018-12-19 11:42:45,252 INFO
>>>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>>>> configuration property: rest.port, 8081
>>>> 2018-12-19 11:42:45,254 INFO
>>>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>>>> configuration property: blob.server.port, 6124
>>>> 2018-12-19 11:42:45,254 INFO
>>>> org.apache.flink.configuration.GlobalConfiguration            - Loading
>>>> configuration property: query.server.port, 6125
>>>>
>>>>
>>>> *2018-12-19 11:42:45,261 INFO  org.apache.flink.core.fs.FileSystem                           - Hadoop is not in the classpath/dependencies. The extended set of supported File Systems via Hadoop is not available.*
>>>> *2018-12-19 11:42:45,282 INFO  org.apache.flink.runtime.security.modules.HadoopModuleFactory  - Cannot create Hadoop Security Module because Hadoop cannot be found in the Classpath.*
>>>> *2018-12-19 11:42:45,311 INFO  org.apache.flink.runtime.security.SecurityUtils               - Cannot install HadoopSecurityContext because Hadoop cannot be found in the Classpath.*
>>>>
>>>> Any suggestions? Should I maybe try the Hadoop images? (I'm not
>>>> planning to integrate with Hadoop.)
>>>>
>>>> Thank you!
>>>>
>>>>
