flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gary Yao <g...@data-artisans.com>
Subject Re: connection failed when running flink in a cluster
Date Mon, 06 Aug 2018 19:57:16 GMT
Hi,

Can you try submitting with:

    ./bin/flink run examples/streaming/SocketWindowWordCount.jar --hostname
<IP> --port 9000

where IP is the IP of the node where you started nc? If not specified, the
default hostname is localhost. This problematic is if the source operator is
scheduled on a different physical machine.

Best,
Gary

On Mon, Aug 6, 2018 at 6:01 PM, Felipe Gutierrez <
felipe.o.gutierrez@gmail.com> wrote:

> Hi Vino,
>
> the UI shows the job as completed.
> I had run "./bin/flink run examples/streaming/WordCount.jar" and I get no
> error.
>
> When I start netcat "nc -l 9000" and in other terminal I run "./bin/flink
> run examples/streaming/SocketWindowWordCount.jar --port 9000" I have this
> exception.
>
> Starting execution of program
>
> ------------------------------------------------------------
>  The program finished with the following exception:
>
> org.apache.flink.client.program.ProgramInvocationException:
> java.net.ConnectException: Connection refused
> at org.apache.flink.client.program.rest.RestClusterClient.submitJob(
> RestClusterClient.java:264)
> at org.apache.flink.client.program.ClusterClient.run(
> ClusterClient.java:464)
> at org.apache.flink.streaming.api.environment.StreamContextEnvironment.
> execute(StreamContextEnvironment.java:66)
> at org.apache.flink.streaming.examples.socket.SocketWindowWordCount.main(
> SocketWindowWordCount.java:92)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(
> NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at org.apache.flink.client.program.PackagedProgram.callMainMethod(
> PackagedProgram.java:528)
> at org.apache.flink.client.program.PackagedProgram.
> invokeInteractiveModeForExecution(PackagedProgram.java:420)
> at org.apache.flink.client.program.ClusterClient.run(
> ClusterClient.java:404)
> at org.apache.flink.client.cli.CliFrontend.executeProgram(
> CliFrontend.java:785)
> at org.apache.flink.client.cli.CliFrontend.runProgram(
> CliFrontend.java:279)
> at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:214)
> at org.apache.flink.client.cli.CliFrontend.parseParameters(
> CliFrontend.java:1025)
> at org.apache.flink.client.cli.CliFrontend.lambda$main$9(
> CliFrontend.java:1101)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(
> UserGroupInformation.java:1836)
> at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(
> HadoopSecurityContext.java:41)
> at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1101)
> Caused by: java.net.ConnectException: Connection refused
> at java.net.PlainSocketImpl.socketConnect(Native Method)
> at java.net.AbstractPlainSocketImpl.doConnect(
> AbstractPlainSocketImpl.java:350)
> at java.net.AbstractPlainSocketImpl.connectToAddress(
> AbstractPlainSocketImpl.java:206)
> at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:
> 188)
> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
> at java.net.Socket.connect(Socket.java:589)
> at org.apache.flink.streaming.api.functions.source.
> SocketTextStreamFunction.run(SocketTextStreamFunction.java:96)
> at org.apache.flink.streaming.api.operators.StreamSource.
> run(StreamSource.java:87)
> at org.apache.flink.streaming.api.operators.StreamSource.
> run(StreamSource.java:56)
> at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(
> SourceStreamTask.java:99)
> at org.apache.flink.streaming.runtime.tasks.StreamTask.
> invoke(StreamTask.java:306)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703)
> at java.lang.Thread.run(Thread.java:745)
>
>
>
> *--*
> *-- Felipe Gutierrez*
>
> *-- skype: felipe.o.gutierrez*
> *--* *https://felipeogutierrez.blogspot.com
> <https://felipeogutierrez.blogspot.com>*
>
>
> On Mon, Aug 6, 2018 at 5:54 PM vino yang <yanghua1127@gmail.com> wrote:
>
>> Hi Felipe,
>>
>> You got the result? And the web UI shown the job is completed?
>>
>> If it throws the exception you provided, the job's status should be
>> failed.
>>
>> Thanks, vino.
>>
>> 2018-08-06 23:42 GMT+08:00 Felipe Gutierrez <felipe.o.gutierrez@gmail.com
>> >:
>>
>>> yes. with this example (examples/streaming/WordCount.jar) my cluster
>>> worked.
>>>
>>> the file log/*out from the master is still empty and the file log/*out
>>> from the slave node has my result. The dashboard also shows that the job is
>>> completed.
>>>
>>> So, like you said there are some external dependencies that I didn`t
>>> include in my deploy. Do you have any clue?
>>>
>>> I am following the original quickstart (https://ci.apache.org/
>>> projects/flink/flink-docs-master/quickstart/setup_quickstart.html)
>>>
>>> Kind Regards,
>>> Felipe
>>>
>>>
>>>
>>>
>>> *--*
>>> *-- Felipe Gutierrez*
>>>
>>> *-- skype: felipe.o.gutierrez*
>>> *--* *https://felipeogutierrez.blogspot.com
>>> <https://felipeogutierrez.blogspot.com>*
>>>
>>>
>>> On Mon, Aug 6, 2018 at 5:01 PM Gary Yao <gary@data-artisans.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> nc exits after the first connection is closed. Are you re-running the nc
>>>> command every time the job finishes?
>>>>
>>>> The stacktrace you copied does not indicate that a TaskManager cannot
>>>> connect
>>>> to the JobManager. I can only see that the SocketTextStreamFunction
>>>> (from the
>>>> SocketWindowWordCount job?) cannot open the connection to the address
>>>> that you
>>>> specified.
>>>>
>>>> Can you try to run examples/streaming/WordCount.jar. It is a simpler
>>>> job which
>>>> does not rely on external dependencies.
>>>>
>>>> If all the above fails, can you tell us how you submit the job? Can you
>>>> post
>>>> the full command? Can you also post the full JobManager & TaskManager
>>>> logs?
>>>>
>>>> Best,
>>>> Gary
>>>>
>>>>
>>>>
>>>> On Mon, Aug 6, 2018 at 4:10 PM, Felipe Gutierrez <
>>>> felipe.o.gutierrez@gmail.com> wrote:
>>>>
>>>>> do you mean "nc -l 9000"? If so, I did start before.
>>>>> the task manager running on the master can connect to the job manager.
>>>>> but the task manager on the slave node cannot. The second time that I
start
>>>>> the WordCount task it recognizes only one task manager (from the master)
>>>>> and runs my task. But the task manager from the slave does not process
>>>>> anything and it is started.
>>>>>
>>>>> here is the error stack trace from the slave node:
>>>>>
>>>>> 2017-05-30 05:10:39,853 INFO  org.apache.flink.runtime.state.heap.HeapKeyedStateBackend
>>>>>    - Initializing heap keyed state backend with stream factory.
>>>>> 2017-05-30 05:10:39,977 INFO  org.apache.flink.runtime.taskmanager.Task
>>>>>                    - Source: Socket Stream -> Flat Map (1/1) (
>>>>> d5e3d87395995d3977d2f472de896e23) switched from RUNNING to FAILED.
>>>>> java.net.ConnectException: Connection refused
>>>>> at java.net.PlainSocketImpl.socketConnect(Native Method)
>>>>> at java.net.AbstractPlainSocketImpl.doConnect(
>>>>> AbstractPlainSocketImpl.java:350)
>>>>> at java.net.AbstractPlainSocketImpl.connectToAddress(
>>>>> AbstractPlainSocketImpl.java:206)
>>>>> at java.net.AbstractPlainSocketImpl.connect(
>>>>> AbstractPlainSocketImpl.java:188)
>>>>> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>>>>> at java.net.Socket.connect(Socket.java:589)
>>>>> at org.apache.flink.streaming.api.functions.source.
>>>>> SocketTextStreamFunction.run(SocketTextStreamFunction.java:96)
>>>>> at org.apache.flink.streaming.api.operators.StreamSource.
>>>>> run(StreamSource.java:87)
>>>>> at org.apache.flink.streaming.api.operators.StreamSource.
>>>>> run(StreamSource.java:56)
>>>>> at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(
>>>>> SourceStreamTask.java:99)
>>>>> at org.apache.flink.streaming.runtime.tasks.StreamTask.
>>>>> invoke(StreamTask.java:306)
>>>>> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703)
>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>> 2017-05-30 05:10:40,016 INFO  org.apache.flink.runtime.taskmanager.Task
>>>>>                    - Freeing task resources for Source: Socket Stream
->
>>>>> Flat Map (1/1) (d5e3d87395995d3977d2f472de896e23).
>>>>>
>>>>>
>>>>> *--*
>>>>> *-- Felipe Gutierrez*
>>>>>
>>>>> *-- skype: felipe.o.gutierrez*
>>>>> *--* *https://felipeogutierrez.blogspot.com
>>>>> <https://felipeogutierrez.blogspot.com>*
>>>>>
>>>>>
>>>>> On Mon, Aug 6, 2018 at 2:17 PM vino yang <yanghua1127@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Felipe,
>>>>>>
>>>>>> From the exception information, it seems that you did not start the
>>>>>> socket server, the socket source needs to connect to the socket server.
>>>>>>
>>>>>> Please make sure the socket server has started and is available.
>>>>>>
>>>>>> Thanks, vino.
>>>>>>
>>>>>> 2018-08-06 18:45 GMT+08:00 Felipe Gutierrez <
>>>>>> felipe.o.gutierrez@gmail.com>:
>>>>>>
>>>>>>> yes.
>>>>>>>
>>>>>>> when I execute the jps command on the master node I
>>>>>>> see TaskManagerRunner and StandaloneSessionClusterEntrypoint
(which
>>>>>>> I believe it is the  jobManager). On the slave nodes I
>>>>>>> see TaskManagerRunner when I run jps command
>>>>>>>
>>>>>>>
>>>>>>> *--*
>>>>>>> *-- Felipe Gutierrez*
>>>>>>>
>>>>>>> *-- skype: felipe.o.gutierrez*
>>>>>>> *--* *https://felipeogutierrez.blogspot.com
>>>>>>> <https://felipeogutierrez.blogspot.com>*
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Aug 6, 2018 at 12:13 PM miki haiat <miko5054@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Did you start job manager and task manager on the same resbery
pi ?
>>>>>>>>
>>>>>>>> On Mon, 6 Aug 2018, 12:01 Felipe Gutierrez, <
>>>>>>>> felipe.o.gutierrez@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hello everyone,
>>>>>>>>>
>>>>>>>>> I am trying to run Flink on Raspberry Pis. My first test
for word
>>>>>>>>> count in a single node worked. I just have to decrease
the Heap memory of
>>>>>>>>> the jobmanager.heap.mb and taskmanager.heap.mb to 512.
>>>>>>>>> My second test is to add 2 slave nodes I got the error:
"Java
>>>>>>>>> HotSpot(TM) Client VM warning: G1 GC is disabled in this
release." at the
>>>>>>>>> file log/flink-root-taskexecutor-0-*.out.
>>>>>>>>>
>>>>>>>>> This link (https://blog.sflow.com/2016/06/raspberry-pi-real-time-
>>>>>>>>> network-analytics.html) says that in order to Raspberry
Pi ARM
>>>>>>>>> architecture works with JVM it is necessary to configure
the JVM as:
>>>>>>>>> -Xms600M
>>>>>>>>> -Xmx600M
>>>>>>>>> -XX:+UseParNewGC
>>>>>>>>> -XX:+UseConcMarkSweepGC
>>>>>>>>> -XX:+CMSIncrementalMode
>>>>>>>>>
>>>>>>>>> then I set this variables on the path inside the file
>>>>>>>>> flink-conf.yaml
>>>>>>>>> env.java.opts: "-XX:+UseParNewGC -XX:+UseConcMarkSweepGC
>>>>>>>>> -XX:+CMSIncrementalMode"
>>>>>>>>> env.java.opts.jobmanager: "-XX:+UseParNewGC
>>>>>>>>> -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode"
>>>>>>>>> env.java.opts.taskmanager: "-XX:+UseParNewGC
>>>>>>>>> -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode"
>>>>>>>>>
>>>>>>>>> and the error "Java HotSpot(TM) Client VM warning: G1
GC is
>>>>>>>>> disabled in this release." is not showing anymore. However,
the connection
>>>>>>>>> from the master node to the slave node is still not possible.
Does anybody
>>>>>>>>> know how I must configure flink to deal with that?
>>>>>>>>>
>>>>>>>>> This is the error stack trace:
>>>>>>>>>
>>>>>>>>> 2017-05-25 12:40:26,421 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>>>>>>>>       - Source: Socket Stream -> Flat Map (1/1) (
>>>>>>>>> b81b6492fc0860367be422d0b0bf4358) switched from DEPLOYING
to
>>>>>>>>> RUNNING.
>>>>>>>>> 2017-05-25 12:40:26,891 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>>>>>>>>       - Source: Socket Stream -> Flat Map (1/1) (
>>>>>>>>> b81b6492fc0860367be422d0b0bf4358) switched from RUNNING
to FAILED.
>>>>>>>>> java.net.ConnectException: Connection refused
>>>>>>>>> at java.net.PlainSocketImpl.socketConnect(Native Method)
>>>>>>>>> at java.net.AbstractPlainSocketImpl.doConnect(
>>>>>>>>> AbstractPlainSocketImpl.java:350)
>>>>>>>>> at java.net.AbstractPlainSocketImpl.connectToAddress(
>>>>>>>>> AbstractPlainSocketImpl.java:206)
>>>>>>>>> at java.net.AbstractPlainSocketImpl.connect(
>>>>>>>>> AbstractPlainSocketImpl.java:188)
>>>>>>>>> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>>>>>>>>> at java.net.Socket.connect(Socket.java:589)
>>>>>>>>> at org.apache.flink.streaming.api.functions.source.
>>>>>>>>> SocketTextStreamFunction.run(SocketTextStreamFunction.java:96)
>>>>>>>>> at org.apache.flink.streaming.api.operators.StreamSource.
>>>>>>>>> run(StreamSource.java:87)
>>>>>>>>> at org.apache.flink.streaming.api.operators.StreamSource.
>>>>>>>>> run(StreamSource.java:56)
>>>>>>>>> at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(
>>>>>>>>> SourceStreamTask.java:99)
>>>>>>>>> at org.apache.flink.streaming.runtime.tasks.StreamTask.
>>>>>>>>> invoke(StreamTask.java:306)
>>>>>>>>> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703)
>>>>>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>>>>>> 2017-05-25 12:40:26,898 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>>>>>>>>       - Job Socket Window WordCount (
>>>>>>>>> 71c6d7796eccf6587d9d1deda0490e09) switched from state
RUNNING to
>>>>>>>>> FAILING.
>>>>>>>>> java.net.ConnectException: Connection refused
>>>>>>>>> at java.net.PlainSocketImpl.socketConnect(Native Method)
>>>>>>>>> at java.net.AbstractPlainSocketImpl.doConnect(
>>>>>>>>> AbstractPlainSocketImpl.java:350)
>>>>>>>>> at java.net.AbstractPlainSocketImpl.connectToAddress(
>>>>>>>>> AbstractPlainSocketImpl.java:206)
>>>>>>>>> at java.net.AbstractPlainSocketImpl.connect(
>>>>>>>>> AbstractPlainSocketImpl.java:188)
>>>>>>>>> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>>>>>>>>> at java.net.Socket.connect(Socket.java:589)
>>>>>>>>> at org.apache.flink.streaming.api.functions.source.
>>>>>>>>> SocketTextStreamFunction.run(SocketTextStreamFunction.java:96)
>>>>>>>>> at org.apache.flink.streaming.api.operators.StreamSource.
>>>>>>>>> run(StreamSource.java:87)
>>>>>>>>> at org.apache.flink.streaming.api.operators.StreamSource.
>>>>>>>>> run(StreamSource.java:56)
>>>>>>>>> at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(
>>>>>>>>> SourceStreamTask.java:99)
>>>>>>>>> at org.apache.flink.streaming.runtime.tasks.StreamTask.
>>>>>>>>> invoke(StreamTask.java:306)
>>>>>>>>> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703)
>>>>>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>>>>>> 2017-05-25 12:40:26,921 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>>>>>>>>       - Window(TumblingProcessingTimeWindows(5000),
>>>>>>>>> ProcessingTimeTrigger, ReduceFunction$1, PassThroughWindowFunction)
->
>>>>>>>>> Sink: Print to Std. Out (1/1) (aa1a0e7ee3a1d3ad8f99b2608bd64c5b)
>>>>>>>>> switched from RUNNING to CANCELING.
>>>>>>>>> 2017-05-25 12:40:26,975 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>>>>>>>>       - Window(TumblingProcessingTimeWindows(5000),
>>>>>>>>> ProcessingTimeTrigger, ReduceFunction$1, PassThroughWindowFunction)
->
>>>>>>>>> Sink: Print to Std. Out (1/1) (aa1a0e7ee3a1d3ad8f99b2608bd64c5b)
>>>>>>>>> switched from CANCELING to CANCELED.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks, Felipe
>>>>>>>>> *--*
>>>>>>>>> *-- Felipe Gutierrez*
>>>>>>>>>
>>>>>>>>> *-- skype: felipe.o.gutierrez*
>>>>>>>>> *--* *https://felipeogutierrez.blogspot.com
>>>>>>>>> <https://felipeogutierrez.blogspot.com>*
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>

Mime
View raw message