flink-user mailing list archives

From Dongwon Kim <eastcirc...@gmail.com>
Subject Re: User program failures cause JobManager to be shutdown
Date Mon, 09 Dec 2019 01:25:02 GMT
Hi Robert and Roman,
Yeah, letting users know that System.exit() was called would be much more
appropriate than just intercepting and ignoring the call.

Best,
Dongwon

On Sat, Dec 7, 2019 at 11:29 PM Robert Metzger <rmetzger@apache.org> wrote:

> I guess we could manage the security only when calling the user's main()
> method.
>
> This problem actually exists for all usercode in Flink: You can also kill
> TaskManagers like this.
> If we are going to add something like this to Flink, I would only log that
> System.exit() has been called by the user code, not intercept and ignore
> the call.
>
> On Fri, Dec 6, 2019 at 10:31 AM Khachatryan Roman <
> khachatryan.roman@gmail.com> wrote:
>
>> Hi Dongwon,
>>
>> This should work but it could also interfere with Flink itself exiting in
>> case of a fatal error.
>>
>> Regards,
>> Roman
>>
>>
>> On Fri, Dec 6, 2019 at 2:54 AM Dongwon Kim <eastcirclek@gmail.com> wrote:
>>
>>> FYI, we've launched a session cluster where multiple jobs are managed by
>>> a job manager. If that happens, all the other jobs also fail because the
>>> job manager is shut down and all the task managers get into chaos (failing
>>> to connect to the job manager).
>>>
>>> I just searched for a way to prevent System.exit() calls from terminating
>>> JVMs and found [1]. Can it be a possible solution to the problem?
>>>
>>> [1]
>>> https://stackoverflow.com/questions/5549720/how-to-prevent-calls-to-system-exit-from-terminating-the-jvm
>>>
>>> Best,
>>> - Dongwon
>>>
>>> On Fri, Dec 6, 2019 at 10:39 AM Dongwon Kim <eastcirclek@gmail.com>
>>> wrote:
>>>
>>>> Hi Robert and Roman,
>>>>
>>>> Thank you for taking a look at this.
>>>>
>>>>> what is your main() method / client doing when it's receiving wrong
>>>>> program parameters? Does it call System.exit(), or something like that?
>>>>>
>>>>
>>>> I just found that our HTTP client is programmed to call System.exit(1).
>>>> I should advise against calling System.exit() in Flink applications.
>>>>
>>>> p.s. Just out of curiosity, is there no way for the web app to
>>>> intercept System.exit() and prevent the job manager from being shut
>>>> down?
>>>>
>>>> Best,
>>>>
>>>> - Dongwon
>>>>
>>>> On Fri, Dec 6, 2019 at 3:59 AM Robert Metzger <rmetzger@apache.org>
>>>> wrote:
>>>>
>>>>> Hi Dongwon,
>>>>>
>>>>> what is your main() method / client doing when it's receiving wrong
>>>>> program parameters? Does it call System.exit(), or something like that?
>>>>>
>>>>> By the way, the http address from the error message is
>>>>> publicly available. Not sure if this is internal data or not.
>>>>>
>>>>> On Thu, Dec 5, 2019 at 6:32 PM Khachatryan Roman <
>>>>> khachatryan.roman@gmail.com> wrote:
>>>>>
>>>>>> Hi Dongwon,
>>>>>>
>>>>>> I wasn't able to reproduce your problem with Flink JobManager 1.9.1
>>>>>> with various kinds of errors in the job.
>>>>>> I suggest you try it on a fresh Flink installation without any other
>>>>>> jobs submitted.
>>>>>>
>>>>>> Regards,
>>>>>> Roman
>>>>>>
>>>>>>
>>>>>> On Thu, Dec 5, 2019 at 3:48 PM Dongwon Kim <eastcirclek@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Roman,
>>>>>>>
>>>>>>> We're using the latest version 1.9.1 and those two lines are all
>>>>>>> I've seen after executing the job on the web UI.
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> Dongwon
>>>>>>>
>>>>>>> On Thu, Dec 5, 2019 at 11:36 PM r_khachatryan <
>>>>>>> khachatryan.roman@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Dongwon,
>>>>>>>>
>>>>>>>> Could you please provide the Flink version you are running and the
>>>>>>>> job manager logs?
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Roman
>>>>>>>>
>>>>>>>>
>>>>>>>> eastcirclek wrote
>>>>>>>> > Hi,
>>>>>>>> >
>>>>>>>> > I tried to run a program by uploading a jar on the Flink UI. When I
>>>>>>>> > intentionally enter a wrong parameter to my program, JobManager dies.
>>>>>>>> > Below are all the log messages I can get from JobManager; JobManager
>>>>>>>> > dies as soon as it emits the second line:
>>>>>>>> >
>>>>>>>> >> 2019-12-05 04:47:58,623 WARN
>>>>>>>> >> org.apache.flink.runtime.webmonitor.handlers.JarRunHandler -
>>>>>>>> >> Configuring the job submission via query parameters is deprecated.
>>>>>>>> >> Please migrate to submitting a JSON request instead.
>>>>>>>> >>
>>>>>>>> >> *2019-12-05 04:47:59,133 ERROR com.skt.apm.http.HTTPClient - Cannot
>>>>>>>> >> connect:
>>>>>>>> >> http://52.141.38.11:8380/api/spec/poc_asset_model_01/model/imbalance/models:
>>>>>>>> >> com.fasterxml.jackson.databind.exc.MismatchedInputException: Cannot
>>>>>>>> >> deserialize instance of `java.util.ArrayList` out of START_OBJECT token
>>>>>>>> >> at [Source:
>>>>>>>> >> (String)"{"code":"GB0001","resource":"msg.comm.unknown.error","details":"NullPointerException: "}";
>>>>>>>> >> line: 1, column: 1]
>>>>>>>> >> 2019-12-05 04:47:59,166 INFO
>>>>>>>> >> org.apache.flink.runtime.blob.BlobServer - Stopped
>>>>>>>> >> BLOB server at 0.0.0.0:6124*
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > The second line is obviously from my program and it shouldn't cause
>>>>>>>> > JobManager to be shut down. Is it intended behavior?
>>>>>>>> >
>>>>>>>> > Best,
>>>>>>>> >
>>>>>>>> > Dongwon
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Sent from:
>>>>>>>> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
>>>>>>>>
>>>>>>>
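
As a footnote to the thread: the user-side fix Dongwon mentions (not calling System.exit() from code that runs inside main()) could look roughly like the sketch below. It is only an illustration under assumptions: the names ModelJob, InvalidJobArgumentException, requireModelUrl and run, as well as the localhost URL, are made up here and come neither from the thread nor from Flink. The idea is simply that invalid input raises an exception, which whoever invoked main() can catch and report, instead of terminating the JVM that also hosts the JobManager.

```java
import java.util.Arrays;

// Hypothetical job entry point; all names here are illustrative assumptions.
final class ModelJob {

    // Raised for bad program arguments instead of calling System.exit(1).
    static final class InvalidJobArgumentException extends RuntimeException {
        InvalidJobArgumentException(String message) {
            super(message);
        }
    }

    // Validates the arguments and returns the model URL, or throws.
    static String requireModelUrl(String[] args) {
        boolean valid = args.length == 1
                && (args[0].startsWith("http://") || args[0].startsWith("https://"));
        if (!valid) {
            throw new InvalidJobArgumentException(
                    "Expected exactly one http(s) URL, got: " + Arrays.toString(args));
        }
        return args[0];
    }

    // What a job's main() would delegate to: the exception simply propagates.
    static void run(String[] args) {
        String modelUrl = requireModelUrl(args);
        System.out.println("Would build and submit the job for " + modelUrl);
    }

    public static void main(String[] args) {
        // Local demo with a deliberately wrong argument; the JVM keeps running.
        try {
            run(new String[] {"not-a-url"});
        } catch (InvalidJobArgumentException e) {
            System.out.println("Submission rejected: " + e.getMessage());
        }
        // Placeholder URL, used only to show the successful path.
        run(new String[] {"http://localhost:8380/models"});
    }
}
```

In an actual job, main() would just call run(args) and let the exception escape, so a wrong parameter should surface as a failed submission rather than as a process exit.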
