flink-user mailing list archives

From Subramanyam Ramanathan <subramanyam.ramanat...@microfocus.com>
Subject RE: NoClassDefFoundError in failing-restarting job that uses url classloader
Date Mon, 23 Sep 2019 06:50:32 GMT

I was able to simulate the issue again and understand the cause a little better.

The issue occurs when:

- One of the RichMapFunction transformations uses a third-party library in its open()
  method, and that library spawns a thread.

- The thread doesn’t get properly closed in the close() method.

- Once the job starts failing, we start seeing a NoClassDefFoundError from that thread.

I understand that cleanup should be done in the close() method. However, I just wanted to know:
is there a configuration setting that would help us clean up such threads?
I can attach the code if required.
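
A minimal sketch of the kind of cleanup described above, written without Flink on the classpath so that it stays self-contained. The class WorkerOwningFunction and its methods are hypothetical stand-ins: open() and close() only mirror the RichMapFunction lifecycle, and the ExecutorService stands in for whatever thread the third-party library spawns. With a real library, the equivalent step would be calling that library's own shutdown method from close().

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a RichMapFunction; only the open()/close()
// lifecycle is modelled here, not the Flink API itself.
class WorkerOwningFunction {
    private ExecutorService worker;

    // Mirrors open(): start the background thread (assumed to be what the
    // third-party library does internally).
    void open() {
        worker = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "third-party-worker");
            t.setDaemon(true);
            return t;
        });
        worker.submit(() -> {
            // Simulated background work that runs until interrupted.
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    Thread.sleep(50);
                } catch (InterruptedException e) {
                    // Restore the flag so the loop condition ends the task.
                    Thread.currentThread().interrupt();
                }
            }
        });
    }

    // Mirrors close(): interrupt the thread and wait for it to exit, so that
    // nothing keeps running after the task has been torn down.
    void close() {
        if (worker == null) {
            return;
        }
        worker.shutdownNow();
        try {
            if (!worker.awaitTermination(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("worker thread did not stop");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // True only after the worker has been started and has fully terminated.
    boolean isStopped() {
        return worker != null && worker.isTerminated();
    }
}
```

Waiting in close() with a bounded timeout, rather than only requesting shutdown, is a design choice in this sketch: it makes a thread that refuses to stop visible as an error instead of leaving it running silently.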


From: Zhu Zhu [mailto:reedpor@gmail.com]
Sent: Friday, August 9, 2019 7:43 AM
To: Subramanyam Ramanathan <subramanyam.ramanathan@microfocus.com>
Cc: user@flink.apache.org
Subject: Re: NoClassDefFoundError in failing-restarting job that uses url classloader

Hi Subramanyam,

Could you share more information, including:
1. the URL pattern
2. the detailed exception and the log around it
3. the cluster the job is running on, e.g. standalone, yarn, k8s
4. whether it's session mode or per-job mode

This information would be helpful to identify the failure cause.

Zhu Zhu

Subramanyam Ramanathan <subramanyam.ramanathan@microfocus.com>
wrote on Fri, Aug 9, 2019 at 1:45 AM:


I'm currently using Flink 1.7.2.

I'm trying to run a job that's submitted programmatically using the ClusterClient API:
    public JobSubmissionResult run(PackagedProgram prog, int parallelism)

The job makes use of some jars, which I add to the packaged program through the PackagedProgram
constructor, along with the jar file:
    public PackagedProgram(File jarFile, List<URL> classpaths, String... args)
Normally, this works perfectly and the job runs fine.

However, if there's an error in the job and it goes into the failing state, and the job is
continuously restarted for an hour or so, I notice a NoClassDefFoundError for some classes
in the jars that I load using the URL class loader. The job never recovers after that, even
once the root cause of the issue is fixed (I had a Kafka source/sink in my job; Kafka was
down temporarily and was brought back up afterwards).
The jar is still available at the path referenced by the URL class loader and has not been
tampered with.

Could anyone please give me some pointers on why this could happen, what I could be missing
here, or how I can debug further?

