flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Guilherme Nobre <guilhermenob...@gmail.com>
Subject Logback doesn't receive logs from job
Date Tue, 26 Jun 2018 11:30:15 GMT
Hi all,

I have a Flink cluster (1.4.0) built from flink's docker image, with 1 job
manager and 2 task managers. I'm trying to use logback isntead of log4j and
as far as the cluster configurations goes, seems alright.

Following
https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/best_practices.html
I have removed log4j-1.2.xx.jar and sfl4j-log4j12-xxx.jar from lib/ and
added logback-classic.jar, logback-cor.jar and log4j-over-slf4j.jar. This
is a snippet from the logs:

*2018-06-26 11:02:47.760 [main] DEBUG
org.apache.hadoop.security.UserGroupInformation  - hadoop login*
*2018-06-26 11:02:47.761 [main] DEBUG
org.apache.hadoop.security.UserGroupInformation  - hadoop login commit*
*2018-06-26 11:02:47.764 [main] DEBUG
org.apache.hadoop.security.UserGroupInformation  - using local
user:UnixPrincipal: flink*
*2018-06-26 11:02:47.764 [main] DEBUG
org.apache.hadoop.security.UserGroupInformation  - Using user:
"UnixPrincipal: flink" with name flink*
*2018-06-26 11:02:47.765 [main] DEBUG
org.apache.hadoop.security.UserGroupInformation  - User entry: "flink"*
*2018-06-26 11:02:47.765 [main] DEBUG
org.apache.hadoop.security.UserGroupInformation  - Assuming keytab is
managed externally since logged in from subject.*
*2018-06-26 11:02:47.766 [main] DEBUG
org.apache.hadoop.security.UserGroupInformation  - UGI loginUser:flink
(auth:SIMPLE)*
*2018-06-26 11:02:47.766 [main] INFO
org.apache.flink.runtime.jobmanager.JobManager  -  Current Hadoop/Kerberos
user: flink*
*2018-06-26 11:02:47.766 [main] INFO
org.apache.flink.runtime.jobmanager.JobManager  -  JVM: OpenJDK 64-Bit
Server VM - Oracle Corporation - 1.8/25.151-b12*
*2018-06-26 11:02:47.766 [main] INFO
org.apache.flink.runtime.jobmanager.JobManager  -  Maximum heap size: 981
MiBytes*
*2018-06-26 11:02:47.766 [main] INFO
org.apache.flink.runtime.jobmanager.JobManager  -  JAVA_HOME:
/docker-java-home/jre*
*2018-06-26 11:02:47.771 [main] INFO
org.apache.flink.runtime.jobmanager.JobManager  -  Hadoop version: 2.8.0*
*2018-06-26 11:02:47.771 [main] INFO
org.apache.flink.runtime.jobmanager.JobManager  -  JVM Options:*
*2018-06-26 11:02:47.771 [main] INFO
org.apache.flink.runtime.jobmanager.JobManager  -     -Xms1024m*
*2018-06-26 11:02:47.771 [main] INFO
org.apache.flink.runtime.jobmanager.JobManager  -     -Xmx1024m*
*2018-06-26 11:02:47.771 [main] INFO
org.apache.flink.runtime.jobmanager.JobManager  -
 -Dlog4j.configuration=file:/opt/flink/conf/log4j-console.properties*
*2018-06-26 11:02:47.772 [main] INFO
org.apache.flink.runtime.jobmanager.JobManager  -
 -Dlogback.configurationFile=file:/opt/flink/conf/logback-console.xml*
*2018-06-26 11:02:47.772 [main] INFO
org.apache.flink.runtime.jobmanager.JobManager  -  Program Arguments:*
*2018-06-26 11:02:47.772 [main] INFO
org.apache.flink.runtime.jobmanager.JobManager  -     --configDir*
*2018-06-26 11:02:47.772 [main] INFO
org.apache.flink.runtime.jobmanager.JobManager  -     /opt/flink/conf*
*2018-06-26 11:02:47.772 [main] INFO
org.apache.flink.runtime.jobmanager.JobManager  -     --executionMode*
*2018-06-26 11:02:47.772 [main] INFO
org.apache.flink.runtime.jobmanager.JobManager  -     cluster*
*2018-06-26 11:02:47.772 [main] INFO
org.apache.flink.runtime.jobmanager.JobManager  -  Classpath:
/opt/flink/lib/flink-python_2.11-1.4.0.jar:/opt/flink/lib/flink-shaded-hadoop2-uber-1.4.0.jar:/opt/flink/lib/log4j-over-slf4j.jar:/opt/flink/lib/logback-classic.jar:/opt/flink/lib/logback-core.jar:/opt/flink/lib/flink-dist_2.11-1.4.0.jar:::*

Above you can see the logback jars are properly in the classpath and that
is using logback-console.xml as the configuration file. This is the content
logback-console.xml:

*<configuration>*
*    <appender name="console" class="ch.qos.logback.core.ConsoleAppender">*
*        <encoder>*
*            <pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level
%logger{60} %X{sourceThread} - %msg%n</pattern>*
*        </encoder>*
*    </appender>*

*    <root level="DEBUG">*
*        <appender-ref ref="console"/>*
*    </root>*
*</configuration>*

Which justify the DEBUG logs seem above as well, setting the level to DEBUG
is working fine. Now comes the weirdest part. This is part of the pipeline:

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class Pipeline {

    private static final Logger LOGGER =
LoggerFactory.getLogger(Pipeline.class);

    public static void main(String[] args) throws Exception {

        LOGGER.info("****************************************************
INFO");
        LOGGER.warn("****************************************************
WARN");
        LOGGER.error("****************************************************
ERROR");
        LOGGER.debug("****************************************************
DEBUG");
        LOGGER.trace("****************************************************
TRACE");

As you can see, I'm important slf4j logger and logger factory and
instantiating the logger. Then, I just log all 5 levels.

I can't see ANY of these logs in the job manager / task manager logs. They
work fine locally though:

*13:16:11.030 [main] INFO Pipeline -
**************************************************** INFO*
*13:16:11.035 [main] WARN Pipeline -
**************************************************** WARN*
*13:16:16.239 [main] ERROR Pipeline -
**************************************************** ERROR*
*13:16:16.239 [main] DEBUG Pipeline -
**************************************************** DEBUG*

It's not the log level as I can clearly see DEBUG and INFO logs from Flink,
and the logback-console.xml is alright. The logs are not getting to the
cluster for some reason.
Then I tried this:

ch.qos.logback.classic.Logger testLog = (ch.qos.logback.classic.Logger)LOGGER;
testLog.info("**************************************************** INFO");
testLog.warn("**************************************************** WARN");
testLog.error("**************************************************** ERROR");
testLog.debug("**************************************************** DEBUG");
testLog.trace("**************************************************** TRACE");

if(LOGGER == null) {
    throw new Exception("*************  logger is NULL");
} else if (LOGGER != null) {
    throw new Exception("*************  logger is not NULL " +
LOGGER.getName()) LOGGER.getClass().;
}

and got this:

*2018-06-26 11:24:27.268 [jobmanager-future-thread-5] DEBUG
org.apache.flink.runtime.webmonitor.RuntimeMonitorHandler  - Error while
handling request.*
*java.util.concurrent.CompletionException:
org.apache.flink.util.FlinkException: Could not run the jar.*
* at
org.apache.flink.runtime.webmonitor.handlers.JarRunHandler.lambda$handleJsonRequest$0(JarRunHandler.java:90)*
* at
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)*
* at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)*
* at java.util.concurrent.FutureTask.run(FutureTask.java:266)*
* at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)*
* at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)*
* at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)*
* at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)*
* at java.lang.Thread.run(Thread.java:748)*
*Caused by: org.apache.flink.util.FlinkException: Could not run the jar.*
* ... 9 common frames omitted*
*Caused by: org.apache.flink.client.program.ProgramInvocationException: The
main method caused an error.*
* at
org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:542)*
* at
org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:417)*
* at
org.apache.flink.client.program.OptimizerPlanEnvironment.getOptimizedPlan(OptimizerPlanEnvironment.java:83)*
* at
org.apache.flink.client.program.ClusterClient.getOptimizedPlan(ClusterClient.java:334)*
* at
org.apache.flink.runtime.webmonitor.handlers.JarActionHandler.getJobGraphAndClassLoader(JarActionHandler.java:76)*
* at
org.apache.flink.runtime.webmonitor.handlers.JarRunHandler.lambda$handleJsonRequest$0(JarRunHandler.java:69)*
* ... 8 common frames omitted*
*Caused by: java.lang.Exception: *************  logger is not NULL DEBUG
1656864061*
* at io.intellisense.MongoPipeline.main(MongoPipeline.java:79)*
* at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)*
* at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)*
* at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)*
* at java.lang.reflect.Method.invoke(Method.java:498)*
* at
org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:525)*
* ... 13 common frames omitted*

This means the logger was successfully set and received the DEBUG level as
the *conf/logback-console.yml* and again, I was only able to see this error
because it came from
*org.apache.flink.runtime.webmonitor.RuntimeMonitorHandler*, not from my
pipeline.
I'm running out of options, does anyone have any idea?

Mime
View raw message