flink-user mailing list archives

From Jacky D <jacky.du0...@gmail.com>
Subject Re: Flink Memory analyze on AWS EMR
Date Wed, 13 May 2020 15:59:43 GMT
Hi, Xintong

Thanks for pointing it out; after I set up the log path, it's working now.
So, in conclusion:

On EMR, to set up JITWatch in flink-conf.yaml, we should not include
quotes, and we should give a path for the JIT log file output. This is
different from setting it up on a standalone cluster.
Example:
env.java.opts: -XX:+UnlockDiagnosticVMOptions -XX:+TraceClassLoading
-XX:+LogCompilation -XX:LogFile=/tmp/flinkmemdump.jit -XX:+PrintAssembly
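As a side note, here is a hedged sketch of the alternative Xintong suggested: instead of a fixed path like /tmp, the <LOG_DIR> token can point the JIT log at the YARN container's log directory (the file name flinkmemdump.jit below just mirrors the example above and is not a required name):

```yaml
# Sketch, assuming a YARN deployment. Unquoted, as discussed in this thread;
# quoting the whole option list makes the JVM treat it as one invalid argument.
# YARN substitutes <LOG_DIR> with the container's log directory, so the JIT
# log ends up next to jobmanager.log / taskmanager.log.
env.java.opts: -XX:+UnlockDiagnosticVMOptions -XX:+TraceClassLoading -XX:+LogCompilation -XX:LogFile=<LOG_DIR>/flinkmemdump.jit -XX:+PrintAssembly
```

After restarting the session, the file should then show up among the container logs fetched with `yarn logs -applicationId <application-id>`.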

Thanks everyone involved in this discussion!

Jacky

Xintong Song <tonysong820@gmail.com> wrote on Tue, May 12, 2020 at 10:41 PM:

> Hi Jacky,
>
> I don't think ${FLINK_LOG_PREFIX} is available for Flink Yarn deployments.
> My guess is that the actual file name becomes ".jit"; you can try to
> verify that by looking for the hidden file.
>
> If it is indeed this problem, you can try to replace "${FLINK_LOG_PREFIX}"
> with "<LOG_DIR>/your-file-name.jit". The token "<LOG_DIR>" should be
> replaced with the proper log directory path by Yarn automatically.
>
> I noticed that the usage of ${FLINK_LOG_PREFIX} is recommended by Flink's
> documentation [1]. This is IMO a bit misleading. I'll try to file an issue
> to improve the docs.
>
> Thank you~
>
> Xintong Song
>
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/application_profiling.html#profiling-with-jitwatch
>
> On Wed, May 13, 2020 at 2:45 AM Jacky D <jacky.du0314@gmail.com> wrote:
>
>> hi, Arvid
>>
>> thanks for the advice. I removed the quotes and it did create a yarn
>> session on EMR, but I didn't find any JIT log file generated.
>>
>> The config with quotes is working on a standalone cluster. I also tried
>> to dynamically pass the property within the yarn session command:
>>
>> flink-yarn-session -n 1 -d -nm testSession -yD env.java.opts="-XX:+UnlockDiagnosticVMOptions
>> -XX:+TraceClassLoading -XX:+LogCompilation
>> -XX:LogFile=${FLINK_LOG_PREFIX}.jit -XX:+PrintAssembly"
>>
>>
>> but got the same result: the session was created, but I cannot find any
>> JIT log file under the container logs.
>>
>>
>> Thanks
>>
>> Jacky
>>
>> Arvid Heise <arvid@ververica.com> wrote on Tue, May 12, 2020 at 12:57 PM:
>>
>>> Hi Jacky,
>>>
>>> I suspect that the quotes are the actual issue. Could you try to remove
>>> them? See also [1].
>>>
>>> [1]
>>> http://blogs.perl.org/users/tinita/2018/03/strings-in-yaml---to-quote-or-not-to-quote.html
>>>
>>> On Tue, May 12, 2020 at 4:03 PM Jacky D <jacky.du0314@gmail.com> wrote:
>>>
>>>> hi, Xintong
>>>>
>>>> Thanks for the reply. I attached the lines below around the Application
>>>> Master start command:
>>>>
>>>>
>>>> 2020-05-11 21:16:16,635 DEBUG
>>>> org.apache.hadoop.util.PerformanceAdvisory                    - Crypto
>>>> codec org.apache.hadoop.crypto.OpensslAesCtrCryptoCodec is not available.
>>>> 2020-05-11 21:16:16,635 DEBUG
>>>> org.apache.hadoop.util.PerformanceAdvisory                    - Using
>>>> crypto codec org.apache.hadoop.crypto.JceAesCtrCryptoCodec.
>>>> 2020-05-11 21:16:16,636 DEBUG org.apache.hadoop.hdfs.DataStreamer
>>>>                      - DataStreamer block
>>>> BP-1519523618-98.94.65.144-1581106168138:blk_1073745139_4315 sending packet
>>>> packet seqno: 0 offsetInBlock: 0 lastPacketInBlock: false
>>>> lastByteOffsetInBlock: 1697
>>>> 2020-05-11 21:16:16,637 DEBUG org.apache.hadoop.hdfs.DataStreamer
>>>>                      - DFSClient seqno: 0 reply: SUCCESS
>>>> downstreamAckTimeNanos: 0 flag: 0
>>>> 2020-05-11 21:16:16,637 DEBUG org.apache.hadoop.hdfs.DataStreamer
>>>>                      - DataStreamer block
>>>> BP-1519523618-98.94.65.144-1581106168138:blk_1073745139_4315 sending packet
>>>> packet seqno: 1 offsetInBlock: 1697 lastPacketInBlock: true
>>>> lastByteOffsetInBlock: 1697
>>>> 2020-05-11 21:16:16,638 DEBUG org.apache.hadoop.hdfs.DataStreamer
>>>>                      - DFSClient seqno: 1 reply: SUCCESS
>>>> downstreamAckTimeNanos: 0 flag: 0
>>>> 2020-05-11 21:16:16,638 DEBUG org.apache.hadoop.hdfs.DataStreamer
>>>>                      - Closing old block
>>>> BP-1519523618-98.94.65.144-1581106168138:blk_1073745139_4315
>>>> 2020-05-11 21:16:16,641 DEBUG org.apache.hadoop.ipc.Client
>>>>                     - IPC Client (1954985045) connection to
>>>> ip-98-94-65-144.ec2.internal/98.94.65.144:8020 from hadoop sending #70
>>>> org.apache.hadoop.hdfs.protocol.ClientProtocol.complete
>>>> 2020-05-11 21:16:16,643 DEBUG org.apache.hadoop.ipc.Client
>>>>                     - IPC Client (1954985045) connection to
>>>> ip-98-94-65-144.ec2.internal/98.94.65.144:8020 from hadoop got value
>>>> #70
>>>> 2020-05-11 21:16:16,643 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine
>>>>                      - Call: complete took 2ms
>>>> 2020-05-11 21:16:16,643 DEBUG org.apache.hadoop.ipc.Client
>>>>                     - IPC Client (1954985045) connection to
>>>> ip-98-94-65-144.ec2.internal/98.94.65.144:8020 from hadoop sending #71
>>>> org.apache.hadoop.hdfs.protocol.ClientProtocol.setTimes
>>>> 2020-05-11 21:16:16,645 DEBUG org.apache.hadoop.ipc.Client
>>>>                     - IPC Client (1954985045) connection to
>>>> ip-98-94-65-144.ec2.internal/98.94.65.144:8020 from hadoop got value
>>>> #71
>>>> 2020-05-11 21:16:16,645 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine
>>>>                      - Call: setTimes took 2ms
>>>> 2020-05-11 21:16:16,647 DEBUG org.apache.hadoop.ipc.Client
>>>>                     - IPC Client (1954985045) connection to
>>>> ip-98-94-65-144.ec2.internal/98.94.65.144:8020 from hadoop sending #72
>>>> org.apache.hadoop.hdfs.protocol.ClientProtocol.setPermission
>>>> 2020-05-11 21:16:16,648 DEBUG org.apache.hadoop.ipc.Client
>>>>                     - IPC Client (1954985045) connection to
>>>> ip-98-94-65-144.ec2.internal/98.94.65.144:8020 from hadoop got value
>>>> #72
>>>> 2020-05-11 21:16:16,648 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine
>>>>                      - Call: setPermission took 2ms
>>>> 2020-05-11 21:16:16,654 DEBUG
>>>> org.apache.flink.yarn.AbstractYarnClusterDescriptor           - Application
>>>> Master start command: $JAVA_HOME/bin/java -Xmx424m
>>>> "-XX:+UnlockDiagnosticVMOptions -XX:+TraceClassLoading -XX:+LogCompilation
>>>> -XX:LogFile=${FLINK_LOG_PREFIX}.jit -XX:+PrintAssembly"
>>>> -Dlog.file="<LOG_DIR>/jobmanager.log"
>>>> -Dlog4j.configuration=file:log4j.properties
>>>> org.apache.flink.yarn.entrypoint.YarnSessionClusterEntrypoint  1>
>>>> <LOG_DIR>/jobmanager.out 2> <LOG_DIR>/jobmanager.err
>>>> 2020-05-11 21:16:16,654 DEBUG org.apache.hadoop.ipc.Client
>>>>                     - stopping client from cache:
>>>> org.apache.hadoop.ipc.Client@28194a50
>>>> 2020-05-11 21:16:16,656 DEBUG
>>>> org.apache.flink.yarn.AbstractYarnClusterDescriptor$ApplicationSubmissionContextReflector
>>>> - org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext supports
>>>> method setApplicationTags.
>>>> 2020-05-11 21:16:16,656 DEBUG
>>>> org.apache.flink.yarn.AbstractYarnClusterDescriptor$ApplicationSubmissionContextReflector
>>>> - org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext supports
>>>> method setAttemptFailuresValidityInterval.
>>>> 2020-05-11 21:16:16,656 DEBUG
>>>> org.apache.flink.yarn.AbstractYarnClusterDescriptor$ApplicationSubmissionContextReflector
>>>> - org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext supports
>>>> method setKeepContainersAcrossApplicationAttempts.
>>>> 2020-05-11 21:16:16,656 DEBUG
>>>> org.apache.flink.yarn.AbstractYarnClusterDescriptor$ApplicationSubmissionContextReflector
>>>> - org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext supports
>>>> method setNodeLabelExpression.
>>>>
>>>> Xintong Song <tonysong820@gmail.com> wrote on Mon, May 11, 2020 at 10:11 PM:
>>>>
>>>>> Hi Jacky,
>>>>>
>>>>> Could you search for "Application Master start command:" in the debug
>>>>> log and post the result and a few lines before & after that? This is
>>>>> not included in the clip of the attached log file.
>>>>>
>>>>> Thank you~
>>>>>
>>>>> Xintong Song
>>>>>
>>>>>
>>>>>
>>>>> On Tue, May 12, 2020 at 5:33 AM Jacky D <jacky.du0314@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> hi, Robert
>>>>>>
>>>>>> Thanks so much for the quick reply. I changed the log level to DEBUG
>>>>>> and attached the log file.
>>>>>>
>>>>>> Thanks
>>>>>> Jacky
>>>>>>
>>>>>> Robert Metzger <rmetzger@apache.org> wrote on Mon, May 11, 2020 at 4:14 PM:
>>>>>>
>>>>>>> Thanks a lot for posting the full output.
>>>>>>>
>>>>>>> It seems that Flink is passing an invalid list of arguments to the
>>>>>>> JVM.
>>>>>>> Can you
>>>>>>> - set the root log level in conf/log4j-yarn-session.properties to
>>>>>>> DEBUG
>>>>>>> - then launch the YARN session
>>>>>>> - share the log file of the yarn session on the mailing list?
>>>>>>>
>>>>>>> I'm particularly interested in the line printed here, as it shows
>>>>>>> the JVM invocation.
>>>>>>>
>>>>>>> https://github.com/apache/flink/blob/release-1.6/flink-yarn/src/main/java/org/apache/flink/yarn/AbstractYarnClusterDescriptor.java#L1630
>>>>>>>
>>>>>>>
>>>>>>> On Mon, May 11, 2020 at 9:56 PM Jacky D <jacky.du0314@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,Robert
>>>>>>>>
>>>>>>>> Yes, I tried to retrieve more log info from the YARN UI. The full
>>>>>>>> logs are shown below; this happens when I try to create a Flink
>>>>>>>> YARN session on EMR with the JITWatch configuration set up.
>>>>>>>>
>>>>>>>> 2020-05-11 19:06:09,552 ERROR
>>>>>>>> org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - Error while
>>>>>>>> running the Flink Yarn session.
>>>>>>>> java.lang.reflect.UndeclaredThrowableException
>>>>>>>> at
>>>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1862)
>>>>>>>> at
>>>>>>>> org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>>>>>>>> at
>>>>>>>> org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:813)
>>>>>>>> Caused by:
>>>>>>>> org.apache.flink.client.deployment.ClusterDeploymentException: Couldn't
>>>>>>>> deploy Yarn session cluster
>>>>>>>> at
>>>>>>>> org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploySessionCluster(AbstractYarnClusterDescriptor.java:429)
>>>>>>>> at
>>>>>>>> org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:610)
>>>>>>>> at
>>>>>>>> org.apache.flink.yarn.cli.FlinkYarnSessionCli.lambda$main$2(FlinkYarnSessionCli.java:813)
>>>>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>> at javax.security.auth.Subject.doAs(Subject.java:422)
>>>>>>>> at
>>>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
>>>>>>>> ... 2 more
>>>>>>>> Caused by:
>>>>>>>> org.apache.flink.yarn.AbstractYarnClusterDescriptor$YarnDeploymentException:
>>>>>>>> The YARN application unexpectedly switched to state FAILED during
>>>>>>>> deployment.
>>>>>>>> Diagnostics from YARN: Application application_1584459865196_0165
>>>>>>>> failed 1 times (global limit =2; local limit is =1) due to AM Container
>>>>>>>> for appattempt_1584459865196_0165_000001 exited with exitCode: 1
>>>>>>>> Failing this attempt.Diagnostics: Exception from container-launch.
>>>>>>>> Container id: container_1584459865196_0165_01_000001
>>>>>>>> Exit code: 1
>>>>>>>> Exception message: Usage: java [-options] class [args...]
>>>>>>>>            (to execute a class)
>>>>>>>>    or  java [-options] -jar jarfile [args...]
>>>>>>>>            (to execute a jar file)
>>>>>>>> where options include:
>>>>>>>>     -d32   use a 32-bit data model if available
>>>>>>>>     -d64   use a 64-bit data model if available
>>>>>>>>     -server   to select the "server" VM
>>>>>>>>                   The default VM is server,
>>>>>>>>                   because you are running on a server-class machine.
>>>>>>>>
>>>>>>>>
>>>>>>>>     -cp <class search path of directories and zip/jar files>
>>>>>>>>     -classpath <class search path of directories and zip/jar files>
>>>>>>>>                   A : separated list of directories, JAR archives,
>>>>>>>>                   and ZIP archives to search for class files.
>>>>>>>>     -D<name>=<value>
>>>>>>>>                   set a system property
>>>>>>>>     -verbose:[class|gc|jni]
>>>>>>>>                   enable verbose output
>>>>>>>>     -version      print product version and exit
>>>>>>>>     -version:<value>
>>>>>>>>                   Warning: this feature is deprecated and will be
>>>>>>>>                   removed in a future release.
>>>>>>>>                   require the specified version to run
>>>>>>>>     -showversion  print product version and continue
>>>>>>>>     -jre-restrict-search | -no-jre-restrict-search
>>>>>>>>                   Warning: this feature is deprecated and will be
>>>>>>>>                   removed in a future release.
>>>>>>>>                   include/exclude user private JREs in the version
>>>>>>>>                   search
>>>>>>>>     -? -help      print this help message
>>>>>>>>     -X            print help on non-standard options
>>>>>>>>     -ea[:<packagename>...|:<classname>]
>>>>>>>>     -enableassertions[:<packagename>...|:<classname>]
>>>>>>>>                   enable assertions with specified granularity
>>>>>>>>     -da[:<packagename>...|:<classname>]
>>>>>>>>     -disableassertions[:<packagename>...|:<classname>]
>>>>>>>>                   disable assertions with specified granularity
>>>>>>>>     -esa | -enablesystemassertions
>>>>>>>>                   enable system assertions
>>>>>>>>     -dsa | -disablesystemassertions
>>>>>>>>                   disable system assertions
>>>>>>>>     -agentlib:<libname>[=<options>]
>>>>>>>>                   load native agent library <libname>, e.g.
>>>>>>>>                   -agentlib:hprof
>>>>>>>>                   see also, -agentlib:jdwp=help and
>>>>>>>>                   -agentlib:hprof=help
>>>>>>>>     -agentpath:<pathname>[=<options>]
>>>>>>>>                   load native agent library by full pathname
>>>>>>>>     -javaagent:<jarpath>[=<options>]
>>>>>>>>                   load Java programming language agent, see
>>>>>>>> java.lang.instrument
>>>>>>>>     -splash:<imagepath>
>>>>>>>>                   show splash screen with specified image
>>>>>>>> See
>>>>>>>> http://www.oracle.com/technetwork/java/javase/documentation/index.html
>>>>>>>> for more details.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Jacky
>>>>>>>>
>>>>>>>> Robert Metzger <rmetzger@apache.org> wrote on Mon, May 11, 2020 at 3:42 PM:
>>>>>>>>
>>>>>>>>> Hey Jacky,
>>>>>>>>>
>>>>>>>>> The error says "The YARN application unexpectedly switched to
>>>>>>>>> state FAILED during deployment."
>>>>>>>>> Have you tried retrieving the YARN application logs?
>>>>>>>>> Do the YARN UI / resource manager logs reveal anything about the
>>>>>>>>> reason for the deployment failure?
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Robert
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, May 11, 2020 at 9:34 PM Jacky D <jacky.du0314@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ---------- Forwarded message ---------
>>>>>>>>>> From: Jacky D <jacky.du0314@gmail.com>
>>>>>>>>>> Date: Mon, May 11, 2020 at 3:12 PM
>>>>>>>>>> Subject: Re: Flink Memory analyze on AWS EMR
>>>>>>>>>> To: Khachatryan Roman <khachatryan.roman@gmail.com>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi, Roman
>>>>>>>>>>
>>>>>>>>>> Thanks for the quick response. I tried without the LogFile option
>>>>>>>>>> but failed with the same error. I'm currently using Flink 1.6
>>>>>>>>>> (https://ci.apache.org/projects/flink/flink-docs-release-1.6/monitoring/application_profiling.html),
>>>>>>>>>> so I can only use JITWatch or JMC. I guess those tools are only
>>>>>>>>>> available on a standalone cluster? As the document mentions: "Each
>>>>>>>>>> standalone JobManager, TaskManager, HistoryServer, and ZooKeeper
>>>>>>>>>> daemon redirects stdout and stderr to a file with a .out filename
>>>>>>>>>> suffix and writes internal logging to a file with a .log suffix.
>>>>>>>>>> Java options configured by the user in env.java.opts"?
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> Jacky
>>>>>>>>>>
>>>>>>>>>
>>>
>>> --
>>>
>>> Arvid Heise | Senior Java Developer
>>>
>>> <https://www.ververica.com/>
>>>
>>> Follow us @VervericaData
>>>
>>> --
>>>
>>> Join Flink Forward <https://flink-forward.org/> - The Apache Flink
>>> Conference
>>>
>>> Stream Processing | Event Driven | Real Time
>>>
>>> --
>>>
>>> Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
>>>
>>> --
>>> Ververica GmbH
>>> Registered at Amtsgericht Charlottenburg: HRB 158244 B
>>> Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
>>> (Toni) Cheng
>>>
>>
