flink-user mailing list archives

From Chesnay Schepler <ches...@apache.org>
Subject Re: History Server Not Showing Any Jobs - File Not Found?
Date Thu, 28 May 2020 15:00:09 GMT
Looks like it is indeed stuck on downloading the archive.
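
For reference, a minimal sketch of how such a dump could be checked automatically. It is plain Python, not part of Flink; it assumes raw jstack output in which threads are separated by blank lines, and the helper names `find_thread` and `looks_stuck_in_hdfs_read` are made up for illustration. The sample text abbreviates the frames from the dump quoted below and adds a hypothetical "main" thread.

```python
import re

# Abbreviated from the quoted dump; the "main" thread is a hypothetical filler entry.
SAMPLE = '''
"main" #1 prio=5 os_prio=0 tid=0x1 nid=0x1 waiting on condition [0x0]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)

"Flink-HistoryServer-ArchiveFetcher-thread-1" #19 daemon prio=5 os_prio=0 tid=0x00007f93a5a2c000 nid=0x5692 runnable [0x00007f934a0d3000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:945)
        at org.apache.flink.runtime.history.FsJobArchivist.getArchivedJsons(FsJobArchivist.java:110)
'''


def find_thread(dump, name_fragment):
    """Return (header, state, frames) for the first thread whose header contains name_fragment."""
    # Assumption: thread blocks are separated by blank lines, as in raw jstack output.
    for block in re.split(r"\n\s*\n", dump.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if not lines or not lines[0].startswith('"') or name_fragment not in lines[0]:
            continue
        state = None
        frames = []
        for ln in lines[1:]:
            if ln.startswith("java.lang.Thread.State:"):
                state = ln.split(":", 1)[1].strip()
            elif ln.startswith("at "):
                frames.append(ln[3:])
        return lines[0], state, frames
    return None


def looks_stuck_in_hdfs_read(frames):
    """Heuristic only: the top frame polls a socket and an HDFS input stream is further down."""
    polling = bool(frames) and "epollWait" in frames[0]
    via_hdfs = any("org.apache.hadoop.hdfs.DFSInputStream." in f for f in frames)
    return polling and via_hdfs


if __name__ == "__main__":
    header, state, frames = find_thread(SAMPLE, "ArchiveFetcher")
    print(state, looks_stuck_in_hdfs_read(frames))
```

A single dump only shows where the thread is at that instant; taking a few dumps some seconds apart and seeing the same frames each time is what would actually suggest the read is hanging.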

I searched a bit in the Hadoop JIRA and found several similar instances:
https://issues.apache.org/jira/browse/HDFS-6999
https://issues.apache.org/jira/browse/HDFS-7005
https://issues.apache.org/jira/browse/HDFS-7145

It is supposed to be fixed in 2.6.0 though :/

If Hadoop is available via HADOOP_CLASSPATH and flink-shaded-hadoop is also
in /lib, then you basically don't know which Hadoop version is actually
being used, which could lead to incompatibilities and dependency clashes.
If flink-shaded-hadoop 2.4/2.5 is on the classpath, maybe that version is
being used and runs into HDFS-7005.
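
A rough way to check for this situation, sketched in Python under a few assumptions: the Flink installation is found via a FLINK_HOME environment variable (an assumption, adjust to your setup), any jar with "hadoop" in its file name counts as a Hadoop jar, and classpath entries ending in `*` stand for all jars in that directory. The function names are made up for this sketch.

```python
import glob
import os


def expand_classpath(classpath):
    """Expand a Java-style classpath string into the jar files it refers to."""
    jars = []
    for entry in classpath.split(os.pathsep):
        if not entry:
            continue  # empty entries are skipped in this sketch
        if entry.endswith("*"):
            # "dir/*" stands for every jar inside dir
            jars.extend(sorted(glob.glob(os.path.join(os.path.dirname(entry), "*.jar"))))
        elif entry.endswith(".jar") and os.path.isfile(entry):
            jars.append(entry)
    return jars


def hadoop_jars(paths):
    """Keep only jars whose file name mentions hadoop (a naming heuristic)."""
    return [p for p in paths if "hadoop" in os.path.basename(p).lower()]


def check_hadoop_sources(hadoop_classpath, flink_lib_dir):
    """Report Hadoop jars from both sources; 'ambiguous' is True if both provide some."""
    from_env = hadoop_jars(expand_classpath(hadoop_classpath))
    from_lib = hadoop_jars(sorted(glob.glob(os.path.join(flink_lib_dir, "*.jar"))))
    return {
        "hadoop_classpath": from_env,
        "flink_lib": from_lib,
        "ambiguous": bool(from_env) and bool(from_lib),
    }


if __name__ == "__main__":
    report = check_hadoop_sources(
        os.environ.get("HADOOP_CLASSPATH", ""),
        os.path.join(os.environ.get("FLINK_HOME", "."), "lib"),  # FLINK_HOME is an assumption
    )
    for jar in report["flink_lib"]:
        print("Hadoop jar in Flink lib:", jar)
    print("Hadoop jars via HADOOP_CLASSPATH:", len(report["hadoop_classpath"]))
    print("Both sources present:", report["ambiguous"])
```

If "Both sources present" is True, removing the flink-shaded-hadoop jar from the lib directory is the simplest way to make sure only the cluster's own Hadoop version is loaded.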

On 28/05/2020 16:27, Hailu, Andreas wrote:
>
> Just created a dump, here’s what I see:
>
> "Flink-HistoryServer-ArchiveFetcher-thread-1" #19 daemon prio=5 os_prio=0 tid=0x00007f93a5a2c000 nid=0x5692 runnable [0x00007f934a0d3000]
>    java.lang.Thread.State: RUNNABLE
>         at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
>         at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
>         at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
>         at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
>         - locked <0x00000005df986960> (a sun.nio.ch.Util$2)
>         - locked <0x00000005df986948> (a java.util.Collections$UnmodifiableSet)
>         - locked <0x00000005df928390> (a sun.nio.ch.EPollSelectorImpl)
>         at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
>         at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:335)
>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.readChannelFully(PacketReceiver.java:258)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:209)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:171)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:102)
>         at org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:201)
>         at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:152)
>         - locked <0x00000005ceade5e0> (a org.apache.hadoop.hdfs.RemoteBlockReader2)
>         at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:781)
>         at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:837)
>         - eliminated <0x00000005cead3688> (a org.apache.hadoop.hdfs.DFSInputStream)
>         at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:897)
>         - locked <0x00000005cead3688> (a org.apache.hadoop.hdfs.DFSInputStream)
>         at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:945)
>         - locked <0x00000005cead3688> (a org.apache.hadoop.hdfs.DFSInputStream)
>         at java.io.DataInputStream.read(DataInputStream.java:149)
>         at org.apache.flink.runtime.fs.hdfs.HadoopDataInputStream.read(HadoopDataInputStream.java:94)
>         at java.io.InputStream.read(InputStream.java:101)
>         at org.apache.flink.util.IOUtils.copyBytes(IOUtils.java:69)
>         at org.apache.flink.util.IOUtils.copyBytes(IOUtils.java:91)
>         at org.apache.flink.runtime.history.FsJobArchivist.getArchivedJsons(FsJobArchivist.java:110)
>         at org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher$JobArchiveFetcherTask.run(HistoryServerArchiveFetcher.java:169)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
>
> What problems could the flink-shaded-hadoop jar being included introduce?
>
> // ah
>
> From: Chesnay Schepler <chesnay@apache.org>
> Sent: Thursday, May 28, 2020 9:26 AM
> To: Hailu, Andreas [Engineering] <Andreas.Hailu@ny.email.gs.com>; user@flink.apache.org
> Subject: Re: History Server Not Showing Any Jobs - File Not Found?
>
> If it were a class-loading issue I would think that we'd see an 
> exception of some kind. Maybe double-check that flink-shaded-hadoop is 
> not in the lib directory. (usually I would ask for the full classpath 
> that the HS is started with, but as it turns out this isn't getting 
> logged :( (FLINK-18008))
>
> The fact that overview.json and jobs/overview.json are missing 
> indicates that something goes wrong directly on startup. What is 
> supposed to happen is that the HS starts, fetches all currently 
> available archives and then creates these files.
>
> So it seems like the download gets stuck for some reason.
>
> Can you use jstack to create a thread dump, and see what the 
> Flink-HistoryServer-ArchiveFetcher is doing?
>
> I will also file a JIRA for adding more logging statements, like when 
> fetching starts/stops.
>
> On 27/05/2020 20:57, Hailu, Andreas wrote:
>
>     Hi Chesnay, apologies for not getting back to you sooner here. So
>     I did what you suggested - I downloaded a few files from my
>     jobmanager.archive.fs.dir HDFS directory to a locally available
>     directory named
>     /local/scratch/hailua_p2epdlsuat/historyserver/archived/. I then
>     changed my historyserver.archive.fs.dir to
>     file:///local/scratch/hailua_p2epdlsuat/historyserver/archived/
>     and that seemed to work. I’m able to see the history of the
>     applications I downloaded. So this points to a problem with
>     sourcing the history from HDFS.
>
>     Do you think this could be classpath related? This is what we use
>     for our HADOOP_CLASSPATH var:
>
>     /gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop/lib/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-hdfs/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-hdfs/lib/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-mapreduce/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-mapreduce/lib/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-yarn/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-yarn/lib/*:/gns/software/ep/da/dataproc/dataproc-prod/lakeRmProxy.jar:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop/bin::/gns/mw/dbclient/postgres/jdbc/pg-jdbc-9.3.v01/postgresql-9.3-1100-jdbc4.jar
>
>     You can see we have references to Hadoop mapred/yarn/hdfs libs in
>     there.
>
>     // ah
>
>     From: Chesnay Schepler <chesnay@apache.org>
>     Sent: Sunday, May 3, 2020 6:00 PM
>     To: Hailu, Andreas [Engineering] <Andreas.Hailu@ny.email.gs.com>; user@flink.apache.org
>     Subject: Re: History Server Not Showing Any Jobs - File Not Found?
>
>     yes, exactly; I want to rule out that (somehow) HDFS is the problem.
>
>     I couldn't reproduce the issue locally myself so far.
>
>     On 01/05/2020 22:31, Hailu, Andreas wrote:
>
>         Hi Chesnay, yes – they were created using Flink 1.9.1 as we’ve
>         only just started to archive them in the past couple weeks.
>         Could you clarify how you want to try local filesystem
>         archives? As in changing jobmanager.archive.fs.dir and
>         historyserver.web.tmpdir to the same local directory?
>
>         // ah
>
>         From: Chesnay Schepler <chesnay@apache.org>
>         Sent: Wednesday, April 29, 2020 8:26 AM
>         To: Hailu, Andreas [Engineering] <Andreas.Hailu@ny.email.gs.com>; user@flink.apache.org
>         Subject: Re: History Server Not Showing Any Jobs - File Not Found?
>
>         hmm...let's see if I can reproduce the issue locally.
>
>         Are the archives from the same version the history server runs
>         on? (Which I suppose would be 1.9.1?)
>
>         Just for the sake of narrowing things down, it would also be
>         interesting to check if it works with the archives residing in
>         the local filesystem.
>
>         On 27/04/2020 18:35, Hailu, Andreas wrote:
>
>             bash-4.1$ ls -l /local/scratch/flink_historyserver_tmpdir/
>             total 8
>             drwxrwxr-x 3 p2epdlsuat p2epdlsuat 4096 Apr 21 10:43 flink-web-history-7fbb97cc-9f38-4844-9bcf-6272fe6828e9
>             drwxrwxr-x 3 p2epdlsuat p2epdlsuat 4096 Apr 21 10:22 flink-web-history-95b3f928-c60f-4351-9926-766c6ad3ee76
>
>             There are just two directories in here. I don’t see cache
>             directories from my attempts today, which is interesting.
>             Looking a little deeper into them:
>
>             bash-4.1$ ls -lr /local/scratch/flink_historyserver_tmpdir/flink-web-history-7fbb97cc-9f38-4844-9bcf-6272fe6828e9
>             total 1756
>             drwxrwxr-x 2 p2epdlsuat p2epdlsuat 1789952 Apr 21 10:44 jobs
>
>             bash-4.1$ ls -lr /local/scratch/flink_historyserver_tmpdir/flink-web-history-7fbb97cc-9f38-4844-9bcf-6272fe6828e9/jobs
>             total 0
>             -rw-rw-r-- 1 p2epdlsuat p2epdlsuat 0 Apr 21 10:43 overview.json
>
>             There are indeed archives already in HDFS – I’ve included
>             some in my initial mail, but here they are again just for
>             reference:
>
>             -bash-4.1$ hdfs dfs -ls /user/p2epda/lake/delp_qa/flink_hs
>             Found 44282 items
>             -rw-r----- 3 delp datalake_admin_dev      50569 2020-03-21 23:17 /user/p2epda/lake/delp_qa/flink_hs/000144dba9dc0f235768a46b2f26e936
>             -rw-r----- 3 delp datalake_admin_dev      49578 2020-03-03 08:45 /user/p2epda/lake/delp_qa/flink_hs/000347625d8128ee3fd0b672018e38a5
>             -rw-r----- 3 delp datalake_admin_dev      50842 2020-03-24 15:19 /user/p2epda/lake/delp_qa/flink_hs/0004be6ce01ba9677d1eb619ad0fa757
>             ...
>
>             // ah
>
>             From: Chesnay Schepler <chesnay@apache.org>
>             Sent: Monday, April 27, 2020 10:28 AM
>             To: Hailu, Andreas [Engineering] <Andreas.Hailu@ny.email.gs.com>; user@flink.apache.org
>             Subject: Re: History Server Not Showing Any Jobs - File Not Found?
>
>             If historyserver.web.tmpdir is not set then java.io.tmpdir
>             is used, so that should be fine.
>
>             What are the contents of
>             /local/scratch/flink_historyserver_tmpdir?
>
>             I assume there are already archives in HDFS?
>
>             On 27/04/2020 16:02, Hailu, Andreas wrote:
>
>                 My machine’s /tmp directory is not large enough to
>                 support the archived files, so I changed my
>                 java.io.tmpdir to be in some other location which is
>                 significantly larger. I hadn’t set anything for
>                 historyserver.web.tmpdir, so I suspect it was still
>                 pointing at /tmp. I just tried setting
>                 historyserver.web.tmpdir to the same location as my
>                 java.io.tmpdir location, but I’m afraid I’m still
>                 seeing the following issue:
>
>                 2020-04-27 09:37:42,904 [nioEventLoopGroup-3-4] DEBUG HistoryServerStaticFileServerHandler - Unable to load requested file /overview.json from classloader
>                 2020-04-27 09:37:42,906 [nioEventLoopGroup-3-6] DEBUG HistoryServerStaticFileServerHandler - Unable to load requested file /jobs/overview.json from classloader
>
>                 flink-conf.yaml for reference:
>
>                 jobmanager.archive.fs.dir: hdfs:///user/p2epda/lake/delp_qa/flink_hs/
>                 historyserver.archive.fs.dir: hdfs:///user/p2epda/lake/delp_qa/flink_hs/
>                 historyserver.web.tmpdir: /local/scratch/flink_historyserver_tmpdir/
>
>                 Did you have anything else in mind when you said
>                 pointing somewhere funny?
>
>                 // ah
>
>                 From: Chesnay Schepler <chesnay@apache.org>
>                 Sent: Monday, April 27, 2020 5:56 AM
>                 To: Hailu, Andreas [Engineering] <Andreas.Hailu@ny.email.gs.com>; user@flink.apache.org
>                 Subject: Re: History Server Not Showing Any Jobs - File Not Found?
>
>                 overview.json is a generated file that is placed in
>                 the local directory controlled by
>                 historyserver.web.tmpdir.
>
>                 Have you configured this option to point to some
>                 non-local filesystem? (Or if not, is the
>                 java.io.tmpdir property pointing somewhere funny?)
>
>                 On 24/04/2020 18:24, Hailu, Andreas wrote:
>
>                     I’m having a further look at the code in
>                     HistoryServerStaticFileServerHandler - is there an
>                     assumption about where overview.json is supposed
>                     to be located?
>
>                     // ah
>
>                     From: Hailu, Andreas [Engineering]
>                     Sent: Wednesday, April 22, 2020 1:32 PM
>                     To: 'Chesnay Schepler' <chesnay@apache.org>; Hailu, Andreas [Engineering] <Andreas.Hailu@ny.email.gs.com>; user@flink.apache.org
>                     Subject: RE: History Server Not Showing Any Jobs - File Not Found?
>
>                     Hi Chesnay, thanks for responding. We’re using
>                     Flink 1.9.1. I enabled DEBUG level logging and
>                     this is something relevant I see:
>
>                     2020-04-22 13:25:52,566 [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG DFSInputStream - Connecting to datanode 10.79.252.101:1019
>                     2020-04-22 13:25:52,567 [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG SaslDataTransferClient - SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
>                     2020-04-22 13:25:52,567 [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG SaslDataTransferClient - SASL client skipping handshake in secured configuration with privileged port for addr = /10.79.252.101, datanodeId = DatanodeInfoWithStorage[10.79.252.101:1019,DS-7f4ec55d-7c5f-4a0e-b817-d9e635480b21,DISK]
>                     *2020-04-22 13:25:52,571 [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG DFSInputStream - DFSInputStream has been closed already*
>                     *2020-04-22 13:25:52,573 [nioEventLoopGroup-3-6] DEBUG HistoryServerStaticFileServerHandler - Unable to load requested file /jobs/overview.json from classloader*
>                     2020-04-22 13:25:52,576 [IPC Parameter Sending Thread #0] DEBUG Client$Connection$3 - IPC Client (1578587450) connection to d279536-002.dc.gs.com/10.59.61.87:8020 from delp@GS.COM sending #1391
>
>                     Aside from that, it looks like a lot of logging
>                     around datanodes and block location metadata. Did
>                     I miss something in my classpath, perhaps? If so,
>                     do you have a suggestion on what I could try?
>
>                     // ah
>
>                     From: Chesnay Schepler <chesnay@apache.org>
>                     Sent: Wednesday, April 22, 2020 2:16 AM
>                     To: Hailu, Andreas [Engineering] <Andreas.Hailu@ny.email.gs.com>; user@flink.apache.org
>                     Subject: Re: History Server Not Showing Any Jobs - File Not Found?
>
>                     Which Flink version are you using?
>
>                     Have you checked the history server logs after
>                     enabling debug logging?
>
>                     On 21/04/2020 17:16, Hailu, Andreas [Engineering]
>                     wrote:
>
>                         Hi,
>
>                         I’m trying to set up the History Server, but
>                         none of my applications are showing up in the
>                         Web UI. Looking at the console, I see that all
>                         of the calls to /overview return the following
>                         404 response: {"errors":["File not found."]}.
>
>                         I’ve set up my configuration as follows:
>
>                         JobManager Archive directory:
>
>                         jobmanager.archive.fs.dir: hdfs:///user/p2epda/lake/delp_qa/flink_hs/
>
>                         -bash-4.1$ hdfs dfs -ls /user/p2epda/lake/delp_qa/flink_hs
>                         Found 44282 items
>                         -rw-r----- 3 delp datalake_admin_dev      50569 2020-03-21 23:17 /user/p2epda/lake/delp_qa/flink_hs/000144dba9dc0f235768a46b2f26e936
>                         -rw-r----- 3 delp datalake_admin_dev      49578 2020-03-03 08:45 /user/p2epda/lake/delp_qa/flink_hs/000347625d8128ee3fd0b672018e38a5
>                         -rw-r----- 3 delp datalake_admin_dev      50842 2020-03-24 15:19 /user/p2epda/lake/delp_qa/flink_hs/0004be6ce01ba9677d1eb619ad0fa757
>                         ...
>
>                         History Server will fetch the archived jobs
>                         from the same location:
>
>                         historyserver.archive.fs.dir: hdfs:///user/p2epda/lake/delp_qa/flink_hs/
>
>                         So I’m able to confirm that there are indeed
>                         archived applications that I should be able to
>                         view in the History Server. I’m not able to find
>                         out what file the overview service is looking
>                         for from the repository – any suggestions as
>                         to what I could look into next?
>
>                         Best,
>
>                         Andreas
>
>                         ------------------------------------------------------------------------
>
>
>                         Your Personal Data: We may collect and process
>                         information about you that may be subject to
>                         data protection laws. For more information
>                         about how we use and disclose your personal
>                         data, how we protect your information, our
>                         legal basis to use your information, your
>                         rights and who you can contact, please refer
>                         to: www.gs.com/privacy-notices
>                         <http://www.gs.com/privacy-notices>


