flink-user mailing list archives

From: Chesnay Schepler <ches...@apache.org>
Subject: Re: REST: reading completed jobs' details
Date: Thu, 06 Sep 2018 16:57:08 GMT
Did you by chance use the RemoteEnvironment and pass in 6123 as the 
port? If so, try using 8081 instead, which is the REST port.
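
(As a side note: the frame length in your log, 1347375960, is suspiciously
close to the ASCII encoding of "POST" (0x504F5354 = 1347375956), which is
what you would expect to see if an HTTP request arrived on the Akka port.)

A minimal sketch of what I mean (host and jar path are placeholders):

    // Connect to the REST port of the standalone cluster, not the Akka RPC port (6123).
    // Classes: org.apache.flink.api.java.ExecutionEnvironment
    ExecutionEnvironment env = ExecutionEnvironment.createRemoteEnvironment(
            "localhost",               // JobManager / REST endpoint host
            8081,                      // REST port
            "/path/to/your-job.jar");  // jar containing your program's classes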

On 06.09.2018 18:24, Miguel Coimbra wrote:
> Hello Chesnay,
>
> Thanks for the information.
>
> I decided to move straight away to launching a standalone cluster.
> I'm now having another problem when trying to submit a job through my 
> Java program after launching the standalone cluster.
>
> I configured the cluster (flink-1.6.0/conf/flink-conf.yaml) to use 2 
> TaskManager instances and assigned port ranges for most Flink cluster 
> entities (to avoid port collisions with more than 1 TaskManager):
>
> query.server.ports: 30000-35000
> query.proxy.ports: 35001-40000
> taskmanager.rpc.port: 45001-50000
> taskmanager.data.port: 50001-55000
> blob.server.port: 55001-60000
>
> I'm launching in Linux with:
>
> ./start-cluster.sh
>
> Starting cluster.
> Starting standalonesession daemon on host xxxxxxx.
> Starting taskexecutor daemon on host xxxxxxx.
> [INFO] 1 instance(s) of taskexecutor are already running on xxxxxxx.
> Starting taskexecutor daemon on host xxxxxxx.
>
>
> However, my Java program ends up hanging as soon as I perform an
> execute() call (for example by calling count() on a DataSet).
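>
> To illustrate, even a trivial job of this shape hangs (this is not my
> actual job; count() is an eager operation, so it triggers job submission
> immediately):
>
>     // org.apache.flink.api.java.{ExecutionEnvironment, DataSet}
>     ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
>     DataSet<Long> numbers = env.generateSequence(1, 100);
>     System.out.println(numbers.count()); // count() submits the job and blocks here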
>
> Checking the JobManager log, I find the following exception whenever 
> my Java program calls execute() over the ExecutionEnvironment (either 
> using Maven on the terminal or from IntelliJ IDEA):
>
> WARN akka.remote.transport.netty.NettyTransport - Remote connection to
> [/127.0.0.1:47774] failed with
> org.apache.flink.shaded.akka.org.jboss.netty.handler.codec.frame.TooLongFrameException:
> Adjusted frame length exceeds 10485760: 1347375960 - discarded
>
> I checked that the problem occurs on a count(), so I don't think
> it has to do with the JobManager/TaskManagers trying to exchange
> excessively big messages.
>
> While searching, I tried to make sure my program compiles with the 
> same library versions as those in this cluster version of Flink.
>
>
> I downloaded the Apache Flink 1.6 binaries to launch the cluster:
>
>
> https://www.apache.org/dyn/closer.lua/flink/flink-1.6.0/flink-1.6.0-bin-scala_2.11.tgz
>
>
> I then checked the library versions used in the pom.xml of the 1.6.0 
> branch of the Flink repository:
>
>
> https://github.com/apache/flink/blob/release-1.6/pom.xml
>
> On my project's pom.xml, I have the following:
>
> <properties>
>     <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
>     <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
>     <maven.compiler.source>1.8</maven.compiler.source>
>     <maven.compiler.target>1.8</maven.compiler.target>
>     <flink.version>1.6.0</flink.version>
>     <slf4j.version>1.7.7</slf4j.version>
>     <log4j.version>1.2.17</log4j.version>
>     <scala.version>2.11.12</scala.version>
>     <scala.binary.version>2.11</scala.binary.version>
>     <akka.version>2.4.20</akka.version>
>     <junit.version>4.12</junit.version>
>     <junit.jupiter.version>5.0.0</junit.jupiter.version>
>     <junit.vintage.version>${junit.version}.1</junit.vintage.version>
>     <junit.platform.version>1.0.1</junit.platform.version>
>     <aspectj.version>1.9.1</aspectj.version>
> </properties>
>
> My project's dependency versions match those of the Flink 1.6
> repository (for libraries such as Akka).
> However, I'm having difficulty understanding what else may be causing 
> this problem.
>
> Thanks for your attention.
>
> Best,
>
> On Wed, 5 Sep 2018 at 20:18, Chesnay Schepler <chesnay@apache.org> wrote:
>
>     No, the cluster isn't shared. For each job a separate cluster is
>     spun up when calling execute(), at the end of which it is shut down.
>
>     For explicit creation and shutdown of a cluster, I would suggest
>     executing your jobs as a test that contains a MiniClusterResource.
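>
>     A rough sketch (this assumes the flink-test-utils dependency; the
>     exact configuration builder may differ slightly between versions):
>
>         // JUnit rule: starts a mini cluster before the tests, shuts it down after.
>         @ClassRule
>         public static final MiniClusterResource MINI_CLUSTER =
>             new MiniClusterResource(
>                 new MiniClusterResourceConfiguration.Builder()
>                     .setNumberTaskManagers(2)
>                     .setNumberSlotsPerTaskManager(1)
>                     .build());
>
>         @Test
>         public void runJob() throws Exception {
>             // Inside the test, getExecutionEnvironment() is wired to the mini cluster.
>             ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
>             DataSet<Long> numbers = env.generateSequence(1, 100);
>             numbers.count(); // executes the job on the mini cluster
>         }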
>
>     On 05.09.2018 20:59, Miguel Coimbra wrote:
>>     Thanks for the reply.
>>
>>     However, I think my case differs because I am running a sequence
>>     of independent Flink jobs on the same environment instance.
>>     I only create the LocalExecutionEnvironment once.
>>
>>     The web manager shows the job ID changing correctly every time a
>>     new job is executed.
>>
>>     Since it is the same execution environment (and therefore, I
>>     imagine, the same cluster instance), those completed jobs should
>>     show up as well, no?
>>
>>     On Wed, 5 Sep 2018 at 18:40, Chesnay Schepler <chesnay@apache.org> wrote:
>>
>>         When you create an environment that way, the cluster is
>>         shut down once the job completes.
>>         The WebUI can _appear_ to still be working since all the files
>>         and data about the job are cached in the browser.
>>
>>         On 05.09.2018 17:39, Miguel Coimbra wrote:
>>>         Hello,
>>>
>>>         I'm having difficulty reading the status (such as time taken
>>>         for each dataflow operator in a job) of jobs that have
>>>         completed.
>>>
>>>         First, when I click on "Completed Jobs" on the web interface
>>>         (by default at 8081), no jobs show up.
>>>         I can see jobs while they are "Running", but as soon as they
>>>         finish I would expect them to appear in the "Completed Jobs"
>>>         section; no luck so far.
>>>
>>>         Note that I am running locally on 8081 (the web UI is up; I
>>>         checked and it is reachable via browser).
>>>         Neither of these links worked for checking a job that had
>>>         already finished, such as the job with ID
>>>         618fac9da6ea458f5091a9c40e54cbcc that had been running:
>>>
>>>         http://127.0.0.1:8081/jobs/618fac9da6ea458f5091a9c40e54cbcc
>>>         http://127.0.0.1:8081/completed-jobs/618fac9da6ea458f5091a9c40e54cbcc
>>>
>>>         I'm running with a LocalExecutionEnvironment created with the
>>>         method:
>>>         ExecutionEnvironment.createLocalEnvironmentWithWebUI(conf)
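>>>
>>>         Roughly like this (the RestOptions.PORT line is just to be
>>>         explicit; 8081 should be the default anyway):
>>>
>>>             // org.apache.flink.configuration.{Configuration, RestOptions}
>>>             Configuration conf = new Configuration();
>>>             conf.setInteger(RestOptions.PORT, 8081); // web UI / REST port
>>>             ExecutionEnvironment env =
>>>                 ExecutionEnvironment.createLocalEnvironmentWithWebUI(conf);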
>>>         I hope someone may be able to help.
>>>
>>>         Best,
>>>
>>>
>>
>

