flink-dev mailing list archives

From Stephan Ewen <se...@apache.org>
Subject Re: Could not build up connection to JobManager
Date Thu, 05 Mar 2015 12:20:24 GMT
Hi Dulaj!

Okay, the logs give us some insight. Both setups seem to look good in terms
of TaskManager and JobManager startup.

In one of the logs (127.0.0.1) you submit a job. The job fails because the
TaskManager cannot grab the JAR file from the JobManager.
I think the problem is that the BLOB server binds to 0.0.0.0 - it should
bind to the same address as the JobManager actor system.

That should definitely be changed...
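
To illustrate the difference between the two kinds of binding, here is a
minimal sketch in plain java.net. It is not the actual BLOB server code; the
class name BindAddressSketch and the method bindsToWildcard are made up for
illustration. Passing no host binds to the wildcard address (0.0.0.0), while
passing an explicit host binds to that one interface only, which is what the
BLOB server should do with the JobManager address.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class BindAddressSketch {

    // Hypothetical helper: binds a server socket on an ephemeral port and
    // reports whether it ended up on the wildcard address.
    // host == null means "no address configured".
    public static boolean bindsToWildcard(String host) {
        try (ServerSocket socket = new ServerSocket()) {
            InetSocketAddress endpoint = (host == null)
                    ? new InetSocketAddress(0)  // wildcard address, any free port
                    : new InetSocketAddress(InetAddress.getByName(host), 0);
            socket.bind(endpoint);
            return socket.getInetAddress().isAnyLocalAddress();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("no host given   -> wildcard? " + bindsToWildcard(null));
        System.out.println("127.0.0.1 given -> wildcard? " + bindsToWildcard("127.0.0.1"));
    }
}
```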

On Thu, Mar 5, 2015 at 10:08 AM, Dulaj Viduranga <vidura.me@icloud.com>
wrote:

> Hi,
> This is the log with setting “localhost”
> flink-Vidura-jobmanager-localhost.log
> <https://gist.github.com/viduranga/e9d43521587697de3eb5#file-flink-vidura-jobmanager-localhost-log>
>
> And this is the log with setting “127.0.0.1”
> flink-Vidura-jobmanager-localhost.log
> <https://gist.github.com/viduranga/5af6b05f204e1f4b344f#file-flink-vidura-jobmanager-localhost-log>
>
> > On Mar 5, 2015, at 2:23 PM, Till Rohrmann <trohrmann@apache.org> wrote:
> >
> > What does the jobmanager log say? I think Stephan added some more logging
> > output which helps us to debug this problem.
> >
> > On Thu, Mar 5, 2015 at 9:36 AM, Dulaj Viduranga <vidura.me@icloud.com>
> > wrote:
> >
> >> Using start-local.sh.
> >> I’m using the original config yaml. I also tried changing the jobmanager
> >> address in the config to “127.0.0.1” but no luck. With my changes it works OK.
> >> The conf file follows.
> >>
> >> ################################################################################
> >> #  Licensed to the Apache Software Foundation (ASF) under one
> >> #  or more contributor license agreements.  See the NOTICE file
> >> #  distributed with this work for additional information
> >> #  regarding copyright ownership.  The ASF licenses this file
> >> #  to you under the Apache License, Version 2.0 (the
> >> #  "License"); you may not use this file except in compliance
> >> #  with the License.  You may obtain a copy of the License at
> >> #
> >> #      http://www.apache.org/licenses/LICENSE-2.0
> >> #
> >> #  Unless required by applicable law or agreed to in writing, software
> >> #  distributed under the License is distributed on an "AS IS" BASIS,
> >> #  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> >> #  See the License for the specific language governing permissions and
> >> # limitations under the License.
> >> ################################################################################
> >>
> >> #==============================================================================
> >> # Common
> >> #==============================================================================
> >>
> >> jobmanager.rpc.address: 127.0.0.1
> >>
> >> jobmanager.rpc.port: 6123
> >>
> >> jobmanager.heap.mb: 256
> >>
> >> taskmanager.heap.mb: 512
> >>
> >> taskmanager.numberOfTaskSlots: 1
> >>
> >> parallelization.degree.default: 1
> >>
> >> #==============================================================================
> >> # Web Frontend
> >> #==============================================================================
> >>
> >> # The port under which the web-based runtime monitor listens.
> >> # A value of -1 deactivates the web server.
> >>
> >> jobmanager.web.port: 8081
> >>
> >> # The port under which the standalone web client
> >> # (for job upload and submit) listens.
> >>
> >> webclient.port: 8080
> >>
> >> #==============================================================================
> >> # Advanced
> >> #==============================================================================
> >>
> >> # The number of buffers for the network stack.
> >> #
> >> # taskmanager.network.numberOfBuffers: 2048
> >>
> >> # Directories for temporary files.
> >> #
> >> # Add a delimited list for multiple directories, using the system directory
> >> # delimiter (colon ':' on unix) or a comma, e.g.:
> >> #     /data1/tmp:/data2/tmp:/data3/tmp
> >> #
> >> # Note: Each directory entry is read from and written to by a different I/O
> >> # thread. You can include the same directory multiple times in order to create
> >> # multiple I/O threads against that directory. This is for example relevant for
> >> # high-throughput RAIDs.
> >> #
> >> # If not specified, the system-specific Java temporary directory (java.io.tmpdir
> >> # property) is taken.
> >> #
> >> # taskmanager.tmp.dirs: /tmp
> >>
> >> # Path to the Hadoop configuration directory.
> >> #
> >> # This configuration is used when writing into HDFS. Unless specified otherwise,
> >> # HDFS file creation will use HDFS default settings with respect to block-size,
> >> # replication factor, etc.
> >> #
> >> # You can also directly specify the paths to hdfs-default.xml and hdfs-site.xml
> >> # via keys 'fs.hdfs.hdfsdefault' and 'fs.hdfs.hdfssite'.
> >> #
> >> # fs.hdfs.hadoopconf: /path/to/hadoop/conf/
> >>
> >>
> >>> On Mar 5, 2015, at 2:03 PM, Till Rohrmann <trohrmann@apache.org> wrote:
> >>>
> >>> How did you start the flink cluster? Using start-local.sh,
> >>> start-cluster.sh, or starting the job manager and task managers individually
> >>> using taskmanager.sh/jobmanager.sh? Could you maybe post the
> >>> flink-conf.yaml file you're using?
> >>>
> >>> With your changes, everything works, right?
> >>>
> >>> On Thu, Mar 5, 2015 at 8:55 AM, Dulaj Viduranga <vidura.me@icloud.com>
> >>> wrote:
> >>>
> >>>> Hi Till,
> >>>> I’m sorry. It doesn’t seem to solve the problem. The taskmanager still
> >>>> tries a 10.0.0.0/8 IP.
> >>>>
> >>>> Best regards.
> >>>>
> >>>>> On Mar 5, 2015, at 1:00 PM, Till Rohrmann <till.rohrmann@gmail.com> wrote:
> >>>>>
> >>>>> Hi Dulaj,
> >>>>>
> >>>>> I looked through your commit and noticed that the JobClient might not be
> >>>>> listening on the right network interface. Your commit seems to fix it. I
> >>>>> just want to understand the problem properly and therefore I opened a
> >>>>> branch with a small change. Could you try out whether this change would
> >>>>> also fix your problem? You can find the code here [1]. Would be awesome if
> >>>>> you checked it out and let it run on your cluster setting. Thanks a lot
> >>>>> Dulaj!
> >>>>>
> >>>>> [1] https://github.com/tillrohrmann/flink/tree/fixLocalFlinkMiniClusterJobClient
> >>>>>
> >>>>> On Thu, Mar 5, 2015 at 4:21 AM, Dulaj Viduranga <vidura.me@icloud.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Not every change in the commit b7da22a is required, but I thought they
> >>>>>> were appropriate.
> >>>>>>
> >>>>>>> On Mar 5, 2015, at 8:11 AM, Dulaj Viduranga <vidura.me@icloud.com>
> >>>>>> wrote:
> >>>>>>>
> >>>>>>> Hi,
> >>>>>>> I found many other places where “localhost” is hard coded. I changed them in
> >>>>>>> a better way, I think. I made a pull request. Please review. b7da22a
> >>>>>>> <https://github.com/viduranga/flink/commit/b7da22a562d3da5a9be2657308c0f82e4e2f80cd>
> >>>>>>>
> >>>>>>>
> >>>>>>>> On Mar 4, 2015, at 8:17 PM, Stephan Ewen <sewen@apache.org> wrote:
> >>>>>>>>
> >>>>>>>> If I recall correctly, we only hardcode "localhost" in the local mini
> >>>>>>>> cluster - do you think it is problematic there as well?
> >>>>>>>>
> >>>>>>>> Have you found any other places?
> >>>>>>>>
> >>>>>>>> On Mon, Mar 2, 2015 at 10:26 AM, Dulaj Viduranga <vidura.me@icloud.com>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> In some places of the code, "localhost" is hard coded. When it is resolved
> >>>>>>>>> by the DNS, it is possible for it to be directed to an IP other than
> >>>>>>>>> 127.0.0.1 (like the private range 10.0.0.0/8). I changed those places to
> >>>>>>>>> 127.0.0.1 and it works like a charm.
> >>>>>>>>> But hard coding 127.0.0.1 is not a good option because when the jobmanager
> >>>>>>>>> IP is changed, this becomes an issue again. I'm thinking of setting the
> >>>>>>>>> jobmanager IP from the config.yaml in these places.
> >>>>>>>>> If you have a better idea on doing this with your experience, please let
> >>>>>>>>> me know.
> >>>>>>>>>
> >>>>>>>>> Best.
> >>>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>
> >>>>
> >>
> >>
>
>
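
Regarding the original report at the bottom of this thread, that "localhost"
may resolve to something other than 127.0.0.1: this can be checked on the
affected machine with a few lines of Java. The following is only a sketch
using the standard java.net API; the class name LocalhostResolutionSketch and
the method resolve are made up for illustration and are not part of Flink.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

public class LocalhostResolutionSketch {

    // Hypothetical helper: returns every address the local resolver reports
    // for the given host name, or an empty list if it cannot be resolved.
    public static List<String> resolve(String host) {
        List<String> result = new ArrayList<>();
        try {
            for (InetAddress address : InetAddress.getAllByName(host)) {
                result.add(address.getHostAddress());
            }
        } catch (UnknownHostException e) {
            // Unresolvable name: report no addresses.
        }
        return result;
    }

    public static void main(String[] args) {
        // On a machine with the problem described above, the first line may
        // list a non-loopback address such as one from 10.0.0.0/8.
        System.out.println("localhost -> " + resolve("localhost"));
        System.out.println("127.0.0.1 -> " + resolve("127.0.0.1"));
    }
}
```

An IP literal such as 127.0.0.1 is parsed directly and never goes through
name resolution, which is consistent with the observation that replacing
"localhost" with 127.0.0.1 made the problem disappear.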
