Subject: Re: Could not build up connection to JobManager
From: Stephan Ewen
To: "dev@flink.apache.org"
Date: Thu, 5 Mar 2015 13:20:24 +0100

Hi Dulaj!

Okay, the logs give us some insight. Both setups seem to look good in terms
of TaskManager and JobManager startup.

In one of the logs (127.0.0.1) you submit a job. The job fails because the
TaskManager cannot grab the JAR file from the JobManager.

I think the problem is that the BLOB server binds to 0.0.0.0 - it should
bind to the same address as the JobManager actor system.

That should definitely be changed...

On Thu, Mar 5, 2015 at 10:08 AM, Dulaj Viduranga wrote:

> Hi,
> This is the log with setting “localhost”
> flink-Vidura-jobmanager-localhost.log
> <https://gist.github.com/viduranga/e9d43521587697de3eb5#file-flink-vidura-jobmanager-localhost-log>
>
> And this is the log with setting “127.0.0.1”
> flink-Vidura-jobmanager-localhost.log
> <https://gist.github.com/viduranga/5af6b05f204e1f4b344f#file-flink-vidura-jobmanager-localhost-log>
>
> > On Mar 5, 2015, at 2:23 PM, Till Rohrmann wrote:
> >
> > What does the jobmanager log say? I think Stephan added some more logging
> > output which helps us to debug this problem.
> >
> > On Thu, Mar 5, 2015 at 9:36 AM, Dulaj Viduranga wrote:
> >
> >> Using start-local.sh.
> >> I’m using the original config yaml. I also tried changing the jobmanager
> >> address in the config to “127.0.0.1” but no luck. With my changes it
> >> works ok.
> >> The conf file follows.
> >>
> >> ################################################################################
> >> # Licensed to the Apache Software Foundation (ASF) under one
> >> # or more contributor license agreements. See the NOTICE file
> >> # distributed with this work for additional information
> >> # regarding copyright ownership. The ASF licenses this file
> >> # to you under the Apache License, Version 2.0 (the
> >> # "License"); you may not use this file except in compliance
> >> # with the License. You may obtain a copy of the License at
> >> #
> >> # http://www.apache.org/licenses/LICENSE-2.0
> >> #
> >> # Unless required by applicable law or agreed to in writing, software
> >> # distributed under the License is distributed on an "AS IS" BASIS,
> >> # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> >> # See the License for the specific language governing permissions and
> >> # limitations under the License.
> >> ################################################################################
> >>
> >> #==============================================================================
> >> # Common
> >> #==============================================================================
> >>
> >> jobmanager.rpc.address: 127.0.0.1
> >>
> >> jobmanager.rpc.port: 6123
> >>
> >> jobmanager.heap.mb: 256
> >>
> >> taskmanager.heap.mb: 512
> >>
> >> taskmanager.numberOfTaskSlots: 1
> >>
> >> parallelization.degree.default: 1
> >>
> >> #==============================================================================
> >> # Web Frontend
> >> #==============================================================================
> >>
> >> # The port under which the web-based runtime monitor listens.
> >> # A value of -1 deactivates the web server.
> >>
> >> jobmanager.web.port: 8081
> >>
> >> # The port under which the standalone web client
> >> # (for job upload and submit) listens.
> >>
> >> webclient.port: 8080
> >>
> >> #==============================================================================
> >> # Advanced
> >> #==============================================================================
> >>
> >> # The number of buffers for the network stack.
> >> #
> >> # taskmanager.network.numberOfBuffers: 2048
> >>
> >> # Directories for temporary files.
> >> #
> >> # Add a delimited list for multiple directories, using the system directory
> >> # delimiter (colon ':' on unix) or a comma, e.g.:
> >> # /data1/tmp:/data2/tmp:/data3/tmp
> >> #
> >> # Note: Each directory entry is read from and written to by a different I/O
> >> # thread. You can include the same directory multiple times in order to create
> >> # multiple I/O threads against that directory. This is for example relevant for
> >> # high-throughput RAIDs.
> >> #
> >> # If not specified, the system-specific Java temporary directory (java.io.tmpdir
> >> # property) is taken.
> >> #
> >> # taskmanager.tmp.dirs: /tmp
> >>
> >> # Path to the Hadoop configuration directory.
> >> #
> >> # This configuration is used when writing into HDFS. Unless specified otherwise,
> >> # HDFS file creation will use HDFS default settings with respect to block-size,
> >> # replication factor, etc.
> >> #
> >> # You can also directly specify the paths to hdfs-default.xml and hdfs-site.xml
> >> # via keys 'fs.hdfs.hdfsdefault' and 'fs.hdfs.hdfssite'.
> >> #
> >> # fs.hdfs.hadoopconf: /path/to/hadoop/conf/
> >>
> >>
> >>> On Mar 5, 2015, at 2:03 PM, Till Rohrmann wrote:
> >>>
> >>> How did you start the flink cluster?
> >>> Using start-local.sh, start-cluster.sh, or by starting the job manager
> >>> and task managers individually using taskmanager.sh/jobmanager.sh?
> >>> Could you maybe post the flink-conf.yaml file you're using?
> >>>
> >>> With your changes, everything works, right?
> >>>
> >>> On Thu, Mar 5, 2015 at 8:55 AM, Dulaj Viduranga wrote:
> >>>
> >>>> Hi Till,
> >>>> I’m sorry. It doesn’t seem to solve the problem. The taskmanager still
> >>>> tries a 10.0.0.0/8 IP.
> >>>>
> >>>> Best regards.
> >>>>
> >>>>> On Mar 5, 2015, at 1:00 PM, Till Rohrmann wrote:
> >>>>>
> >>>>> Hi Dulaj,
> >>>>>
> >>>>> I looked through your commit and noticed that the JobClient might not be
> >>>>> listening on the right network interface. Your commit seems to fix it. I
> >>>>> just want to understand the problem properly and therefore I opened a
> >>>>> branch with a small change. Could you try out whether this change would
> >>>>> also fix your problem? You can find the code here [1]. Would be awesome if
> >>>>> you checked it out and ran it on your cluster setup. Thanks a lot,
> >>>>> Dulaj!
> >>>>>
> >>>>> [1] https://github.com/tillrohrmann/flink/tree/fixLocalFlinkMiniClusterJobClient
> >>>>>
> >>>>> On Thu, Mar 5, 2015 at 4:21 AM, Dulaj Viduranga <vidura.me@icloud.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Not every change in commit b7da22a is required, but I thought they
> >>>>>> were appropriate.
> >>>>>>
> >>>>>>> On Mar 5, 2015, at 8:11 AM, Dulaj Viduranga wrote:
> >>>>>>>
> >>>>>>> Hi,
> >>>>>>> I found many other places where “localhost” is hard coded. I changed
> >>>>>>> them in what I think is a better way. I made a pull request. Please
> >>>>>>> review. b7da22a
> >>>>>>> <https://github.com/viduranga/flink/commit/b7da22a562d3da5a9be2657308c0f82e4e2f80cd>
> >>>>>>>
> >>>>>>>> On Mar 4, 2015, at 8:17 PM, Stephan Ewen wrote:
> >>>>>>>>
> >>>>>>>> If I recall correctly, we only hardcode "localhost" in the local mini
> >>>>>>>> cluster - do you think it is problematic there as well?
> >>>>>>>>
> >>>>>>>> Have you found any other places?
> >>>>>>>>
> >>>>>>>> On Mon, Mar 2, 2015 at 10:26 AM, Dulaj Viduranga <vidura.me@icloud.com>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> In some places of the code, "localhost" is hard coded. When it is
> >>>>>>>>> resolved by the DNS, it can be directed to an IP other than
> >>>>>>>>> 127.0.0.1 (for example, one in the private range 10.0.0.0/8). I
> >>>>>>>>> changed those places to 127.0.0.1 and it works like a charm.
> >>>>>>>>> But hard coding 127.0.0.1 is not a good option because when the
> >>>>>>>>> jobmanager IP is changed, this becomes an issue again. I'm thinking
> >>>>>>>>> of using the jobmanager IP from the config.yaml in these places.
> >>>>>>>>> If you have a better idea for doing this, based on your experience,
> >>>>>>>>> please let me know.
> >>>>>>>>>
> >>>>>>>>> Best.
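
The two issues raised in this thread (a hard-coded "localhost" that may resolve to something other than 127.0.0.1, and a server socket bound to the wildcard address 0.0.0.0 instead of one specific address) can be sketched in plain Java. This is only an illustration and not Flink code: the class BindAddressSketch and its methods are hypothetical names, and the host string passed to openOn stands in for a value such as jobmanager.rpc.address read from the config.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;

// Hypothetical illustration only; this is not Flink's BLOB server code.
final class BindAddressSketch {

    // Binds to the wildcard address (0.0.0.0 or ::) on an ephemeral port,
    // so the socket accepts connections on every local interface.
    static ServerSocket openOnWildcard() throws IOException {
        return new ServerSocket(0);
    }

    // Binds only to the given host, e.g. the address configured for the
    // JobManager (assumption: the caller passes that configured value).
    static ServerSocket openOn(String host) throws IOException {
        InetAddress address = InetAddress.getByName(host);
        // Port 0 picks an ephemeral port; 50 is the connection backlog.
        return new ServerSocket(0, 50, address);
    }

    public static void main(String[] args) throws IOException {
        // What "localhost" resolves to depends on the hosts file / resolver,
        // which is why hard-coding the name can be fragile.
        InetAddress resolved = InetAddress.getByName("localhost");
        System.out.println("localhost resolves to " + resolved.getHostAddress());

        try (ServerSocket wildcard = openOnWildcard();
             ServerSocket fixed = openOn("127.0.0.1")) {
            System.out.println("wildcard socket bound to " + wildcard.getInetAddress());
            System.out.println("fixed socket bound to " + fixed.getInetAddress());
        }
    }
}
```

Passing the configured address explicitly, rather than relying on a name lookup or the wildcard default, keeps every component on the same interface, which is the direction suggested in the replies above.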