flink-user mailing list archives

From Aaron Jackson <ajack...@pobox.com>
Subject Re: Connecting the channel failed: Connection refused
Date Thu, 25 Jun 2015 19:38:26 GMT
So the JobManager was running on host1.  This also explains why I didn't
see the problem until I asked for a sizeable degree of parallelism:
before that, a task was probably never assigned to host3.
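
For anyone who hits the same error: the cause in this thread was a worker
whose hostname ended up as "localhost" because of a missing /etc/hosts
entry.  Below is a rough sketch of how one might check that on each worker.
It is plain Python, not part of Flink, and it assumes that name lookup
through the operating system resolver reflects what the TaskManager
announces, which I have not verified.  The function names (resolve_all,
loopback_addresses, check_host) are made up for this example.

```python
import ipaddress
import socket


def resolve_all(hostname):
    """Return the set of IP address strings that `hostname` resolves to."""
    infos = socket.getaddrinfo(hostname, None)
    return {info[4][0] for info in infos}


def loopback_addresses(addresses):
    """Return the subset of `addresses` that are loopback (127.0.0.0/8 or ::1)."""
    result = set()
    for addr in addresses:
        # Strip an IPv6 zone id such as "fe80::1%eth0" before parsing.
        ip = ipaddress.ip_address(addr.split("%")[0])
        if ip.is_loopback:
            result.add(addr)
    return result


def check_host(hostname, resolver=resolve_all):
    """Return a one-line report on whether `hostname` resolves to loopback.

    `resolver` is injectable (an assumption of this sketch) so the check
    can be exercised without touching the real resolver.
    """
    try:
        addresses = resolver(hostname)
    except socket.gaierror as exc:
        return f"{hostname}: cannot resolve ({exc})"
    bad = loopback_addresses(addresses)
    if bad:
        return f"{hostname}: WARNING resolves to loopback {sorted(bad)}"
    return f"{hostname}: ok {sorted(addresses)}"


if __name__ == "__main__":
    # Run this on each worker; it checks the worker's own hostname.
    print(check_host(socket.gethostname()))
```

A WARNING line on a worker would suggest that other machines cannot reach
it under the name it resolves for itself, which matches the "Connection
refused" on localhost seen in the stack trace.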

Thanks for your help

On Thu, Jun 25, 2015 at 3:34 AM, Stephan Ewen <sewen@apache.org> wrote:

> Nice!
>
> TaskManagers need to announce where they listen for connections.
>
> We do not yet reject "localhost" as an announced address, so as not to
> break local test setups.
>
> There are some routines that try to select an interface that can
> communicate with the outside world.
>
> Is host3 running on the same machine as the JobManager? Or did you
> experience a long delay until TaskManager 3 was registered?
>
> Thanks for helping us debug this,
> Stephan
>
> On Wed, Jun 24, 2015 at 11:58 PM, Aaron Jackson <ajackson@pobox.com>
> wrote:
>
>> That was it.  host3 was showing localhost - looked a little further and
>> it was missing an entry in /etc/hosts.
>>
>> Thanks for looking into this.
>>
>> Aaron
>>
>> On Wed, Jun 24, 2015 at 2:13 PM, Stephan Ewen <sewen@apache.org> wrote:
>>
>>> Aaron,
>>>
>>> Can you check how the TaskManagers register at the JobManager? When you
>>> look at the 'TaskManagers' section in the JobManager's web interface (at
>>> port 8081), what does it list as the TaskManager host names?
>>>
>>> Does it list "host1", "host2", "host3"...?
>>>
>>> Thanks,
>>> Stephan
>>> On 24 Jun 2015, at 20:31, "Ufuk Celebi" <uce@apache.org> wrote:
>>>
>>>> On 24 Jun 2015, at 16:22, Aaron Jackson <ajackson@pobox.com> wrote:
>>>>
>>>> > Thanks.  My setup is actually 3 task managers x 4 slots.  I played
>>>> > with the parallelism and found that at low values the error did not
>>>> > occur.  I can only conclude that some form of data shuffling is
>>>> > occurring that is sensitive to the data source.  Yes, that seems a
>>>> > little odd to me as well.  Out of curiosity, did you load the file into
>>>> > HDFS or use it from a local file system (e.g. file:///tmp/data.csv)?  So
>>>> > far, my results show that HDFS does not appear to be sensitive to this
>>>> > issue.
>>>> >
>>>> > I updated the example to include my configuration and slaves, but for
>>>> > brevity, I'll include the configurable bits here:
>>>> >
>>>> > jobmanager.rpc.address: host01
>>>> > jobmanager.rpc.port: 6123
>>>> > jobmanager.heap.mb: 512
>>>> > taskmanager.heap.mb: 2048
>>>> > taskmanager.numberOfTaskSlots: 4
>>>> > parallelization.degree.default: 1
>>>> > jobmanager.web.port: 8081
>>>> > webclient.port: 8080
>>>> > taskmanager.network.numberOfBuffers: 8192
>>>> > taskmanager.tmp.dirs: /datassd/flink/tmp
>>>> >
>>>> > And the slaves ...
>>>> >
>>>> > host01
>>>> > host02
>>>> > host03
>>>> >
>>>> > I did notice an extra empty line at the end of the slaves file.  And
>>>> > while I highly doubt it makes ANY difference, I'm still going to re-run
>>>> > with it removed.
>>>> >
>>>> > Thanks for looking into it.
>>>>
>>>> Thank you for being so helpful. I've tried it with the local filesystem.
>>>>
>>>> On 23 Jun 2015, at 07:11, Aaron Jackson <ajackson@pobox.com> wrote:
>>>>
>>>> > I have 12 task managers across 3 machines - so it's a small setup.
>>>>
>>>> Sorry for my misunderstanding. I've tried it with both 12 task managers
>>>> and 3 as well now. What's odd is that the stack trace shows that it is
>>>> trying to connect to "localhost" for the remote channel although localhost
>>>> is not configured anywhere. Let me think about that. ;)
>>>>
>>>> – Ufuk
