flink-user mailing list archives

From Aaron Jackson <ajack...@pobox.com>
Subject Re: Connecting the channel failed: Connection refused
Date Wed, 24 Jun 2015 21:58:48 GMT
That was it.  host3 was showing up as localhost.  I looked a little further
and it was missing an entry in /etc/hosts.
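
For anyone who hits the same symptom, here is a minimal sketch of a check that
can be run on each node. It is not part of Flink; the function names are made
up for illustration. It assumes the problem is what it was here: a machine's
own host name resolving to a loopback address because of a missing or wrong
/etc/hosts entry.

```python
import ipaddress
import socket


def is_loopback(address):
    """True if the textual IP address is loopback (127.0.0.0/8 or ::1)."""
    # Drop an IPv6 zone suffix such as "%eth0" before parsing.
    return ipaddress.ip_address(address.split("%")[0]).is_loopback


def resolve(hostname):
    """Return the distinct IP addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None)
    return sorted({info[4][0] for info in infos})


def loopback_hosts(hostnames, resolver=resolve):
    """Return the hostnames for which any resolved address is loopback."""
    return [h for h in hostnames if any(is_loopback(a) for a in resolver(h))]
```

Calling loopback_hosts([socket.gethostname()]) on a node returns a non-empty
list when that node's own name resolves to loopback. The resolver argument is
only there so the check can be exercised without touching the real resolver.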

Thanks for looking into this.

Aaron

On Wed, Jun 24, 2015 at 2:13 PM, Stephan Ewen <sewen@apache.org> wrote:

> Aaron,
>
> Can you check how the TaskManagers register at the JobManager? When you
> look at the 'TaskManagers' section in the JobManager's web Interface (at
> port 8081), what does it say as the TaskManager host names?
>
> Does it list "host1", "host2", "host3"...?
>
> Thanks,
> Stephan
> On 24.06.2015 at 20:31, "Ufuk Celebi" <uce@apache.org> wrote:
>
>> On 24 Jun 2015, at 16:22, Aaron Jackson <ajackson@pobox.com> wrote:
>>
>> > Thanks.  My setup is actually 3 task managers x 4 slots.  I played with
>> > the parallelism and found that at low values, the error did not occur.  I
>> > can only conclude that there is some form of data shuffling that is
>> > occurring that is sensitive to the data source.  Yes, seems a little odd to
>> > me as well.  OOC, did you load the file into HDFS or use it from a local
>> > file system (e.g. file:///tmp/data.csv) - my results have shown that so
>> > far, HDFS does not appear to be sensitive to this issue.
>> >
>> > I updated the example to include my configuration and slaves, but for
>> > brevity, I'll include the configurable bits here:
>> >
>> > jobmanager.rpc.address: host01
>> > jobmanager.rpc.port: 6123
>> > jobmanager.heap.mb: 512
>> > taskmanager.heap.mb: 2048
>> > taskmanager.numberOfTaskSlots: 4
>> > parallelization.degree.default: 1
>> > jobmanager.web.port: 8081
>> > webclient.port: 8080
>> > taskmanager.network.numberOfBuffers: 8192
>> > taskmanager.tmp.dirs: /datassd/flink/tmp
>> >
>> > And the slaves ...
>> >
>> > host01
>> > host02
>> > host03
>> >
>> > I did notice an extra empty line at the end of the slaves.  And while I
>> > highly doubt it makes ANY difference, I'm still going to re-run with it
>> > removed.
>> >
>> > Thanks for looking into it.
>>
>> Thank you for being so helpful. I've tried it with the local filesystem.
>>
>> On 23 Jun 2015, at 07:11, Aaron Jackson <ajackson@pobox.com> wrote:
>>
>> > I have 12 task managers across 3 machines - so it's a small setup.
>>
>> Sorry for my misunderstanding. I've now tried it with both 12 task managers
>> and 3. What's odd is that the stack trace shows that it is trying to
>> connect to "localhost" for the remote channel although localhost is not
>> configured anywhere. Let me think about that. ;)
>>
>> – Ufuk