flink-user mailing list archives

From Robert Metzger <rmetz...@apache.org>
Subject Re: Running on a firewalled Yarn cluster?
Date Thu, 05 Nov 2015 13:46:26 GMT
While discussing the issue with my colleagues today, we came up with
another approach to resolve it:

d) Upload the job jar to HDFS (or another FS) and trigger the execution of
the jar using an HTTP request to the web interface.

We could add some tooling to the /bin/flink client to submit a job like
this transparently, so that users would not need to bother with uploading
the file and sending the request themselves.
Also, Sachin started a discussion on the dev@ list to add support for
submitting jobs over the web interface, so maybe we can base the fix for
FLINK-2960 on that.
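To make approach (d) concrete, here is a rough client-side sketch. Note that no such web-submission endpoint exists in Flink yet (that is exactly what the dev@ discussion and FLINK-2960 are about), so the `/run` path and the `jarPath` parameter below are made-up names for illustration only:

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class SubmitViaWeb {

    // Sketch of approach (d): the jar is assumed to already be in HDFS;
    // we then ask the JobManager's web interface to run it.
    // The "/run" endpoint and "jarPath" parameter are hypothetical.
    public static int triggerRun(String webInterfaceUrl, String hdfsJarPath) throws Exception {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(webInterfaceUrl + "/run").openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        byte[] body = ("jarPath=" + hdfsJarPath).getBytes(StandardCharsets.UTF_8);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body); // send the HDFS path of the previously uploaded jar
        }
        return conn.getResponseCode(); // the request is only sent here
    }
}
```

The point of this flow is that only an outbound HTTP connection from the client to the web interface is needed, which is typically the one port a firewall already permits.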

I've also looked into the Hadoop MapReduce code, and it seems they do the
following:
When submitting a job, they upload the job jar file to HDFS. They also
upload a configuration file that contains all of the job's config options.
Then they submit all of this together as an application to YARN.
So far, no firewall has been involved. They establish a connection between
the JobClient and the ApplicationMaster when the user queries the current
job status, but I could not find any special code for getting the status
over HTTP.

But I found the following configuration parameter:
"yarn.app.mapreduce.am.job.client.port-range", so it seems they try to
allocate the AM port within that range (if specified).
Niels, can you check whether this configuration parameter is set in your
environment? I assume your firewall allows connections from the outside to
that port range.
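For reference, that (real) Hadoop parameter would be set in mapred-site.xml roughly like this; the range value here is only an example:

```xml
<!-- Restrict the MR ApplicationMaster's client port to a
     firewall-friendly range (example range, adjust as needed). -->
<property>
  <name>yarn.app.mapreduce.am.job.client.port-range</name>
  <value>50100-50200</value>
</property>
```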
So we also have a new approach:

f) Allocate the YARN application master (and blob manager) within a
user-specified port-range.

This would be really easy to implement, because we would just need to go
through the range until we find an available port.
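The port scan that approach (f) implies could look roughly like the sketch below. This is not Flink code; the class and method names are made up, and plain java.net.ServerSocket stands in for whatever server the AM or blob manager actually starts:

```java
import java.io.IOException;
import java.net.ServerSocket;

public class PortRangeAllocator {

    // Hypothetical helper for approach (f): walk a user-specified port
    // range and bind the first free port. Binding is the only reliable
    // way to test availability, so we return the bound socket directly.
    public static ServerSocket allocateInRange(int from, int to) throws IOException {
        for (int port = from; port <= to; port++) {
            try {
                return new ServerSocket(port); // success: this port was free
            } catch (IOException e) {
                // port already in use; try the next one in the range
            }
        }
        throw new IOException("No free port in range " + from + "-" + to);
    }
}
```

With something like this, the admin only has to open the configured range in the firewall once, instead of dealing with the random ephemeral port YARN would otherwise hand out.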


On Tue, Nov 3, 2015 at 1:06 PM, Niels Basjes <Niels@basjes.nl> wrote:

> Great!
>
> I'll watch the issue and give it a test once I see a working patch.
>
> Niels Basjes
>
> On Tue, Nov 3, 2015 at 1:03 PM, Maximilian Michels <mxm@apache.org> wrote:
>
>> Hi Niels,
>>
>> Thanks a lot for reporting this issue. I think it is a very common setup
>> in corporate infrastructure to have restrictive firewall settings. For
>> Flink 1.0 (and probably in a minor 0.10.X release) we will have to address
>> this issue to ensure proper integration of Flink.
>>
>> I've created a JIRA to keep track:
>> https://issues.apache.org/jira/browse/FLINK-2960
>>
>> Best regards,
>> Max
>>
>> On Tue, Nov 3, 2015 at 11:02 AM, Niels Basjes <Niels@basjes.nl> wrote:
>>
>>> Hi,
>>>
>>> I forgot to answer your other question:
>>>
>>> On Mon, Nov 2, 2015 at 4:34 PM, Robert Metzger <rmetzger@apache.org>
>>> wrote:
>>>
>>>> so the problem is that you can not submit a job to Flink using the
>>>> "/bin/flink" tool, right?
>>>> I assume Flink and its TaskManagers properly start and connect to each
>>>> other (the number of TaskManagers is shown correctly in the web interface).
>>>>
>>>
>>> Correct. Flink starts (I see the JobManager UI), but the actual job is
>>> not started.
>>>
>>> Niels Basjes
>>>
>>
>>
>
>
> --
> Best regards / Met vriendelijke groeten,
>
> Niels Basjes
>
