flink-user mailing list archives

From Robert Metzger <rmetz...@apache.org>
Subject Re: Running on a firewalled Yarn cluster?
Date Thu, 05 Nov 2015 14:06:33 GMT
Hi,

cool, that's good news.

The RM proxy is only for the web interface of the AM.

I'm pretty sure that the MapReduce AM has at least two ports:
- one for the web interface (accessible through the RM proxy, so behind the
firewall)
- one for the AM RPC (and that port is allocated within the configured
range, open through the firewall).

You can probably find the RPC port in the log file of the running MapReduce
AM (to find that, identify the NodeManager running the AM, access the NM
web interface and retrieve the logs of the container running the AM).

Maybe the mapreduce client also logs the AM RPC port when querying the
status of a running job.
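
To illustrate approach f) from the quoted message below (going through a configured range until an available port is found), here is a minimal sketch. It is not Flink or Hadoop code: the function names are hypothetical, and it assumes the range is given as a single "low-high" string such as "50100-50200".

```python
import socket


def parse_port_range(spec):
    """Parse a range such as "50100-50200" into (low, high), both inclusive."""
    low_text, high_text = spec.split("-", 1)
    low, high = int(low_text), int(high_text)
    if not (0 < low <= high <= 65535):
        raise ValueError("invalid port range: " + spec)
    return low, high


def bind_first_free_port(spec, host="127.0.0.1"):
    """Walk the range in order and return a listening socket on the first free port."""
    low, high = parse_port_range(spec)
    for port in range(low, high + 1):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
        except OSError:
            # Port is taken (or not bindable); try the next one in the range.
            sock.close()
            continue
        sock.listen(1)
        return sock
    raise OSError("no free port in range " + spec)
```

The sketch returns the bound socket rather than just the port number, so another process cannot grab the port between the check and the actual use. The chosen port can be read back with `sock.getsockname()[1]`.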


On Thu, Nov 5, 2015 at 2:59 PM, Niels Basjes <Niels@basjes.nl> wrote:

> Hi,
>
> I checked and this setting has been set to a limited port range of only
> 100 port numbers.
>
> I tried to find the actual port an AM is running on and couldn't find it
> (I'm not the admin on that cluster)
>
> The url to the AM that I use to access it always looks like this:
>
> http://master-001.xxxxxx.net:8088/proxy/application_1443166961758_85492/index.html
>
> As you can see I never connect directly; always via the proxy that runs
> over the master on a single fixed port.
>
> Niels
>
> On Thu, Nov 5, 2015 at 2:46 PM, Robert Metzger <rmetzger@apache.org>
> wrote:
>
>> While discussing the issue with my colleagues today, we came up with
>> another approach to resolve it:
>>
>> d) Upload the job jar to HDFS (or another FS) and trigger the execution
>> of the jar using an HTTP request to the web interface.
>>
>> We could add some tooling into the /bin/flink client to submit a job like
>> this transparently, so users would not need to bother with the file upload
>> and request sending.
>> Also, Sachin started a discussion on the dev@ list to add support for
>> submitting jobs over the web interface, so maybe we can base the fix for
>> FLINK-2960 on that.
>>
>> I've also looked into the Hadoop MapReduce code and it seems they do the
>> following:
>> When submitting a job, they upload the job jar file to HDFS. They
>> also upload a configuration file that contains all the config options of
>> the job. Then, they submit all of this together as an application to YARN.
>> So far, there has not been any firewall involved. They establish a
>> connection between the JobClient and the ApplicationMaster when the user is
>> querying the current job status, but I could not find any special code
>> getting the status over HTTP.
>>
>> But I found the following configuration parameter:
>> "yarn.app.mapreduce.am.job.client.port-range", so it seems that they try to
>> allocate the AM port within that range (if specified).
>> Niels, can you check if this configuration parameter is set in your
>> environment? I assume your firewall allows connections from outside to
>> that port range.
>> So we also have a new approach:
>>
>> f) Allocate the YARN application master (and blob manager) within a
>> user-specified port-range.
>>
>> This would be really easy to implement, because we would just need to go
>> through the range until we find an available port.
>>
>>
>> On Tue, Nov 3, 2015 at 1:06 PM, Niels Basjes <Niels@basjes.nl> wrote:
>>
>>> Great!
>>>
>>> I'll watch the issue and give it a test once I see a working patch.
>>>
>>> Niels Basjes
>>>
>>> On Tue, Nov 3, 2015 at 1:03 PM, Maximilian Michels <mxm@apache.org>
>>> wrote:
>>>
>>>> Hi Niels,
>>>>
>>>> Thanks a lot for reporting this issue. I think it is a very common
>>>> setup in corporate infrastructure to have restrictive firewall settings.
>>>> For Flink 1.0 (and probably in a minor 0.10.X release) we will have to
>>>> address this issue to ensure proper integration of Flink.
>>>>
>>>> I've created a JIRA to keep track:
>>>> https://issues.apache.org/jira/browse/FLINK-2960
>>>>
>>>> Best regards,
>>>> Max
>>>>
>>>> On Tue, Nov 3, 2015 at 11:02 AM, Niels Basjes <Niels@basjes.nl> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I forgot to answer your other question:
>>>>>
>>>>> On Mon, Nov 2, 2015 at 4:34 PM, Robert Metzger <rmetzger@apache.org>
>>>>> wrote:
>>>>>
>>>>>> so the problem is that you cannot submit a job to Flink using the
>>>>>> "/bin/flink" tool, right?
>>>>>> I assume Flink and its TaskManagers properly start and connect to
>>>>>> each other (the number of TaskManagers is shown correctly in the
>>>>>> web interface).
>>>>>>
>>>>>
>>>>> Correct. Flink starts (I see the JobManager UI) but the actual job is
>>>>> not started.
>>>>>
>>>>> Niels Basjes
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Best regards / Met vriendelijke groeten,
>>>
>>> Niels Basjes
>>>
>>
>>
>
>
> --
> Best regards / Met vriendelijke groeten,
>
> Niels Basjes
>
