flink-user mailing list archives

From Robert Metzger <rmetz...@apache.org>
Subject Re: Yarn configuration
Date Mon, 27 Jul 2015 17:16:49 GMT
Hi Michele,

I'm happy that you got it to run the way you want.

I guess services such as the HDFS NameNode and YARN's ResourceManager are
running on the master.
I don't know what you are doing on the cluster, but I suspect it is for
experimentation only. As long as you are not maintaining a huge HDFS
installation in the cluster, you don't need a fancy machine for the master.

The documentation [1] of EMR says:
"The master node does not have large computational requirements. For most
clusters of 50 or fewer nodes, consider using a m1.small for Hadoop 1
clusters and m1.large for Hadoop 2 clusters. For clusters of more than 50
nodes, consider using an m1.large for Hadoop 1 clusters and m1.xlarge for
Hadoop 2 clusters."

The m1.large machines [2] have 7.5 GB of memory and 2 cores.
[1]
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-plan-instances.html
[2] http://aws.amazon.com/ec2/previous-generation/


On Mon, Jul 27, 2015 at 5:19 PM, Michele Bertoni <
michele1.bertoni@mail.polimi.it> wrote:

>  OK thanks Robert, you have been very clear now! :)
>
>  Just one question, more related to EMR than to Flink: if I cannot run
> anything on the EMR master, is it useful to allocate a big machine (8
> cores, 30 GB) for it? I thought it was the JM, but it is not.
>
>  On 27 Jul 2015, at 14:56, Robert Metzger <
> rmetzger@apache.org> wrote:
>
>  Hi Michele,
>
>
>  > no, in an EMR configuration with 1 master and 5 core nodes I have 5
> active nodes in the resource manager…sounds strange to me: Ganglia shows
> 6 nodes and 1 is always idle
>
>  Okay, so there are only 5 machines available to deploy containers to.
> The JobManager/ApplicationMaster will also occupy one container.
> I guess in EMR they are not running a NodeManager on the master node, so
> you cannot deploy anything there via YARN.
>
>  > now I am a little lost, because I thought I was running 5 nodes for 5
> TMs and the 6th (the master) as the JM, but it seems I have to use the 5
> core nodes as both TMs and JM
>
>  Flink on YARN can only deploy containers on machines that have a YARN
> NodeManager running. The JM also runs in such a container.
>
>  > btw which is a good parameter for number of buffer?
>
>  See here for some explanation of what they are used for:
> http://www.slideshare.net/robertmetzger1/apache-flink-hands-on/37
>  I would double them until your job runs (as a first approach ;) )
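Robert's doubling advice works; for a concrete starting point, the Flink documentation also gives a rule of thumb of roughly `slots-per-TM^2 * #TMs * 4` buffers. A minimal sketch (the function name is mine; the formula is the documented heuristic, not something stated in this thread):

```python
def suggested_network_buffers(slots_per_tm: int, num_tms: int, factor: int = 4) -> int:
    """Rule of thumb from the Flink docs: slots-per-TM^2 * #TMs * factor.

    The quadratic term reflects that, during an all-to-all shuffle, each
    slot may hold channels to every slot on every TaskManager.
    """
    return slots_per_tm ** 2 * num_tms * factor


# Michele's setup: 5 TaskManagers with 8 slots each.
print(suggested_network_buffers(8, 5))  # -> 1280
```

Here the heuristic gives 1280, below the 4096 already configured in the thread, which is consistent with simply doubling further when a job still fails.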
>
>  > I have been able to run 5 tm with -jm 2048 and -tm 20992 and 8 slots
> each but in flink dashboard it says “Flink Managed Memory 10506mb” with an
> exclamation mark saying it is much smaller than the physical memory
> (30105mb)…that’s true but i cannot run the cluster with more than 20992
>
>  I answered that question two weeks ago on this list (in the example for
> 10GB of memory):
>
>
>  Regarding the memory you are able to use in the end:
>> Initially, you request 10240MB.
>> From that, we subtract a 25% safety margin so that YARN does not kill
>> the JVM:
>> 10240*0.75 = 7680 MB.
>> So Flink's TaskManager will see 7680 MB when starting up.
>> Flink's Memory manager is only using 70% of the available heap space for
>> managed memory:
>> 7680*0.7 = 5376 MB.
>> The safety margin for YARN is very conservative. As Till already said,
>> you can set a different value for the "yarn.heap-cutoff-ratio" (try
>> 0.15) and see if your job still runs.
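The quoted arithmetic can be written out in a few lines; the function name is mine, and the 0.25/0.7 values are the defaults described above (the `yarn.heap-cutoff-ratio` and the managed-memory fraction):

```python
def flink_managed_memory_mb(requested_mb: float,
                            yarn_cutoff_ratio: float = 0.25,
                            managed_fraction: float = 0.7) -> float:
    """Mirror the calculation quoted above.

    The cutoff is withheld so YARN does not kill the JVM; the managed
    fraction is the share of the remaining heap Flink manages itself.
    """
    heap_mb = requested_mb * (1 - yarn_cutoff_ratio)  # 10240 -> 7680
    return heap_mb * managed_fraction                 # 7680 -> 5376


print(flink_managed_memory_mb(10240))
```

Lowering the cutoff ratio to 0.15, as suggested, raises the managed memory accordingly.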
>
>
> On Mon, Jul 27, 2015 at 11:29 AM, Michele Bertoni <
> michele1.bertoni@mail.polimi.it> wrote:
>
>> Hi Fabian, thanks for your reply.
>> So Flink is using about 50% of the memory for itself, right?
>>
>>  Anyway, now I am running an EMR cluster with 1 master and 5 core nodes,
>> all of them m3.2xlarge with 8 cores and 30 GB of memory.
>>
>>  I would like to run Flink on YARN with 40 slots on 5 TMs with the
>> maximum available resources. What I do is:
>>
>>  change numberOfSlots to 8 and the default parallelism to 40 in
>> flink-conf.yaml, then run the YARN session with the command
>> ./yarn-session.sh -n 5 -jm 2048 -tm 23040 (23040 is the maximum allowed
>> out of 30 GB, I don't know why)
>>
>>  I get an error, something like "failed allocating memory after 4/5
>> container available memory 20992".
>> I suspect that it is not allocating the JM on the master of the cluster
>> but on one of the core nodes, right? In fact 20992 is exactly 23040-2048.
>>
>>  Then I run it with 20992:
>> ./yarn-session.sh -n 5 -jm 2048 -tm 20992
>> It succeeds in running 5 TMs with 40 slots, but when I run a program I
>> always get
>>
>>   Caused by: java.io.IOException: Insufficient number of network
>> buffers: required 40, but only 14 available. The total number of network
>> buffers is currently set to 4096. You can increase this number by setting
>> the configuration key 'taskmanager.network.numberOfBuffers’.
>>
>>  I changed the number of buffers, as Robert suggested, from 2048 to 4096;
>> one of my programs now runs, but the second still has the same problem.
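For reference, the key named in the exception lives in conf/flink-conf.yaml on every machine and takes effect when the YARN session is restarted. The value below is just the next doubling step in that approach, not a value recommended anywhere in this thread:

```yaml
# conf/flink-conf.yaml — hypothetical next doubling step, tune per job
taskmanager.network.numberOfBuffers: 8192
```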
>>
>>
>>  Thanks for help
>> Best,
>> michele
>>
>>
>>  On 27 Jul 2015, at 11:19, Fabian Hueske <fhueske@gmail.com>
>> wrote:
>>
>>   Hi Michele,
>>
>>  the 10506 MB refer to the size of Flink's managed memory whereas the
>> 20992 MB refer to the total amount of TM memory. At start-up, the TM
>> allocates a fraction of the JVM memory as byte arrays and manages this
>> portion by itself. The remaining memory is used as regular JVM heap for TM
>> and user code.
>>
>>  The purpose of the warning is to tell the user that the memory
>> configuration might not be optimal. However, this of course depends on the
>> setup and environment, and the warning should probably be rephrased to
>> make this clearer.
>>
>>  Cheers, Fabian
>>
>> 2015-07-27 11:07 GMT+02:00 Michele Bertoni <
>> michele1.bertoni@mail.polimi.it>:
>>
>>> I have been able to run 5 tm with -jm 2048 and -tm 20992 and 8 slots
>>> each but in flink dashboard it says “Flink Managed Memory 10506mb” with an
>>> exclamation mark saying it is much smaller than the physical memory
>>> (30105mb)…that’s true but i cannot run the cluster with more than 20992
>>>
>>>  thanks
>>>
>>>
>>>
>>>  On 27 Jul 2015, at 11:02, Michele Bertoni <
>>> michele1.bertoni@mail.polimi.it> wrote:
>>>
>>>  Hi Robert,
>>> thanks for answering, today I have been able to try again: no, in an EMR
>>> configuration with 1 master and 5 core nodes I have 5 active nodes in the
>>> resource manager…sounds strange to me: Ganglia shows 6 nodes and 1 is
>>> always idle
>>>
>>>  the total amount of memory is 112.5 GB, that is 22.5 GB for each of
>>> the 5 nodes
>>>
>>>  now I am a little lost, because I thought I was running 5 nodes for 5
>>> TMs and the 6th (the master) as the JM, but it seems I have to use the 5
>>> core nodes as both TMs and JM
>>>
>>>
>>>
>>>  btw, which is a good value for the number of buffers?
>>>
>>>
>>>  thanks,
>>> Best
>>> michele
>>>
>>>
>>>  On 24 Jul 2015, at 16:38, Robert Metzger <
>>> rmetzger@apache.org> wrote:
>>>
>>>  Hi Michele,
>>>
>>>  configuring a YARN cluster to allocate all available resources as well
>>> as possible is sometimes tricky, that is true.
>>> We are aware of these problems and there are actually the following two
>>> JIRAs for this:
>>> https://issues.apache.org/jira/browse/FLINK-937 (Change the YARN Client
>>> to allocate all cluster resources, if no argument given) --> I think the
>>> consensus on the issue was to give users an option to allocate everything
>>> (so it is not done by default)
>>> https://issues.apache.org/jira/browse/FLINK-1288 (YARN
>>> ApplicationMaster sometimes fails to allocate the specified number of
>>> workers)
>>>
>>>  How many NodeManagers is YARN reporting in the ResourceManager UI?
>>> (in the "Active Nodes" column) (I suspect 6?)
>>> How much memory per NodeManager is YARN reporting? (You can see this in
>>> the "Nodes" page of the RM)
>>>
>>>  > I would like to run 5 nodes with 8 slots each, is it correct?
>>>
>>>  Yes.
>>>
>>>
>>>  > Then i reduced memories, everything started but i get a runtime
>>> error of missing buffer
>>>
>>>  What exactly is the exception?
>>> I guess you have to give the system a few more network buffers using the
>>> taskmanager.network.numberOfBuffers config parameter.
>>>
>>>  > Can someone help me step-by-step with a good configuration for such
>>> a cluster? I think the documentation is really missing details
>>>
>>>  When starting Flink on YARN, there are usually some WARN log messages
>>> in the beginning when the system detects that the specified containers
>>> will not fit into the cluster.
>>> Also, in the ResourceManager UI, you can see the status of the
>>> scheduler. This often helps to understand what's going on, resource-wise.
>>>
>>>
>>>
>>> On Fri, Jul 24, 2015 at 3:58 PM, Michele Bertoni <
>>> michele1.bertoni@mail.polimi.it> wrote:
>>>
>>>> Hi everybody, I need help on how to configure a YARN cluster.
>>>> I tried a lot of configurations, but none of them was correct.
>>>>
>>>> We have a cluster on Amazon EMR, let's say 1 master + 5 workers, all of
>>>> them m3.2xlarge, i.e. 8 cores and 30 GB of RAM each.
>>>>
>>>> What is a good configuration for such cluster?
>>>>
>>>> I would like to run 5 nodes with 8 slots each, is it correct?
>>>>
>>>> Now the problems: until now I mistakenly ran all tests using 40 task
>>>> managers, each with 2048 MB and 1 slot (at least it was working).
>>>>
>>>> Today I found the error, and I tried to run 5 task managers, setting a
>>>> default of 8 slots in flink-conf.yaml and giving a task manager memory
>>>> of 23040 (-tm 23040), which is the limit allowed by YARN, but I am
>>>> getting errors: one TM is not running because there is no available
>>>> memory. It seems like the JM is not taking its memory from the master
>>>> but from the nodes (in fact YARN says TM number 5 is missing 2048,
>>>> which is the memory for the JM).
>>>>
>>>> Then I reduced the memory values; everything started, but I get a
>>>> runtime error about missing buffers.
>>>>
>>>> Can someone help me step-by-step with a good configuration for such a
>>>> cluster? I think the documentation is really missing details.
>>>>
>>>> Thanks a lot
>>>> Best
>>>> Michele
>>>>
>>>
>>>
>>>
>>>
>>
>>
>
>
