flink-user mailing list archives

From amir bahmanyari <amirto...@yahoo.com>
Subject Re: Fw: Flink Cluster Load Distribution Question
Date Wed, 14 Sep 2016 17:49:34 GMT
Hi Aljoscha,
Thanks for your response. It's the same job, but I am reading through TextIO() instead of a
Kafka topic. I thought that would make a difference. It doesn't: same slowness in the Flink
cluster.
I had sent you the code that reads from KafkaIO(). Nothing is different except commenting out
the KafkaIO() & un-commenting TextIO(). It's attached along with the Support class.
Is there anything interesting you see in my configuration that may cause slowness and/or lack
of the right distribution in the cluster as a whole? I also attached my config files from the
JM node...same for other nodes.
Have a wonderful day & thanks for your attention.
Amir-

      From: Aljoscha Krettek <aljoscha@apache.org>
 To: user@flink.apache.org; amir bahmanyari <amirtousa@yahoo.com> 
 Sent: Wednesday, September 14, 2016 1:48 AM
 Subject: Re: Fw: Flink Cluster Load Distribution Question
Hi, this is a different job from the Kafka job that you have running, right?
Could you maybe post the code for that as well?
On Tue, 13 Sep 2016 at 20:14 amir bahmanyari <amirtousa@yahoo.com> wrote:

Hi Robert,
Sure, I am forwarding it to user. Sorry about that. I followed the "robot's" instructions :))
Topology: 4 Azure A11 CentOS 7 nodes (16 cores, 110 GB). Let's call them node1, 2, 3, 4.
Flink is clustered, with node1 running the JM & a TM. Three more TMs are running on node2, 3,
and 4 respectively.
I have a Beam app running with the Flink Runner underneath.
The input data is received by Beam TextIO() reading off 1.6 GB of data containing roughly
22 million tuples.
All nodes have identical flink-conf.yaml, masters & slaves contents as follows:
    jobmanager.rpc.address: node1
    jobmanager.rpc.port: 6123
    jobmanager.heap.mb: 1024
    taskmanager.heap.mb: 102400
    taskmanager.numberOfTaskSlots: 16
    taskmanager.memory.preallocate: false
    parallelism.default: 64
    jobmanager.web.port: 8081
    taskmanager.network.numberOfBuffers:

    masters: node1:8081
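
The slot arithmetic implied by this configuration can be checked with a minimal sketch. It assumes one TaskManager per node (4 in total) and reuses the values quoted in the flink-conf.yaml above; the helper name slot_capacity is hypothetical and not part of Flink or Beam. Whether the Beam job actually runs at parallelism.default is a separate question that this sketch does not answer.

```python
# Sketch only: compares the configured default parallelism with the
# number of task slots the cluster offers. Values are copied from the
# configuration quoted in the message; slot_capacity is a hypothetical helper.

def slot_capacity(task_managers: int, slots_per_tm: int) -> int:
    """Total number of task slots offered by the cluster."""
    return task_managers * slots_per_tm


task_managers = 4          # assumption: one TM on each of node1..node4
slots_per_tm = 16          # taskmanager.numberOfTaskSlots
parallelism_default = 64   # parallelism.default

total_slots = slot_capacity(task_managers, slots_per_tm)

# A job at parallelism P needs P slots, so it only fits if P <= total_slots.
fits = parallelism_default <= total_slots

# With an even spread, this many slots per TaskManager would be occupied.
slots_needed_per_tm = parallelism_default // task_managers

print(f"total slots: {total_slots}, fits: {fits}, per TM: {slots_needed_per_tm}")
```

With these numbers, a job running at the default parallelism would occupy every slot on all four TaskManagers, so a job that keeps only one node busy is probably not running at that parallelism.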

Everything looks normal at ./start-cluster.sh & all daemons start on all nodes. JM and TM
log files get generated on all nodes. The Dashboard shows how all slots are being used.
I deploy the Beam app to the cluster where the JM is running, at node1. A *.out file gets
generated as data is being processed. No *.out on other nodes, just node1 where I deployed
the fat jar.
I tail -f the *.out log on node1 (master). It starts fine...but slowly degrades & becomes
extremely slow. As we speak, I started the Beam app 13 hrs ago and it's still running.
How can I prove that ALL NODES are involved in processing the data at the same time, i.e.
clustered?
Do the above configurations look OK for reasonable performance?
Given the above parameters, how can I improve the performance in this cluster?
What other information and/or dashboard screenshots are needed to clarify this issue?
I used these websites to do the configuration:
Apache Flink: Cluster Setup

Apache Flink: Configuration

In the second link, there is a config recommendation for the following, but this parameter
is not in the configuration file out of the box:
   - taskmanager.network.bufferSizeInBytes
Should I include it manually? Does it make any difference if the default value, i.e. 32 KB,
doesn't get picked up?
Sorry, too many questions. Please let me know. I appreciate your help.
Cheers,
Amir-
----- Forwarded Message -----
 From: Robert Metzger <rmetzger@apache.org>
 To: "dev@flink.apache.org" <dev@flink.apache.org>; amir bahmanyari <amirtousa@yahoo.com>

 Sent: Tuesday, September 13, 2016 1:15 AM
 Subject: Re: Flink Cluster Load Distribution Question
Hi Amir,

I would recommend posting such questions to the user@flink mailing list in
the future. This list is meant for development-related topics.

I think we need more details to understand why your application is not
running properly. Can you quickly describe what your topology is doing?
Are you setting the parallelism to a value >= 1?


On Tue, Sep 13, 2016 at 6:35 AM, amir bahmanyari <
amirtousa@yahoo.com.invalid> wrote:

> Hi Colleagues,
> Just joined this forum. I have done everything possible to get a 4-node
> Flink cluster to work properly & run a Beam app. It always generates
> system-output logs (*.out) on only one node. It's sooooooooo slow for 4
> nodes being there. It seems the load is not distributed amongst all 4 nodes
> but goes to only one node, most of the time the one where the JM runs. I
> ran/tested it on a single node, and it ran the same load even faster. Not
> sure what's not being configured right.
> 1- Why am I getting the SystemOut .out log on only one server? All nodes get
> their TaskManager log files updated, though.
> 2- Why don't I see load being distributed amongst all 4 nodes, but only one
> all the time?
> 3- Why does the Dashboard show a 0 (zero) for Send/Receive numbers for all
> Task Managers?
> The Dashboard shows all the right stuff. Top shows not many resources
> being stressed on any of the nodes. I can share its contents if it helps
> in diagnosing the issue.
> Thanks + I appreciate your valuable time, response & help.
> Amir-

