flink-user mailing list archives

From Piotr Nowojski <pi...@data-artisans.com>
Subject Re: Weird performance on custom Hashjoin w.r.t. parallelism
Date Thu, 09 Nov 2017 14:39:07 GMT
Hi,

Yes, as you correctly analysed, parallelism 1 was causing the problem, because it meant that all
of the records had to be gathered over the network from all of the task managers. Keep in
mind that even if you increase the parallelism to "p", every change in parallelism between
operators can slow down your application, because events have to be redistributed, which in
most cases means network transfers.

For measuring throughput you could use the metrics already defined in Flink:
https://ci.apache.org/projects/flink/flink-docs-release-1.4/monitoring/metrics.html

You can get the list of vertices of your job:
http://<web-ui-url>:8081/jobs/<job-id>/vertices
Then the statistics:
http://<web-ui-url>:8081/jobs/<job-id>/vertices/<vertex-id>/metrics

For example:
http://localhost:8081/jobs/34c6f7d00cf9b3ebfff4d94ad465eb23/vertices
http://localhost:8081/jobs/34c6f7d00cf9b3ebfff4d94ad465eb23/vertices/3d144c2a0fc19115f5f075ba85deac26/metrics
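A minimal Python sketch of how those two URLs could be built and queried. The helper
names (vertices_url, metrics_url, fetch_json) are hypothetical, and the optional
"?get=" query parameter follows the REST API section of the metrics documentation;
the exact metric names available depend on your job, so treat them as assumptions.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def vertices_url(base, job_id):
    """URL listing the vertices of a job (base is e.g. http://localhost:8081)."""
    return f"{base.rstrip('/')}/jobs/{job_id}/vertices"


def metrics_url(base, job_id, vertex_id, names=None):
    """URL for the metrics of one vertex.

    Without names, the endpoint lists the available metric ids.
    With names, it asks for the values of those metrics via ?get=a,b.
    """
    url = f"{vertices_url(base, job_id)}/{vertex_id}/metrics"
    if names:
        # Metric ids may contain spaces (operator names), so percent-encode them.
        url += "?get=" + quote(",".join(names), safe=",")
    return url


def fetch_json(url, timeout=10):
    """Perform a GET request and decode the JSON response body."""
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling fetch_json(metrics_url(base, job_id, vertex_id)) first shows which metric ids
exist for the vertex; a second call with names=[...] then returns their current values.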

You can also try to aggregate them:
https://ci.apache.org/projects/flink/flink-docs-release-1.4/monitoring/metrics.html#rest-api-integration
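One way to aggregate on the client side is to sum the per-subtask rates yourself. The
sketch below assumes the metrics endpoint returns a JSON list of {"id", "value"}
objects, as described in the REST API section of the docs, and that per-subtask ids
end in a metric name such as "numRecordsOutPerSecond". The function name and the
sample values are made up for illustration.

```python
def total_rate(metrics, metric_name="numRecordsOutPerSecond"):
    """Sum the values of all entries whose id ends with metric_name.

    metrics is the decoded JSON list from the vertex metrics endpoint,
    e.g. [{"id": "0.numRecordsOutPerSecond", "value": "100.5"}, ...].
    Entries without a value (a plain listing of ids) are skipped.
    """
    total = 0.0
    for entry in metrics:
        if entry.get("id", "").endswith(metric_name) and "value" in entry:
            total += float(entry["value"])
    return total


# Hypothetical response for a vertex running with parallelism 2.
sample = [
    {"id": "0.numRecordsOutPerSecond", "value": "100.5"},
    {"id": "1.numRecordsOutPerSecond", "value": "200.0"},
    {"id": "0.numRecordsIn", "value": "5"},
]
print(total_rate(sample))
```

Summing over subtasks gives the throughput of the whole parallel operator at one point
in time; sampling it periodically and averaging would give an average throughput
without adding a parallelism-1 operator to the job.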

Piotrek

> On 9 Nov 2017, at 07:53, m@xi <makisntpap@gmail.com> wrote:
> 
> Hello!
> 
> I found out that the cause of the problem was the map that I have after the
> parallel join with parallelism 1.
> When I changed it to .map(new MyMapMeter).setParallelism(p), the completion
> time decreases as I increase the parallelism p, which is reasonable. The map
> was somehow a bottleneck in my parallel execution plan, but I had it this way
> in order to measure a valid average throughput.
> 
> So, my question is the following: 
> 
> How can I measure the average throughput of my parallel join operation
> properly?
> 
> Best,
> Max
> 
> 
> 
> --
> Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

