flink-issues mailing list archives

From "flora karniav (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-8414) Gelly performance seriously decreases when using the suggested parallelism configuration
Date Sat, 13 Jan 2018 15:00:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-8414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16325157#comment-16325157 ]

flora karniav commented on FLINK-8414:

Thank you for the information.

I understand that lower parallelism levels are sufficient for these small datasets.
But why would performance decrease with larger parallelism values? Because of this, I cannot
measure performance across different datasets (with sizes ranging from MBs to GBs) using the
same Flink setup and configuration.

In addition, even if I know the graph size a priori (using VertexMetrics), is there a formula
or some kind of standard way to decide the parallelism level accordingly? Or is brute force
the only way?

Thank you

> Gelly performance seriously decreases when using the suggested parallelism configuration
> ----------------------------------------------------------------------------------------
>                 Key: FLINK-8414
>                 URL: https://issues.apache.org/jira/browse/FLINK-8414
>             Project: Flink
>          Issue Type: Bug
>          Components: Configuration, Documentation, Gelly
>            Reporter: flora karniav
>            Priority: Minor
> I am running Gelly examples with different datasets in a cluster of 5 machines (1 JobManager
> and 4 TaskManagers) of 32 cores each.
> The number of slots is set to 32 (as suggested) and the parallelism to 128
> (32 cores * 4 TaskManagers).
> I observe a severe performance degradation with these suggested settings compared with setting
> parallelism.default to 16, for example, where the same job completes in ~60 seconds vs. ~140 seconds
> in the 128-parallelism case.
> Is there something wrong in my configuration? Should I decrease parallelism and, if so,
> will this inevitably decrease CPU utilization?
> Another matter that may be related to this is the number of partitions of the data. Is
> this somehow related to parallelism? How many partitions are created in the case of parallelism.default=128?
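
For reference, the two setups compared in the quoted description would correspond roughly to the flink-conf.yaml entries below. This is only a sketch: the report names parallelism.default explicitly, while taskmanager.numberOfTaskSlots is assumed here to be the key behind "the number of Slots parameter". The values are the ones given in the report.

```yaml
# Sketch of the configuration described in the report (assumed key names, reported values).

# Slots per TaskManager: one per core on each 32-core machine.
taskmanager.numberOfTaskSlots: 32

# "Suggested" setting: 32 slots * 4 TaskManagers = 128 (job took ~140 seconds).
parallelism.default: 128

# Alternative tried by the reporter (job took ~60 seconds):
# parallelism.default: 16
```

With 128 slots available, a default parallelism of 16 leaves most slots idle, which is the CPU utilization trade-off the reporter asks about.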

This message was sent by Atlassian JIRA
