flink-issues mailing list archives

From "Greg Hogan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-8414) Gelly performance seriously decreases when using the suggested parallelism configuration
Date Fri, 12 Jan 2018 20:50:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-8414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16324542#comment-16324542 ]

Greg Hogan commented on FLINK-8414:

It is incumbent on the user to configure an appropriate parallelism for the quantity of data.
Those graphs contain only a few tens of megabytes of data so it is not surprising that the
optimal parallelism is around (or even lower than) 16. You can use `VertexMetrics` to pre-compute
the size of the graph and adjust the parallelism at runtime (`ExecutionConfig#setParallelism`).
Flink and Gelly are designed to scale to 100s to 1000s of parallel tasks and GBs to TBs of data.
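
As a rough illustration of adjusting parallelism to the size of the graph, the sketch below derives a parallelism from an edge count. It is only a sketch under stated assumptions: the class `ParallelismHint`, the method `choose`, and both constants are hypothetical and are not part of Flink or Gelly. In a real job the edge count could come from the `VertexMetrics` result and the returned value could be passed to `ExecutionConfig#setParallelism`; those calls are left out so the example runs without Flink on the classpath.

```java
// Hypothetical helper: picks a parallelism proportional to the estimated graph size.
final class ParallelismHint {
    // Assumption: each edge is stored as two 8-byte vertex IDs.
    static final long BYTES_PER_EDGE = 16L;
    // Assumption: aim for about 4 MiB of edge data per parallel task.
    // This is an illustrative value, not a Flink default or a recommendation.
    static final long TARGET_BYTES_PER_TASK = 4L * 1024 * 1024;

    /**
     * Returns a parallelism between 1 and availableSlots, sized so that each
     * task handles roughly TARGET_BYTES_PER_TASK of edge data.
     */
    public static int choose(long edgeCount, int availableSlots) {
        if (edgeCount < 0 || availableSlots < 1) {
            throw new IllegalArgumentException("edgeCount must be >= 0 and availableSlots >= 1");
        }
        long estimatedBytes = Math.multiplyExact(edgeCount, BYTES_PER_EDGE);
        // Ceiling division: number of tasks needed at the target size per task.
        long tasks = (estimatedBytes + TARGET_BYTES_PER_TASK - 1) / TARGET_BYTES_PER_TASK;
        // Clamp to at least one task and at most the slots the cluster offers.
        return (int) Math.max(1L, Math.min(tasks, (long) availableSlots));
    }

    public static void main(String[] args) {
        // About 32 MB of edges on a cluster with 128 slots.
        System.out.println(choose(2_000_000L, 128));
        // A much larger graph is capped at the number of available slots.
        System.out.println(choose(1_000_000_000L, 128));
    }
}
```

With these assumed constants, a graph of a few tens of megabytes maps to a parallelism well below 128, while larger inputs use every slot. The right target size per task depends on the algorithm and the cluster, so it would need to be measured rather than taken from this sketch.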

> Gelly performance seriously decreases when using the suggested parallelism configuration
> ----------------------------------------------------------------------------------------
>                 Key: FLINK-8414
>                 URL: https://issues.apache.org/jira/browse/FLINK-8414
>             Project: Flink
>          Issue Type: Bug
>          Components: Configuration, Documentation, Gelly
>            Reporter: flora karniav
>            Priority: Minor
> I am running Gelly examples with different datasets in a cluster of 5 machines (1 JobManager
> and 4 TaskManagers) of 32 cores each.
> The number of slots is set to 32 (as suggested) and the parallelism to 128
> (32 cores * 4 TaskManagers).
> I observe a large performance degradation with these suggested settings compared with setting
> parallelism.default to 16, for example, where the same job completes in ~60 seconds vs. ~140 in
> the 128 parallelism case.
> Is there something wrong in my configuration? Should I decrease parallelism and, if so,
> will this inevitably decrease CPU utilization?
> Another matter that may be related to this is the number of partitions of the data. Is
> this somehow related to parallelism? How many partitions are created in the case of parallelism.default=128?

