flink-dev mailing list archives

From Maximilian Michels <...@apache.org>
Subject Re: [jira] [Commented] (FLINK-1679) Document how "degree of parallelism" / "parallelism" / "slots" are connected to each other
Date Thu, 12 Mar 2015 10:19:32 GMT
+1 for unifying the way to set the parallelism and deprecating the old methods.
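
The deprecation path mentioned here (add a consistently named method and keep the old one as a deprecated alias) might look like the minimal sketch below. `ExampleEnvironment` is a hypothetical class, not Flink's actual `ExecutionEnvironment`. Only the method names mirror the terminology discussed in this thread.

```java
// A minimal sketch of "add a new method, deprecate the old one".
// ExampleEnvironment is hypothetical; it is not Flink's ExecutionEnvironment.
class ExampleEnvironment {
    // -1 marks "not set" in this sketch; a real system may use another default.
    private int parallelism = -1;

    /** Preferred, consistently named setter. */
    public ExampleEnvironment setParallelism(int parallelism) {
        if (parallelism < 1) {
            throw new IllegalArgumentException("parallelism must be >= 1, got " + parallelism);
        }
        this.parallelism = parallelism;
        return this;
    }

    public int getParallelism() {
        return parallelism;
    }

    /** @deprecated Use {@link #setParallelism(int)} instead. */
    @Deprecated
    public ExampleEnvironment setDegreeOfParallelism(int degreeOfParallelism) {
        // Delegate so that old and new callers share a single code path.
        return setParallelism(degreeOfParallelism);
    }

    /** @deprecated Use {@link #getParallelism()} instead. */
    @Deprecated
    public int getDegreeOfParallelism() {
        return getParallelism();
    }
}
```

Existing programs keep compiling (with a deprecation warning), so the old names can be removed in a later release.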

We had the AUTOMAX discussion before in the corresponding pull
request. It seems that there are two orthogonal views on how
resources should be allocated by default. I strongly agree with
Robert.

Users have exclusive access to resources or use a resource manager
(YARN). They are often unaware of the parallelism setting and are put
off by the poor performance with a parallelism of 1. Setting AUTOMAX by
default gives the best possible Flink experience. After all, Flink
doesn't even support proper sharing of resources at the moment. So in
scenarios where multiple users set the parallelism manually, jobs will
be canceled because resources are unavailable and there is no queuing
of jobs.
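
The failure mode described above can be illustrated with a toy slot pool that has no queue. This is a hypothetical model, not Flink's scheduler. It assumes that a job with parallelism p needs p free slots at the moment it is submitted.

```java
// A toy model of a slot pool WITHOUT job queuing (hypothetical; not Flink's scheduler).
// Assumption: a job with parallelism p needs p free slots at submission time.
class SlotPool {
    private int freeSlots;

    SlotPool(int totalSlots) {
        this.freeSlots = totalSlots;
    }

    /** Returns true if the job is deployed, false if it fails for lack of slots. */
    boolean submit(int parallelism) {
        if (parallelism > freeSlots) {
            // No queue: the job is not parked until slots free up; it simply fails.
            return false;
        }
        freeSlots -= parallelism;
        return true;
    }

    int freeSlots() {
        return freeSlots;
    }
}
```

For example, with `new SlotPool(8)`, a first `submit(6)` returns true and a second `submit(4)` returns false, because only 2 slots remain and nothing waits for them to become free.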

Let's leave it up to advanced users to set the granularity of the
parallelism and provide the best out-of-the-box experience for Flink
novices.
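
A minimal sketch of the proposed default, assuming a hypothetical `AUTOMAX` sentinel whose value is chosen only for this example: an explicit parallelism set by an advanced user wins, and otherwise the job takes every slot in the Flink cluster, i.e. the number of TaskManagers times the slots per TaskManager.

```java
// Hypothetical resolution of an AUTOMAX default.
// The class name and the sentinel value are assumptions made for this sketch.
final class ParallelismResolver {
    /** Sentinel meaning "use every slot in the Flink cluster". */
    static final int AUTOMAX = Integer.MAX_VALUE;

    private ParallelismResolver() {}

    static int totalSlots(int numTaskManagers, int slotsPerTaskManager) {
        return numTaskManagers * slotsPerTaskManager;
    }

    /**
     * AUTOMAX expands to all slots of the Flink cluster (not all cores of the
     * underlying machines or YARN cluster); an explicit value is used as-is.
     */
    static int resolve(int requested, int numTaskManagers, int slotsPerTaskManager) {
        if (requested == AUTOMAX) {
            return totalSlots(numTaskManagers, slotsPerTaskManager);
        }
        return requested;
    }
}
```

With 4 TaskManagers of 2 slots each, `AUTOMAX` resolves to 8, while an explicit parallelism of 3 stays 3.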

Best regards,
Max

On Thu, Mar 12, 2015 at 10:31 AM, Robert Metzger <rmetzger@apache.org> wrote:
> We can also make the change non-API breaking by adding an additional method
> and deprecating the old one.
>
>
> Why would the AUTOMAX parallelism eat up all cluster resources? It would
> only allocate all slots WITHIN the Flink cluster.
> Those users (=new users) who would benefit from the AUTOMAX parallelism
> probably have the parallelism per TaskManager set to 1 anyway.
> Advanced users will configure their parallelism / slots properly
> anyway.
>
> In my experience, most users:
> - have exclusive access to a test cluster in the beginning (I don't think
> anybody who doesn't know the system at all would start Flink on a
> production cluster)
> - or use YARN
> - do not set any parallelism for jobs or slots per TaskManager.
>
> From these observations, I would actually set the number of slots on the
> TaskManagers to the number of available CPUs.
> And for the CLI frontend, I would by default let a job use all available
> slots (most users don't know that Flink can run multiple jobs at the
> same time).
>
> If users want to change the behavior, they have to look into the
> documentation.
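
The slot default proposed above could be sketched as follows. `taskmanager.numberOfTaskSlots` is the Flink configuration key for slots per TaskManager; the fallback to the CPU count is the behavior proposed in this message, not necessarily what Flink does, and the `SlotDefaults` helper is hypothetical.

```java
import java.util.Map;

// Hypothetical helper for the proposed default: one slot per available CPU core
// unless the user has configured the number of slots explicitly.
final class SlotDefaults {
    static final String SLOTS_KEY = "taskmanager.numberOfTaskSlots";

    private SlotDefaults() {}

    static int slotsPerTaskManager(Map<String, String> config) {
        String configured = config.get(SLOTS_KEY);
        if (configured != null) {
            // An explicit setting always wins.
            return Integer.parseInt(configured.trim());
        }
        // Proposed fallback: as many slots as the JVM reports available processors.
        return Runtime.getRuntime().availableProcessors();
    }
}
```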
>
> On Thu, Mar 12, 2015 at 10:20 AM, Fabian Hueske <fhueske@gmail.com> wrote:
>
>> +1 for consistently using "parallelism". However, these are API-breaking
>> changes, and IMO we need to mark the old methods deprecated before removing them.
>>
>> I am not comfortable with using AUTOMAX as a default. This is fine on
>> dedicated setups like YARN sessions, but it will consume all available
>> resources of a cluster if a user forgets to set the -p flag (or to fix the DOP
>> in the program). There is already a default-parallelism setting in the config,
>> and IMO that value should be used.
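
This alternative, falling back to the configured default instead of AUTOMAX, might look like the sketch below. The class, method, and parameter names are hypothetical, and the precedence (a value fixed in the program over the -p flag, the -p flag over the config default) is an assumption made for this example.

```java
import java.util.OptionalInt;

// Hypothetical resolution order for a job's parallelism without an AUTOMAX default.
final class DefaultParallelism {
    private DefaultParallelism() {}

    static int resolve(OptionalInt fromProgram, OptionalInt fromCliFlag, int configuredDefault) {
        if (fromProgram.isPresent()) {
            return fromProgram.getAsInt();   // DOP fixed in the program
        }
        if (fromCliFlag.isPresent()) {
            return fromCliFlag.getAsInt();   // -p flag on the command line
        }
        return configuredDefault;            // default parallelism from the config
    }
}
```

A user who forgets both the -p flag and an explicit DOP then gets the configured default rather than every slot in the cluster.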
>>
>> 2015-03-12 10:07 GMT+01:00 Robert Metzger (JIRA) <jira@apache.org>:
>>
>> >
>> >     [ https://issues.apache.org/jira/browse/FLINK-1679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358345#comment-14358345 ]
>> >
>> > Robert Metzger commented on FLINK-1679:
>> > ---------------------------------------
>> >
>> > I would suggest removing all occurrences of "degreeOfParallelism" in the
>> > system and replacing them with "parallelism" everywhere.
>> > The CLI frontend for example also calls it {{-p}}, not {{-dop}}.
>> >
>> > I would also suggest setting the parallelism to {{AUTOMAX}} by default in
>> > the CliFrontend.
>> >
>> > > Document how "degree of parallelism" / "parallelism" / "slots" are connected to each other
>> > > -------------------------------------------------------------------------------------------
>> > >
>> > >                 Key: FLINK-1679
>> > >                 URL: https://issues.apache.org/jira/browse/FLINK-1679
>> > >             Project: Flink
>> > >          Issue Type: Task
>> > >          Components: Documentation
>> > >    Affects Versions: 0.9
>> > >            Reporter: Robert Metzger
>> > >            Assignee: Ufuk Celebi
>> > >
>> > > I see too many users being confused about properly setting up Flink with respect to parallelism.
>> >
>> >
>> >
>> > --
>> > This message was sent by Atlassian JIRA
>> > (v6.3.4#6332)
>> >
>>
