flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Dawid Wysakowicz <dwysakow...@apache.org>
Subject Re: [DISCUSS] Hierarchies in ConfigOption
Date Wed, 29 Apr 2020 15:55:59 GMT
Hi all,

I also wanted to share my opinion.

When talking about a ConfigOption hierarchy we use for configuring Flink
cluster I would be a strong advocate for keeping a yaml/hocon/json/...
compatible style. Those options are primarily read from a file and thus
should at least try to follow common practices for nested formats if we
ever decide to switch to one.

Here the question is about the properties we use in SQL statements. The
origin/destination of these usually will be external catalog, usually in
a flattened(key/value) representation so I agree it is not as important
as in the aforementioned case. Nevertheless having a yaml based catalog
or being able to have e.g. yaml based snapshots of a catalog in my
opinion is appealing. At the same time cost of being able to have a nice
yaml/hocon/json representation is just adding a single suffix to a
single(at most 2 key + value) property. The question is between `format`
= `json` vs `format.kind` = `json`. That said I'd be slighty in favor of
doing it.

Just to have a full picture. Both cases can be represented in yaml, but
the difference is significant:

format: 'json'
format.option: 'value'

vs

format:
    kind: 'json'

    option: 'value'


Best,
Dawid

On 29/04/2020 17:13, Flavio Pompermaier wrote:
> Personally I don't have any preference here.  Compliance wih standard
> YAML parser is probably more important
>
> On Wed, Apr 29, 2020 at 5:10 PM Jark Wu <imjark@gmail.com
> <mailto:imjark@gmail.com>> wrote:
>
>     From a user's perspective, I prefer the shorter one "format=json",
>     because
>     it's more concise and straightforward. The "kind" is redundant for
>     users.
>     Is there a real case requires to represent the configuration in
>     JSON style?
>     As far as I can see, I don't see such requirement, and everything
>     works
>     fine by now.
>
>     So I'm in favor of "format=json". But if the community insist to
>     follow
>     code style on this, I'm also fine with the longer one.
>
>     Btw, I also CC user mailing list to listen more user's feedback.
>     Because I
>     think this is relative to usability.
>
>     Best,
>     Jark
>
>     On Wed, 29 Apr 2020 at 22:09, Chesnay Schepler <chesnay@apache.org
>     <mailto:chesnay@apache.org>> wrote:
>
>     >  > Therefore, should we advocate instead:
>     >  >
>     >  > 'format.kind' = 'json',
>     >  > 'format.fail-on-missing-field' = 'false'
>     >
>     > Yes. That's pretty much it.
>     >
>     > This is reasonable important to nail down as with such violations I
>     > believe we could not actually switch to a standard YAML parser.
>     >
>     > On 29/04/2020 16:05, Timo Walther wrote:
>     > > Hi everyone,
>     > >
>     > > discussions around ConfigOption seem to be very popular
>     recently. So I
>     > > would also like to get some opinions on a different topic.
>     > >
>     > > How do we represent hierarchies in ConfigOption? In FLIP-122, we
>     > > agreed on the following DDL syntax:
>     > >
>     > > CREATE TABLE fs_table (
>     > >  ...
>     > > ) WITH (
>     > >  'connector' = 'filesystem',
>     > >  'path' = 'file:///path/to/whatever',
>     > >  'format' = 'csv',
>     > >  'format.allow-comments' = 'true',
>     > >  'format.ignore-parse-errors' = 'true'
>     > > );
>     > >
>     > > Of course this is slightly different from regular Flink core
>     > > configuration but a connector still needs to be configured
>     based on
>     > > these options.
>     > >
>     > > However, I think this FLIP violates our code style guidelines
>     because
>     > >
>     > > 'format' = 'json',
>     > > 'format.fail-on-missing-field' = 'false'
>     > >
>     > > is an invalid hierarchy. `format` cannot be a string and a
>     top-level
>     > > object at the same time.
>     > >
>     > > We have similar problems in our runtime configuration:
>     > >
>     > > state.backend=
>     > > state.backend.incremental=
>     > > restart-strategy=
>     > > restart-strategy.fixed-delay.delay=
>     > > high-availability=
>     > > high-availability.cluster-id=
>     > >
>     > > The code style guide states "Think of the configuration as nested
>     > > objects (JSON style)". So such hierarchies cannot be
>     represented in a
>     > > nested JSON style.
>     > >
>     > > Therefore, should we advocate instead:
>     > >
>     > > 'format.kind' = 'json',
>     > > 'format.fail-on-missing-field' = 'false'
>     > >
>     > > What do you think?
>     > >
>     > > Thanks,
>     > > Timo
>     > >
>     > > [1]
>     > >
>     >
>     https://flink.apache.org/contributing/code-style-and-quality-components.html#configuration-changes
>     > >
>     >
>     >
>

Mime
View raw message