kafka-dev mailing list archives

From Colin McCabe <co...@cmccabe.xyz>
Subject Re: [DISCUSS] URIs on Producer and Consumer
Date Fri, 06 Oct 2017 17:08:40 GMT
On Thu, Oct 5, 2017, at 13:33, Michael Pearce wrote:
> To me, this is a lot more in line with many other systems connections, to
> have the ability to have a single connection string / uri, is this really
> that left field suggesting or wanting this?
> 
> If anything this bring kafka more standardised approach imo, to have a
> unified resource identifier, protocol name and a set schema for that.
> 
> e.g.
> Database connection strings like
> 
> oracle:
> jdbc:oracle:thin:@(description=(address_list=
>    (address=(protocol=tcp)(port=1521)(host=prodHost)))
> (connect_data=(INSTANCE_NAME=ORCL)))

Hmm.  That isn't a URI, though, right?  So adopting URIs doesn't help us
integrate with JDBC.  In any case, since Kafka is not a database, it is
a little unclear what better integration with JDBC would look like. 
Perhaps that is worth thinking about at some point, but it seems
unrelated to this URI discussion.

> On 05/10/2017, 20:10, "Clebert Suconic" <clebert.suconic@gmail.com>
> wrote:
> 
>     On Thu, Oct 5, 2017 at 2:20 PM, Colin McCabe <cmccabe@apache.org>
>     wrote:
>     > We used URIs as file paths in Hadoop.  I think it was a mistake, for a
>     > few different reasons.
>     >
>     > URIs are actually very complex.  You probably know about scheme, host,
>     > and port, but did you know about authority, user-info, query, fragment,
>     > scheme-specific-part?  Do you know what they do in Hadoop?  The mapping
>     > isn't obvious (and it wouldn't be obvious in Kafka either).
> 
>     URIs are just a hashmap of key=string.. just like Properties...

You really can't treat a URI as a hashmap.  For one thing, the scheme
and hostname parts are not optional.

You are probably thinking of the "query" part (the part after the
question mark).  This isn't a map either-- it's a sequence of
key=value pairs, conventionally separated by ampersands.  The same
key can appear multiple times.  And you have to encode everything
with RFC3986 "percent encoding."
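
To make the query-string point concrete, here is a minimal Java sketch.
It assumes the usual "&" separator convention (the URI spec itself does
not define one), and the class name, host, and property names are made
up for illustration.  It collects the pairs into a map of lists, since a
plain key-to-string map would silently drop repeated keys.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of any Kafka API.
final class QueryPairs {

    // Splits the raw query on '&' and percent-decodes each key and value.
    // Repeated keys are kept in order of appearance.
    static Map<String, List<String>> parse(String uriText) {
        URI uri = URI.create(uriText);
        Map<String, List<String>> pairs = new LinkedHashMap<>();
        String rawQuery = uri.getRawQuery();   // still percent-encoded
        if (rawQuery == null) {
            return pairs;
        }
        for (String pair : rawQuery.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            pairs.computeIfAbsent(decode(key), k -> new ArrayList<>())
                 .add(decode(value));
        }
        return pairs;
    }

    // Note: URLDecoder implements form decoding, so it also turns '+'
    // into a space, which is not part of RFC 3986 percent-encoding.
    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "topic" appears twice, and one value contains an encoded comma.
        System.out.println(
            parse("kafka://host1:9092?acks=all&topic=a%2Cb&topic=c"));
    }
}
```

Splitting before decoding matters: decoding first would turn an encoded
"%26" inside a value into a separator.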

>     The Consumer and Producer is just having such hashMap.. and these
>     values are easy to translate to boolean, integer.. etc. We would just
>     need to add such mapping as part of this task when done. I don't see
>     anything difficult there.

I don't object to having some kind of connection string that rolls up
all the configuration properties.  I just don't think it should be a
URI.

>     >
>     > When you flip back and forth between URIs and strings (and you
>     > inevitably will do this, when serializing or sending things over the
>     > wire), you run into tons of really hard problems.  Should you preserve
>     > the "fragment" (the thing after the hash mark) for your URI, or not?  It
>     > may not do anything now, but maybe it will do something later.  URIs
>     > also have complex string escaping rules.  Parsing URIs is very messy,
>     > especially when you start talking about non-Java programming languages.
> 
> 
>     Why flip back and forth? URIs would generate the same HashMap that's
>     being generated today.. I don't see any mess here.
>     Besides... This would be an addition, not replacement...
> 
>     And I'm talking only about the Java API now.

We have a lot of non-Java clients-- those should be part of the
discussion.

> 
>     Again, All the properties on ProducerConfig and ConsumerConfig seems
>     easy to be mapped as primitive types (String, numbers.. booleans).
> 
>     Serialization shouldn't be a problem there. it would generate the
>     same
>     properties it's generated now.
> 
>     >
>     > URIs are designed for a world where you talk to a single host over a
>     > single port.  That isn't the world distributed systems live in.  You
>     > don't want your clients to fail to bootstrap because the single server
>     > you specified is having a bad day, even when the other 8 servers are up.
> 
>     I have seen a few projects using this style of URI: I would make it
>     doing the same here:
> 
>     If you have multiple hosts:
> 
>     KafkaConsumer consumer = new
>     KafkaConsumer("kafka:(kafka://host1:port,kafka://host2:port)?property1=value");

That's not a valid URI?
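
Without settling the syntax question, one way to see how this string
behaves in practice is to hand it to java.net.URI and look at what comes
back.  This is only a probe of one parser, not a statement about the
spec; the class name is made up and I substituted numeric ports for the
"port" placeholder.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical probe class, for illustration only.
final class MultiHostProbe {

    // Reports which components java.net.URI exposes for the given text.
    static String describe(String text) {
        try {
            URI uri = new URI(text);
            return "opaque=" + uri.isOpaque()
                + " host=" + uri.getHost()
                + " port=" + uri.getPort()
                + " query=" + uri.getQuery();
        } catch (URISyntaxException e) {
            return "rejected: " + e.getReason();
        }
    }

    public static void main(String[] args) {
        // Multi-host form from the proposal.
        System.out.println(describe(
            "kafka:(kafka://host1:9092,kafka://host2:9092)?property1=value"));
        // Single-host form.
        System.out.println(describe("kafka://host2:9092?property1=value"));
    }
}
```

With java.net.URI the multi-host form is accepted only as an "opaque"
URI: getHost() and getQuery() both return null and getPort() returns -1,
so the host list and the properties would have to be picked apart by
hand.  The single-host form exposes host, port, and query as expected.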

> 
>     if you have a single host:
>     KafkaConsumer consumer = new
>     KafkaConsumer("kafka://host2:port?property1=value&property2=value2");
> 
> 
>     One example of an apache project using a similar approach is
>     qpid-jms:
>     http://qpid.apache.org/releases/qpid-jms-0.25.0/docs/index.html#failover-configuration-options
> 
> 
>     > The bottom line is that URIs are the wrong abstraction for the job.
>     > They just don't express what we really want, and they introduce a lot of
>     > complexity and ambiguity.
> 
>     I have seen the opposite to be honest. this has been simpler for me
>     and users I know than using a HashMap.. .  users in my experience
>     tend
>     to write this faster.

Users tend to make mistakes when writing URIs.  For example, how do
you translate a filename with spaces and commas into a URI?  I had to
debug these issues.  It is why I dislike URIs.
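
A small Java sketch of the filename case, using only java.net.URI.  The
class name and the path are invented for the example.  The multi-argument
URI constructor quotes illegal characters for you, while the
single-string constructor expects the text to be encoded already.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, for illustration only.
final class FileNameToUri {

    // Builds a file: URI from an absolute path, letting the constructor
    // percent-encode characters that are illegal in a URI path.
    static String toFileUri(String absolutePath) throws URISyntaxException {
        return new URI("file", null, absolutePath, null).toString();
    }

    // Returns true if the text is accepted by the single-string constructor.
    static boolean parsesAsIs(String text) {
        try {
            new URI(text);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) throws URISyntaxException {
        String path = "/data/q3 report, final.csv";
        String encoded = toFileUri(path);
        System.out.println(encoded);                       // spaces become %20
        System.out.println(parsesAsIs("file:" + path));    // raw spaces are rejected
        System.out.println(new URI(encoded).getPath());    // decoded path again
    }
}
```

The spaces are escaped as %20 but the comma is left alone, because a
comma is legal in a URI path.  A user who writes the string by hand has
to know both of those rules.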

As I said before, a connection string might be a good idea.  A URI, no.

best,
Colin

> 
>     users can certainly put up with the HashMap.. but this is easier to
>     remember. I'm just proposing what I think it's a simpler API.
> 
> 
> 
> 
>     Perhaps we should move into the KIP discussion itself here.. I first
>     intended to start this thread to see if it would make sense or not...
>     But I don't have authorization to create the KIP page.. so again..
>     based on the contributing page.. can someone add me authorizations to
>     the WIKI space?
> 
> 
