zookeeper-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andor Molnar <an...@cloudera.com>
Subject Re: Name resolution in StaticHostProvider
Date Wed, 09 May 2018 00:54:13 GMT
Let's put it this way:

1- The interpretation of next() is "give me the next server's IP address
that I should connect to and do whatever preparation (resolution) is
necessary to do so".
The answer could be:
a) I'm done, here it is,
b) there's a problem with the next item in the list, for instance it's
unreasolvable (or else), please deal with it

When it returns, the caller can decide if it wants to advance to the next
item (call next() again) or exit with a fatal error (list items must always
be resolvable) or exit because it doesn't need to connect anymore (client
closed the socket). (...and there could be a whole bunch of other possible
cases...)

So, to wrap up, if you take a look at the retry logic already built in
ClientCnxn, the while-try-catch block is 100+ lines long. First, it already
has the retry logic, second we don't want to go down the road of copying
part of the logic to the HostProvider's level. Reliability, testability,
corner-cases, etc, etc.

What do you think?






On Tue, May 8, 2018 at 1:55 PM, Flavio Junqueira <fpj@apache.org> wrote:

> The refactoring did not seem justifiable at first, so my reaction to it.
> You have clarified the reason for including the changes, and I actually
> like it.
>
> About the exception, there are two points for me:
>
> 1- You don't really need to throw to execute the plan you described.
> 2- In the case we do throw, which is not entirely unreasonable, I'd think
> about the expected behaviour of a method called "next()". In general, I'd
> expect it to either return me the next item or error saying that it cannot
> return an item. The "UnknownHostException" is not doing the latter, though.
> It is indicating that one of the elements of the host provider set is
> "broken" (not resolvable). That's one interpretation. Another
> interpretation is that the logic in "next" needs to indicate to the caller
> that there is a problem with an item in the set of elements of the host
> provider.
>
> For (2) let's converge on what semantics we are trying to provide first,
> please.
>
> -Flavio
>
> > On 8 May 2018, at 21:20, Andor Molnar <andor@cloudera.com> wrote:
> >
> > Sorry, I thought you were against the whole refactoring.
> >
> > "2- That we discuss separately whether we want to change the behaviour of
> > the next()in the HostProvider interface."
> >
> > From this it seemed to me, it's not just a polishing issue, but maybe
> I've
> > gotten you wrong.
> > Anyway, there're 2 contention points we've encountered so far:
> >
> > 1. Do we need to refactor at all?
> > 2. If we do so, shall next() throw UnknownHostException or deal with
> error
> > inside.
> >
> > And I'd still go with:
> > 1. Yes
> > 2. Yes, throw
> >
> > So, I would leave the PR as it is now.
> >
> > Andor
> >
> >
> >
> >
> >
> > On Tue, May 8, 2018 at 12:11 PM, Flavio Junqueira <fpj@apache.org>
> wrote:
> >
> >> Can you list what the contention points are according to you? Feel free
> to
> >> do that in the PR as well, I want to make sure we have the same
> >> understanding of the points that still need to be resolved. From where I
> >> stand, there is no major issue pending other than one polishing issue
> that
> >> I brought upon in the PR.
> >>
> >> -Flavio
> >>
> >>> On 8 May 2018, at 21:08, Andor Molnar <andor@cloudera.com> wrote:
> >>>
> >>> I'm happy to do that once we have an agreement.
> >>>
> >>>
> >>>
> >>>
> >>> On Tue, May 8, 2018 at 8:34 AM, Flavio Junqueira <fpj@apache.org>
> wrote:
> >>>
> >>>> It might be a good idea to document whatever we end up doing.
> >>>>
> >>>> -Flavio
> >>>>
> >>>>> On 8 May 2018, at 17:22, Andor Molnar <andor@cloudera.com>
wrote:
> >>>>>
> >>>>> "If refactoring is necessary to address the issue, then it should
be
> >> part
> >>>>> of the PR."
> >>>>>
> >>>>> Agreed. I think it is and it also makes things a lot more simpler,
so
> >>>> it's
> >>>>> easier to review.
> >>>>>
> >>>>> "I'm not sure what kind of confirmation you are after here. Could
you
> >>>>> elaborate, please?"
> >>>>>
> >>>>> I'm just wondering what could have been the reason for caching host
> >>>>> addresses in StaticHostProvider in the first place.
> >>>>>
> >>>>> "The other solution, if I remember enough of it, tried to avoid
> >> resolving
> >>>>> unnecessarily, but perhaps the DNS client cache is enough..."
> >>>>>
> >>>>> That's exactly my point: what JDK is doing internally is perfectly
> fine
> >>>> for
> >>>>> us and we don't need extra logic here.
> >>>>>
> >>>>> "Could you elaborate on this point, please? What exactly do you
mean
> >> with
> >>>>> this statement?"
> >>>>>
> >>>>> See my previous point. Caching is already done in all DNS servers
in
> >> the
> >>>>> chain and also there's also caching in JDK already, which means
by
> >>>> calling
> >>>>> getByName() you don't necessarily fire a DNS request, only when
the
> TTL
> >>>> is
> >>>>> expired.
> >>>>>
> >>>>> Andor
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Tue, May 8, 2018 at 8:12 AM, Flavio Junqueira <fpj@apache.org>
> >> wrote:
> >>>>>
> >>>>>> Hi Andor,
> >>>>>>
> >>>>>> Thanks for your work on addressing the issue.
> >>>>>>
> >>>>>>> On 8 May 2018, at 16:06, Andor Molnar <andor@cloudera.com>
wrote:
> >>>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> Updating this thread, because the PR is still being review
on
> GitHub.
> >>>>>>>
> >>>>>>> So, the reason why I refactored the original behaviour of
> >>>>>>> StaticHostProvider is that I believe that it's trying to
do
> something
> >>>>>> which
> >>>>>>> is not its responsibility.
> >>>>>>
> >>>>>> If refactoring is necessary to address the issue, then it should
be
> >> part
> >>>>>> of the PR. Otherwise, it is better to refactor in a separate
PR.
> >>>>>>
> >>>>>>
> >>>>>>> Please tell me if there's a good historical
> >>>>>>> reason for that.
> >>>>>>
> >>>>>> I'm not sure what kind of confirmation you are after here. Could
you
> >>>>>> elaborate, please?
> >>>>>>
> >>>>>>>
> >>>>>>> My approach is giving the user the following to options:
> >>>>>>> 1- Use static IP addresses, if you don't want to deal with
DNS
> >>>> resolution
> >>>>>>> at all - we guarantee that no DNS logic will involved in
this case
> at
> >>>>>> all.
> >>>>>>
> >>>>>> Sounds fine.
> >>>>>>
> >>>>>>> 2- Use DNS hostnames if you have a reliable DNS service
for
> >> resolution
> >>>>>>> (with HA, secondary servers, backups, etc.) - we must use
DNS in
> the
> >>>>>> right
> >>>>>>> way in this case e.g. do NOT cache IP address for a longer
period
> >> that
> >>>>>> DNS
> >>>>>>> server allows to and re-resolve after TTL expries, because
it's
> >>>> mandatory
> >>>>>>> by protocol.
> >>>>>>
> >>>>>> Sounds fine.
> >>>>>>
> >>>>>>>
> >>>>>>> My 2 cents here:
> >>>>>>> - the fix which was originally posted for re-resolution
is a
> >> workaround
> >>>>>> and
> >>>>>>> doesn't satisfy the requirement for #2,
> >>>>>>
> >>>>>> The other solution, if I remember enough of it, tried to avoid
> >> resolving
> >>>>>> unnecessarily, but perhaps the DNS client cache is enough to
avoid
> the
> >>>>>> unnecessary round-trips. This observation actually brings me
to the
> >> next
> >>>>>> point:
> >>>>>>
> >>>>>>> - the solution is already built-in in JDK and DNS clients
in the
> >> right
> >>>>>> way
> >>>>>>
> >>>>>> Could you elaborate on this point, please? What exactly do you
mean
> >> with
> >>>>>> this statement?
> >>>>>>
> >>>>>>> - can't see a reason why we shouldn't use that
> >>>>>>>
> >>>>>>> I checked this in some other projects as well and found
very
> similar
> >>>>>>> approach in hadoop-common's SecurityUtil.java. It has 2
built-in
> >>>> plugins
> >>>>>>> for that:
> >>>>>>> - Standard resolver uses java's built-in getByName().
> >>>>>>> - Qualified resolver still uses getByName(), but adds some
logic to
> >>>> avoid
> >>>>>>> incorrect re-resolutions and reverse IP lookups.
> >>>>>>>
> >>>>>>> Please let me know your thoughts.
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>> Andor
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Tue, Mar 6, 2018 at 8:12 AM, Andor Molnar <andor@cloudera.com>
> >>>> wrote:
> >>>>>>>
> >>>>>>>> Hi Abe,
> >>>>>>>>
> >>>>>>>> Unfortunately we haven't got any feedback yet. What
do you think
> of
> >>>>>>>> implementing Option #3?
> >>>>>>>>
> >>>>>>>> Regards,
> >>>>>>>> Andor
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Thu, Feb 22, 2018 at 6:06 PM, Andor Molnar <andor@cloudera.com
> >
> >>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Did anybody happen to take a quick look by any chance?
> >>>>>>>>>
> >>>>>>>>> I don't want to push this too hard, because I know
it's a time
> >>>>>> consuming
> >>>>>>>>> topic to think about, but this is a blocker in 3.5
which has been
> >>>>>> hanging
> >>>>>>>>> around for a while and any feedback would be extremely
helpful to
> >>>>>> close it
> >>>>>>>>> quickly.
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> Andor
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Mon, Feb 19, 2018 at 12:18 PM, Andor Molnar <
> andor@cloudera.com
> >>>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi all,
> >>>>>>>>>>
> >>>>>>>>>> We need more eyes and brains on the following
PR:
> >>>>>>>>>>
> >>>>>>>>>> https://github.com/apache/zookeeper/pull/451
> >>>>>>>>>>
> >>>>>>>>>> I added a comment few days ago about the way
we currently do DNS
> >>>> name
> >>>>>>>>>> resolution in this class and a suggestion on
how we could
> simplify
> >>>>>> things a
> >>>>>>>>>> little bit. We talked about it with Abe Fine,
but we're a little
> >> bit
> >>>>>> unsure
> >>>>>>>>>> and cannot get a conclusion. It would be extremely
handy to get
> >> more
> >>>>>>>>>> feedback from you.
> >>>>>>>>>>
> >>>>>>>>>> To add some colour to it, let me elaborate on
the situation
> here:
> >>>>>>>>>>
> >>>>>>>>>> In general, the task that StaticHostProvider
does is to get a
> list
> >>>> of
> >>>>>>>>>> potentially unresolved InetSocketAddress objects,
resolve them
> and
> >>>>>> iterate
> >>>>>>>>>> over the resolved objects by calling next()
method.
> >>>>>>>>>>
> >>>>>>>>>> *Option #1 (current logic)*
> >>>>>>>>>> - Resolve addresses with getAllByName() which
returns a list of
> IP
> >>>>>>>>>> addresses associated with the address.
> >>>>>>>>>> - Cache all these IP's, shuffle them and iterate
over.
> >>>>>>>>>> - If client is unable to connect to an IP, remove
all IPs from
> the
> >>>>>> list
> >>>>>>>>>> which the original servername was resolved to
and re-resolve it.
> >>>>>>>>>>
> >>>>>>>>>> *Option #2 (getByName())*
> >>>>>>>>>> - Resolve address with getByName() instead which
returns only
> the
> >>>>>> first
> >>>>>>>>>> IP address of the name,
> >>>>>>>>>> - Do not cache IPs,
> >>>>>>>>>> - Shuffle the *names* and resolve with getByName()
*every time*
> >> when
> >>>>>>>>>> next() is called,
> >>>>>>>>>> - JDK's built-in caching will prevent name servers
from being
> >>>> flooded
> >>>>>>>>>> and will do the re-resolution automatically
when cache expires,
> >>>>>>>>>> - Names with multiple IPs will be handled by
DNS servers which
> (if
> >>>>>>>>>> configured properly) return IPs in different
order - this is
> >> called
> >>>>>> DNS
> >>>>>>>>>> Round Robin -, so getByName() will return different
IP on each
> >> call.
> >>>>>>>>>>
> >>>>>>>>>> *Options #3*
> >>>>>>>>>> - There's a small problem with option#2: if
DNS server is not
> >>>>>> configured
> >>>>>>>>>> properly and handles the round-robin case in
a way that it
> always
> >>>>>> return
> >>>>>>>>>> the IP list in the same order, getByName() will
never return the
> >>>> next
> >>>>>> ip,
> >>>>>>>>>> - In order to overcome that, use getAllByName()
instead, shuffle
> >> the
> >>>>>>>>>> list and return the first IP.
> >>>>>>>>>>
> >>>>>>>>>> All feedback if much appreciated.
> >>>>>>>>>>
> >>>>>>>>>> Thanks,
> >>>>>>>>>> Andor
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>>>
> >>>>
> >>>>
> >>
> >>
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message