drill-dev mailing list archives

From Jacques Nadeau <jacq...@dremio.com>
Subject Re: Zookeeper down before query starts/after query finishes
Date Sun, 08 Nov 2015 19:56:32 GMT
I think we need to talk through a couple of different scenarios and decide
on Drill behavior in each.

Client Based
1) Initial connection to ZK from client fails
2) Client loses ZK Connection
  a) Reconnects within session timeout
  b) Cannot reconnect within session timeout (loses session)
3) ZK connection gets reconnected with a new session (2b)

Drillbit Based
4) Drillbit initial connection fails to complete
5) Drillbit loses connection
  a) reconnects within session timeout
  b) cannot reconnect within session timeout (loses session)
6) Drillbit reestablishes connection after timeout (5b)
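To make the review framework concrete, the six scenarios can be keyed by who holds the connection and what happened to it. The following is a hypothetical Python sketch for illustration only. Drill itself is Java, and none of these role, event, or function names come from the Drill codebase. It just shows that the items form a small, regular grid of role × event.

```python
from typing import Optional

# Hypothetical role names: who holds the ZK connection.
CLIENT = "client"
DRILLBIT = "drillbit"

# Hypothetical event names: what happened to that connection.
INITIAL_CONNECT_FAILED = "initial_connect_failed"
CONNECTION_LOST = "connection_lost"
RECONNECTED_NEW_SESSION = "reconnected_new_session"


def classify_zk_scenario(role: str, event: str,
                         reconnected_in_time: Optional[bool] = None) -> str:
    """Return the scenario label ("1" ... "6", with a/b sub-cases) for an event.

    reconnected_in_time only applies to CONNECTION_LOST: True if the
    connection came back before the ZK session timeout expired.
    """
    # Client scenarios are numbered 1-3, Drillbit scenarios 4-6.
    base = {CLIENT: 1, DRILLBIT: 4}[role]
    if event == INITIAL_CONNECT_FAILED:
        return str(base)                      # item 1 or 4
    if event == CONNECTION_LOST:
        if reconnected_in_time is None:
            raise ValueError("CONNECTION_LOST needs reconnected_in_time")
        suffix = "a" if reconnected_in_time else "b"
        return f"{base + 1}{suffix}"          # item 2a/2b or 5a/5b
    if event == RECONNECTED_NEW_SESSION:
        return str(base + 2)                  # item 3 or 6
    raise ValueError(f"unknown event: {event}")
```

A table keyed by these labels is then a natural place to write down the desired behavior for each case, which makes it easy to see where the client case (2b) and the Drillbit case (5b) should diverge.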

It seems like your initial proposal is entirely focused on item (5b) in the
list above. However, the code change affects all items 1-6. I think it
would be worthwhile to come up with a clear definition of the desired behavior
for all items 1-6. I also think the behavior in 2b should probably be very
different from that in 5b.

Note, I'm not suggesting that this initial fix needs to bring all items
to the desired behavior. However, it is hard to review the patch without
measuring it against what our target is for each item. My hope out of this
is a clear framework to review the patch as well as a number of JIRAs to
resolve the gaps in each of these scenarios.

thanks!
jacques



--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Sun, Nov 8, 2015 at 9:36 AM, Hsuan Yi Chu <hyichu@maprtech.com> wrote:

> I just submitted a pull request to address DRILL-3751, which focuses on the
> scenario where a query has already finished and Zookeeper dies, so the
> Foreman cannot delete the running query's profile from Zookeeper.
>
> I think in this case, after a few retries, the Foreman can assume Zookeeper
> is down, and the query should be treated as failed, since the client might
> not be able to receive the result (see the behavior in DRILL-3751
> <https://issues.apache.org/jira/browse/DRILL-3751>).
>
> Does this make sense?
>
>
> On Fri, Nov 6, 2015 at 10:43 AM, Hsuan Yi Chu <hyichu@maprtech.com> wrote:
>
> > My understanding is:
> > Before a query starts/after a query finishes, the Foreman will put/delete
> > the running query profile in Zookeeper.
> >
> > However, if Zookeeper goes down before the put/delete succeeds, Drill
> > would be blocked at the put/delete operation.
> >
> > See https://issues.apache.org/jira/browse/DRILL-3751
> >
> > I think it is not quite right to let Drill just wait for Zookeeper to
> > respond. Does it make sense to use a timeout here?
> >
> >
> >
>
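The two proposals in this thread, a timeout on the put/delete and giving up after a few retries and failing the query, can be combined into one sketch. This is a hypothetical Python illustration, not Drill's Java code: `call_with_timeout_and_retries`, `ZkUnavailableError`, and `finish_query` are invented names, `delete_profile` stands in for whatever call removes the running-query profile from Zookeeper, and the "COMPLETED"/"FAILED" strings stand in for the query states the Foreman actually reports.

```python
import concurrent.futures
import time
from typing import Callable, Optional


class ZkUnavailableError(Exception):
    """Hypothetical: raised when Zookeeper is assumed down after all retries."""


def call_with_timeout_and_retries(op: Callable[[], None], timeout_s: float,
                                  retries: int, backoff_s: float = 0.0) -> None:
    """Run a blocking ZK operation with a per-attempt timeout.

    Makes at most retries + 1 attempts. A timeout and an exception raised by
    `op` both count as a failed attempt.
    """
    # One thread per attempt: a timed-out attempt may still be blocked inside
    # the ZK call, and Python cannot cancel a running thread.
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=retries + 1)
    last_error: Optional[BaseException] = None
    try:
        for attempt in range(retries + 1):
            future = pool.submit(op)
            try:
                future.result(timeout=timeout_s)
                return
            except Exception as e:  # includes the timeout error
                last_error = e
            if attempt < retries:
                time.sleep(backoff_s)
        raise ZkUnavailableError(
            f"gave up after {retries + 1} attempts") from last_error
    finally:
        # Stop waiting; attempts still blocked on ZK are left to finish alone.
        pool.shutdown(wait=False)


def finish_query(delete_profile: Callable[[], None],
                 timeout_s: float = 5.0, retries: int = 2) -> str:
    """Hypothetical Foreman step after a query finishes; returns its state."""
    try:
        call_with_timeout_and_retries(delete_profile, timeout_s, retries)
        return "COMPLETED"
    except ZkUnavailableError:
        # As proposed in the thread: assume Zookeeper is down and fail the
        # query, since the client may not be able to receive the result.
        return "FAILED"
```

Note that the timeout here only stops the caller from waiting; it does not abort the underlying call. A timed-out delete may still succeed later, so a real retry would need to tolerate the profile node already being gone.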
