druid-dev mailing list archives

From Keefe Roedersheimer <keefe.roedershei...@verizonmedia.com.INVALID>
Subject filtering historicals with missing lookups and other issues
Date Wed, 21 Oct 2020 19:18:04 GMT
Hi, 

I have a couple of PRs [2, 3] and an associated issue [1], and I was wondering if someone had
a chance to take a look?

In Druid clusters with a large number of historicals, there are usually multiple replicas
for each segment, and if one server misbehaves, an entire query can fail even though it could
have been completed. One common cause is a globally cached lookup failing on some nodes,
for example when a misconfigured node cannot authenticate to a database. The
goal is to transparently avoid distributing the query to nodes that will cause it to fail
when alternatives are available.

There is already an issue, listed in [4], that talks about filtering generally misbehaving
nodes, so this is a special case. One aspect of this issue is different, though: lookups
are specific to a query, so server selection has to be aware of the query.
There may be other ways selection could benefit from this, such as having affinity for
certain historicals to take advantage of caching. The change to server selection, which
is in [2], is relatively minimal.

The PR in [3] introduces a solution that can be installed as an extension, with only the small
change to add the Query parameter to the pick interface. It adds the filtering at the
ServerSelectorStrategy level and then delegates to an existing
ServerSelectorStrategy once the servers have been filtered. This approach (when selected)
only increased server selection time by a few milliseconds (~8ms to ~11ms) when querying a
13,000 segment datasource with queries that finished on the order of a few seconds, so it
does not introduce much overhead.
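
To make the shape of this concrete, here is a rough, self-contained sketch of the filter-then-delegate
idea. It is not the code in [3]: Server, Query, SelectorStrategy and FirstServerStrategy are simplified
stand-ins invented for illustration, not Druid's actual classes, and falling back to the unfiltered
list when no replica has the lookup is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public final class LookupFilterSketch {

    /** Hypothetical stand-in for a historical and the lookups it has loaded. */
    static final class Server {
        final String name;
        final Set<String> loadedLookups;

        Server(String name, String... loadedLookups) {
            this.name = name;
            this.loadedLookups = new HashSet<>(Arrays.asList(loadedLookups));
        }
    }

    /** Hypothetical stand-in for a query; only the lookups it needs matter here. */
    static final class Query {
        final Set<String> requiredLookups;

        Query(String... requiredLookups) {
            this.requiredLookups = new HashSet<>(Arrays.asList(requiredLookups));
        }
    }

    /** Simplified strategy interface whose pick method also receives the query. */
    interface SelectorStrategy {
        Server pick(Query query, List<Server> candidates);
    }

    /** Trivial existing strategy to delegate to: takes the first candidate. */
    static final class FirstServerStrategy implements SelectorStrategy {
        @Override
        public Server pick(Query query, List<Server> candidates) {
            return candidates.isEmpty() ? null : candidates.get(0);
        }
    }

    /** Drops servers missing a required lookup, then delegates the actual choice. */
    static final class LookupFilteringStrategy implements SelectorStrategy {
        private final SelectorStrategy delegate;

        LookupFilteringStrategy(SelectorStrategy delegate) {
            this.delegate = delegate;
        }

        @Override
        public Server pick(Query query, List<Server> candidates) {
            List<Server> usable = new ArrayList<>();
            for (Server server : candidates) {
                if (server.loadedLookups.containsAll(query.requiredLookups)) {
                    usable.add(server);
                }
            }
            // Assumption: if no replica has every lookup, keep the original
            // candidates so that filtering never makes a segment unroutable.
            return delegate.pick(query, usable.isEmpty() ? candidates : usable);
        }
    }

    public static void main(String[] args) {
        List<Server> replicas = Arrays.asList(
            new Server("h1"),                    // lookup failed to load here
            new Server("h2", "country_names"));
        SelectorStrategy strategy = new LookupFilteringStrategy(new FirstServerStrategy());
        System.out.println(strategy.pick(new Query("country_names"), replicas).name); // prints "h2"
    }
}
```

Without the filter, the first-server strategy above would have picked h1 and the query would have
failed even though h2 could have served it.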

Alternatively, because this seems to be a problem that exists on multiple levels, a filter
could be introduced directly as a first-class citizen in Druid, perhaps by adding a Filter in
TierSelectorStrategy before the ServerSelectorStrategy is invoked. I've also been working on
the general case, and it might simplify the wiring of layering multiple filters.
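
As a purely illustrative sketch of what layering filters ahead of the selector could look like (none
of these names exist in Druid; Server, Query, the healthy flag and applyFilters are all hypothetical,
and skipping a filter that would leave no candidates is an assumption of this sketch):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiPredicate;

public final class LayeredFilterSketch {

    /** Hypothetical server view: a health flag plus the lookups it has loaded. */
    static final class Server {
        final String name;
        final boolean healthy;
        final Set<String> loadedLookups;

        Server(String name, boolean healthy, String... loadedLookups) {
            this.name = name;
            this.healthy = healthy;
            this.loadedLookups = new HashSet<>(Arrays.asList(loadedLookups));
        }
    }

    /** Hypothetical query view: only the lookups it requires. */
    static final class Query {
        final Set<String> requiredLookups;

        Query(String... requiredLookups) {
            this.requiredLookups = new HashSet<>(Arrays.asList(requiredLookups));
        }
    }

    /**
     * Applies each filter in order before any selector strategy runs.
     * Assumption: a filter that would remove every remaining candidate is
     * skipped, so the query can still be routed somewhere.
     */
    static List<Server> applyFilters(
            Query query, List<Server> candidates, List<BiPredicate<Query, Server>> filters) {
        List<Server> current = candidates;
        for (BiPredicate<Query, Server> filter : filters) {
            List<Server> kept = new ArrayList<>();
            for (Server server : current) {
                if (filter.test(query, server)) {
                    kept.add(server);
                }
            }
            if (!kept.isEmpty()) {
                current = kept;
            }
        }
        return current;
    }

    public static void main(String[] args) {
        List<BiPredicate<Query, Server>> filters = new ArrayList<>();
        filters.add((q, s) -> s.healthy);                                   // general case, as in [4]
        filters.add((q, s) -> s.loadedLookups.containsAll(q.requiredLookups)); // lookup case, as in [1]

        List<Server> replicas = Arrays.asList(
            new Server("a", true),
            new Server("b", false, "country_names"),
            new Server("c", true, "country_names"));

        for (Server server : applyFilters(new Query("country_names"), replicas, filters)) {
            System.out.println(server.name); // prints "c"
        }
    }
}
```

The surviving candidates would then be handed to whichever ServerSelectorStrategy is configured, so
the strategies themselves would not need to know about the filters.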

In either case, I think the cost of allowing queries to influence server selection is minimal,
whether that happens directly before calling the ServerSelectorStrategy or in an optional
server selector strategy.

Does anybody have an opinion about either adding the parameter to the pick method as in [2]
or adding a filter that can consider the query? The remaining functionality could then be
added as an extension, so the changes to core Druid would be minimal.

Thanks for reading!

Keefe


[1] Failed Query due to missing lookup on some servers 
    https://github.com/apache/druid/issues/10294 

[2] allow server selection to be aware of query
    https://github.com/apache/druid/pull/10428

[3] ServerSelectorStrategy to filter servers with missing required lookups
    https://github.com/apache/druid/pull/10427

[4] Broker resiliency to misbehaving historical node
    https://github.com/apache/druid/issues/5709

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@druid.apache.org
For additional commands, e-mail: dev-help@druid.apache.org

