incubator-blur-user mailing list archives

From Aaron McCurry <amccu...@gmail.com>
Subject Re: Shard takeover behavior
Date Mon, 17 Mar 2014 13:47:53 GMT
On Sat, Mar 15, 2014 at 12:57 PM, Ravikumar Govindarajan <
ravikumar.govindarajan@gmail.com> wrote:

> Aaron,
>
> I was thinking about another way of utilizing read-only shards
>
> Instead of logic/intelligence for finding a struggling/down primary
> replica, can we opt for pushing the logic to the client side?
>
> We can take a few approaches as below
>
> 1. Query both primary/secondary shards in parallel and return whichever
> comes first


> 2. Query both primary/secondary shards in parallel. Wait for the primary
> response as per a configured delay. If not forthcoming, return the
> secondary's response
>

If I understand this one: favor the primary response until a certain
amount of time has passed, then fall back to the secondary response,
assuming it's available to return.
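A rough sketch of what that could look like in Java (the ShardClient
interface and the wait window are made up for illustration, not Blur's
real API):

import java.util.concurrent.*;

public class PreferPrimaryQuery {

  // Hypothetical per-shard search interface; Blur's real client differs.
  interface ShardClient {
    String query(String query) throws Exception;
  }

  private final ExecutorService pool = Executors.newCachedThreadPool();

  public String query(ShardClient primary, ShardClient secondary,
                      String q, long primaryWaitMs) throws Exception {
    // Fire both searches in parallel.
    Future<String> p = pool.submit(() -> primary.query(q));
    Future<String> s = pool.submit(() -> secondary.query(q));
    try {
      // Favor the primary for the configured window.
      return p.get(primaryWaitMs, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      // Primary not forthcoming; accept the (possibly stale) secondary.
      p.cancel(true);
      return s.get();
    }
  }
}

Option 1 would be the same idea with CompletableFuture.anyOf instead of
the timed get, taking whichever response completes first.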


>
> These are useful only when the client agrees to a "stale-read" scenario.
> A "stale-read" in this case will be the last commit of the index.
>
> What I am aiming at is that, in the case of layout-conscious apps [layout
> does not change when a VM update/crash/hang is restarted], we can always
> fall back on replica reads, resulting in greater availability but lesser
> consistency
>
> A secondary-replica layout needs to be present in ZK. Replica shards
> should always be served from a server other than the primary. Maybe we
> can switch off the buffer-cache for replica reads, as it is used only
> temporarily
>

Buffer cache?  Are you referring to the block cache?  Or a query cache?
Just as an FYI, Blur's query cache is currently disabled.  As for the block
cache, maybe.  The block cache seems to help performance quite a bit and
usually does so at little cost.  Also, we could flush the secondary shard
from the cache from time to time.  Or we could just let it fall out of the
LRU.
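For the periodic flush idea, a toy LRU shows the shape of it (the keying
by shard prefix is invented; Blur's actual block cache is a good bit more
involved):

import java.util.LinkedHashMap;
import java.util.Map;

// Toy LRU block cache: an access-order LinkedHashMap evicts the
// least-recently-used block once capacity is exceeded.
public class ToyBlockCache extends LinkedHashMap<String, byte[]> {

  private final int maxBlocks;

  public ToyBlockCache(int maxBlocks) {
    super(16, 0.75f, true); // true = access order, i.e. LRU
    this.maxBlocks = maxBlocks;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
    return size() > maxBlocks;
  }

  // Flush everything cached for a secondary shard from time to time,
  // e.g. keys of the form "table1/shard-3/secondary/<block>".
  public synchronized void flushByPrefix(String prefix) {
    keySet().removeIf(key -> key.startsWith(prefix));
  }
}

Doing nothing at all and letting the secondary's blocks age out naturally
is just the removeEldestEntry path.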


>
> 95% of apps queue their indexing operations and can always retry after
> the primary comes back online.
>

The interesting thing here is that Blur is fully committed to disk (HDFS)
upon each mutate.  So assuming that the secondary shard has refreshed, the
primary shard being down just means that you can't write to that shard.
Reads should be in the same state.


>
> Please let me know your views on this
>

I like all these ideas; the only thing I would add is that we would need
to build these sorts of options into Blur on a configurable per-table
basis.  Querying both primary and secondary shards at the same time could
produce the most consistent response times, but at the cost of CPU
resources (obviously).
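As a strawman for the per-table knob (the names are hypothetical, not
existing Blur table properties):

public class TableReadConfig {

  // Hypothetical per-table read policy; not an existing Blur setting.
  public enum ReadPolicy {
    PRIMARY_ONLY,        // today's behavior
    PRIMARY_THEN_STALE,  // wait for the primary, fall back to secondary
    FIRST_RESPONSE       // race primary and secondary, take the winner
  }

  private final ReadPolicy policy;
  private final long primaryWaitMs; // only used by PRIMARY_THEN_STALE

  public TableReadConfig(ReadPolicy policy, long primaryWaitMs) {
    this.policy = policy;
    this.primaryWaitMs = primaryWaitMs;
  }

  public ReadPolicy getPolicy() { return policy; }
  public long getPrimaryWaitMs() { return primaryWaitMs; }
}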

Thanks for the thoughts and ideas!  I like it!

Aaron


>
> --
> Ravi
>
>
> On Sat, Mar 8, 2014 at 8:56 PM, Aaron McCurry <amccurry@gmail.com> wrote:
>
> > On Fri, Mar 7, 2014 at 5:42 AM, Ravikumar Govindarajan <
> > ravikumar.govindarajan@gmail.com> wrote:
> >
> > > >
> > > > Well it works that way for OOMs and for when the process drops hard
> > > > (think kill -9).  However when a shard server is shut down it
> > > > currently ends its session in ZooKeeper, thus triggering a layout
> > > > change.
> > >
> > >
> > > Yes, maybe we can have a config to determine whether it should
> > > end/maintain the session in ZK when doing a normal shutdown and then a
> > > subsequent restart. This way, both MTTR-conscious and layout-conscious
> > > settings can be supported.
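> > >
> > > As a sketch (the flag name is made up), the shutdown hook would just
> > > skip closing the session:
> > >
> > > import org.apache.zookeeper.ZooKeeper;
> > >
> > > public class ShardShutdown {
> > >   // Hypothetical flag, e.g. blur.shutdown.end.zk.session=false for
> > >   // layout-conscious clusters.
> > >   public static void shutdown(ZooKeeper zk, boolean endSession)
> > >       throws InterruptedException {
> > >     if (endSession) {
> > >       zk.close(); // session ends now; layout change is immediate
> > >     }
> > >     // else: just exit and let the session linger until
> > >     // zk.session.timeout expires, so a quick restart keeps the layout
> > >   }
> > > }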
> > >
> >
> > That's a neat idea.  Once we have shards being served on multiple
> > servers we should definitely take a look at this.  When we implement the
> > multi-shard serving I would guess that there will be 2 layout strategies
> > (they might be implemented together); a rough sketch of the first
> > follows the list.
> >
> > 1. Would be to get the N replicas online on different servers.
> > 2. Would be to pick the writing leader for the shard, assuming that it's
> > needed.
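> >
> > Something like this for strategy 1 (purely illustrative round-robin):
> >
> > import java.util.ArrayList;
> > import java.util.List;
> >
> > public class ReplicaPlacement {
> >   // Put the N replicas of a shard on N distinct servers by walking the
> >   // server list round-robin, starting from the shard id.
> >   public static List<String> place(int shardId, int n,
> >       List<String> servers) {
> >     if (n > servers.size()) {
> >       throw new IllegalArgumentException("need " + n + " servers");
> >     }
> >     List<String> chosen = new ArrayList<>();
> >     for (int i = 0; i < n; i++) {
> >       chosen.add(servers.get((shardId + i) % servers.size()));
> >     }
> >     return chosen;
> >   }
> > }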
> >
> >
> > >
> > > How do you think we can detect that a particular shard-server is
> > > struggling/shut-down, and hence that incoming search-requests need to
> > > go to some other server?
> > >
> > > I am listing a few paths off the top of my head
> > >
> > > 1. Process baby-sitters like supervisord, alerting controllers
> > > 2. Tracking the first network-exception in the controller and
> > >    diverting to a read-only instance; periodically maybe retry
> > > 3. Take a statistics-based decision, based on previous response times
> > >    etc.
> > >
> >
> > Adding to this one, and this may be obvious, but measure the response
> > time in comparison with other shards.  Meaning if the entire cluster is
> > experiencing an increase in load and all response times are increasing,
> > we wouldn't want to start killing off shard servers inadvertently.  Look
> > for outliers.
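> >
> > A back-of-the-envelope version of that check (the 3x-median threshold
> > is arbitrary):
> >
> > import java.util.Arrays;
> >
> > public class OutlierCheck {
> >   // Flag a shard server only when it is an outlier relative to the
> >   // cluster, not when the whole cluster is slowing down together.
> >   public static boolean isOutlier(long serverMs, long[] clusterMs,
> >       double factor) {
> >     long[] sorted = clusterMs.clone();
> >     Arrays.sort(sorted);
> >     long median = sorted[sorted.length / 2];
> >     return serverMs > factor * median;
> >   }
> > }
> >
> > With factor = 3, one server at 900ms against a cluster around 50ms
> > trips the check, but a cluster-wide climb from 50ms to 200ms trips
> > nothing.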
> >
> >
> > > 4. Build some kind of leasing mechanism in ZK etc...
> > >
> >
> > I think that all of these are good approaches.  Likely, to determine
> > that a node is misbehaving and should be killed/not used anymore, we
> > would want multiple ways to measure that condition and then vote on the
> > need to kick it out.
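> >
> > The vote could be as simple as this (the detector wiring is invented):
> >
> > import java.util.List;
> > import java.util.function.Predicate;
> >
> > public class KickoutVote {
> >   // Each detector (supervisor alert, network exceptions, response-time
> >   // outlier, ZK lease, ...) votes; kick the node only on a majority.
> >   public static boolean shouldKickOut(String node,
> >       List<Predicate<String>> detectors) {
> >     long votes = detectors.stream().filter(d -> d.test(node)).count();
> >     return votes > detectors.size() / 2;
> >   }
> > }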
> >
> >
> > Aaron
> >
> > >
> > > --
> > > Ravi
> > >
> > >
> > > On Fri, Mar 7, 2014 at 8:01 AM, Aaron McCurry <amccurry@gmail.com>
> > > wrote:
> > >
> > > > On Thu, Mar 6, 2014 at 6:30 AM, Ravikumar Govindarajan <
> > > > ravikumar.govindarajan@gmail.com> wrote:
> > > >
> > > > > I came to know about the zk.session.timeout variable just now,
> > > > > while reading more about this problem.
> > > > >
> > > > > This will only trigger the dead-node notification after the
> > > > > configured timeout is exceeded. Setting it to 3-4 mins must be
> > > > > fine for OOMs and rolling-restarts.
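> > > > >
> > > > > At the raw ZooKeeper client level that is just the session
> > > > > timeout passed to the constructor, e.g. 240s (assuming Blur wires
> > > > > the property through to this argument):
> > > > >
> > > > > import java.io.IOException;
> > > > > import org.apache.zookeeper.ZooKeeper;
> > > > >
> > > > > public class ZkSession {
> > > > >   public static ZooKeeper connect() throws IOException {
> > > > >     // ZK reports the node dead only after the session times out,
> > > > >     // so an OOM/kill -9 means at most ~4 mins of stale layout.
> > > > >     return new ZooKeeper("zk1:2181,zk2:2181", 240000,
> > > > >         event -> { });
> > > > >   }
> > > > > }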
> > > > >
> > > >
> > > > Well it works that way for OOMs and for when the process drops hard
> > > > (think kill -9).  However when a shard server is shut down it
> > > > currently ends its session in ZooKeeper, thus triggering a layout
> > > > change.
> > > >
> > > >
> > > > >
> > > > > The only extra stuff I am looking for is to divert search calls
> > > > > to a read-only shard instance during this 3-4 min window, to
> > > > > avoid mini-outages
> > > > >
> > > >
> > > > Yes, and I think that the controllers will automatically spread the
> > > > queries across those servers that are online.  The BlurClient class
> > > > already takes a list of connection strings and treats all
> > > > connections as equals.  For example, its current use is to provide
> > > > the client with all the controllers' connection strings.
> > > > Internally, if any one of the controllers goes down or has a network
> > > > issue, another controller is automatically retried without the user
> > > > having to do anything.  There is back off, ping, and pooling logic
> > > > in the BlurClientManager that the BlurClient utilizes.
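> > > >
> > > > For reference, that usage looks roughly like this (from memory, so
> > > > double-check the names against the Thrift client):
> > > >
> > > > import org.apache.blur.thrift.BlurClient;
> > > > import org.apache.blur.thrift.generated.Blur.Iface;
> > > >
> > > > public class ClientExample {
> > > >   public static Iface connect() {
> > > >     // Every controller is listed; they are all treated as equals
> > > >     // and failover between them is automatic.
> > > >     return BlurClient.getClient(
> > > >         "controller1:40010,controller2:40010");
> > > >   }
> > > > }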
> > > >
> > > > Aaron
> > > >
> > > >
> > > > >
> > > > > --
> > > > > Ravi
> > > > >
> > > > >
> > > > >
> > > > > On Thu, Mar 6, 2014 at 3:34 PM, Ravikumar Govindarajan <
> > > > > ravikumar.govindarajan@gmail.com> wrote:
> > > > >
> > > > > > What do you think of giving some extra leeway for shard-server
> > > > > > failover cases?
> > > > > >
> > > > > > Ex: Whenever a shard-server process gets killed, the
> > > > > > controller-node does not immediately update the layout, but
> > > > > > rather marks it as a suspect.
> > > > > >
> > > > > > When we have a read-only back-up of the shard, searches can
> > > > > > continue unhindered. Indexing during this time can be diverted
> > > > > > to a queue, which will store and retry ops when the shard-server
> > > > > > comes online again.
> > > > > >
> > > > > > Over a configured number of attempts/time, if the shard-server
> > > > > > does not come up, then one controller-server can authoritatively
> > > > > > mark it as down and update the layout.
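> > > > > >
> > > > > > Roughly like this (all names invented):
> > > > > >
> > > > > > // Sketch of the suspect flow: don't update the layout on the
> > > > > > // first missed heartbeat; allow a configured number of retries.
> > > > > > public class SuspectTracker {
> > > > > >   private final int maxAttempts;
> > > > > >   private int missed;
> > > > > >
> > > > > >   public SuspectTracker(int maxAttempts) {
> > > > > >     this.maxAttempts = maxAttempts;
> > > > > >   }
> > > > > >
> > > > > >   // Called on each failed ping; true means mark the server
> > > > > >   // down for real and update the layout.
> > > > > >   public boolean recordMiss() {
> > > > > >     return ++missed >= maxAttempts;
> > > > > >   }
> > > > > >
> > > > > >   public void recordSuccess() { missed = 0; } // came back
> > > > > > }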
> > > > > >
> > > > > > --
> > > > > > Ravi
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
