couchdb-dev mailing list archives

From Alex Miller <alexmil...@apple.com.INVALID>
Subject Re: [DISCUSS] Mango indexes on FDB
Date Tue, 24 Mar 2020 20:52:16 GMT

> On Mar 24, 2020, at 05:51, Garren Smith <garren@apache.org> wrote:
> On Tue, Mar 24, 2020 at 1:30 AM Joan Touzet <wohali@apache.org> wrote:
> 
>> Question: Imagine a node that's been offline for a bit and is just
>> coming back on. (I'm not 100% sure how this works in FDB land.) If
>> there's a stale index on disk, and that index is being updated while
>> it's still behind... what happens?
>> 
> 
> With couch_layer this can't happen, as each CouchDB node is stateless and
> doesn't actually keep any indexes. Everything would be in FoundationDB, so
> once an index is built it is ready for all couch_layer nodes.
> 
> FoundationDB storage servers could fall behind the TLogs. I'm not 100% sure
> what would happen in this case, but it would be consistent for all
> couch_layer nodes.

When a client gets a read version to begin a transaction in FDB, it is promised that this
was the most recent committed version at some point in time between issuing the request and
receiving the reply.  Every read it then issues carries that version, and must get back the
most recently written value for that key as of that version.  FDB is not allowed to break
this contract during faults.
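To make that contract concrete, here is a minimal sketch using the FoundationDB Python
bindings (assuming a cluster reachable via the default cluster file; the key name is made
up for illustration):

    import fdb

    fdb.api_version(620)
    db = fdb.open()

    tr = db.create_transaction()

    # The cluster promises this was the most recent committed version at
    # some moment between issuing the request and receiving the reply.
    read_version = tr.get_read_version().wait()

    # Every read in this transaction is tagged with that version and must
    # return the latest value written to the key as of that version.
    value = tr[b'some/key']            # a future; resolves on first use
    if value.present():
        print(read_version, value)
    else:
        print(read_version, 'not set')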

The cluster will continue advancing in versions, as it does not throttle when only one server
in a shard falls behind (or is offline).  When the server comes back online, it will pull
the stream of mutations from the transaction logs to catch up.  In the meantime, it remains
unavailable for reads, because clients send read requests at a specific (recent) version that
the lagging storage server knows it does not have.  After 1s, it replies with a
`future_version` error to tell the client it won’t be getting an answer soon.  The client
then decides, based on either the error or the observed latency, to re-issue the read to a
different replica of that shard so that it may get an answer, and continues doing so until it
notices that the lagged storage server has caught up and is responding successfully.
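
In practice, most of that failover happens inside the client library, so application code
rarely sees it.  When a `future_version` (error code 1009) does surface, it comes out of the
standard retry loop, which `on_error` is designed to handle.  A sketch with the Python
bindings, again with a made-up key:

    import fdb

    fdb.api_version(620)
    db = fdb.open()

    tr = db.create_transaction()
    while True:
        try:
            value = tr[b'some/key']    # may fail with future_version (1009)
            value.present()            # force the read to resolve
            break
        except fdb.FDBError as e:
            # Re-raises if the error is not retryable; otherwise waits an
            # appropriate delay and resets the transaction for a retry.
            tr.on_error(e).wait()

The `@fdb.transactional` decorator wraps exactly this loop, so most applications get this
behaviour for free.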

If you’re interested in more details around the operational side of a storage server failure,
I’d suggest reading the threads that Kyle Snavely started on the FDB Forums:
https://forums.foundationdb.org/t/quick-question-on-tlog-disk-space-for-large-clusters/1962
https://forums.foundationdb.org/t/questions-regarding-maintenance-for-multiple-storage-oriented-machines-in-a-data-hall/2010

