couchdb-dev mailing list archives

From Mike Rhodes <mrho...@linux.vnet.ibm.com>
Subject Re: Shard level querying in CouchDB Proposal
Date Thu, 23 Nov 2017 12:54:32 GMT


> On 22 Nov 2017, at 18:39, Geoffrey Cox <redgeoff@gmail.com> wrote:
> 
> Hi Mike, this sounds like a pretty cool enhancement. Just to clarify,
> you're also proposing modifying the PUT/POST doc, etc... so that you can
> specify a shard key per doc so that the doc can be stored on a specific
> shard?

Yes, sort of. A document create request specifies a shard key as part of the document ID.
The guarantee with respect to document placement then is:

"All documents with the same shard key are stored in the same shard".

By way of contrast, this *isn't* a way of saying "Put document on specific shard X". I don't
find that ability very compelling for a user (why would they care that their doc was in range
000000000-abababab or whatever?), but I think introducing this grouping mechanism as a
higher-level abstraction over things meaningful within a data model does offer substantial benefit.
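
As a rough sketch of the guarantee (not the actual implementation), placement can be
thought of as hashing only the shard key portion of the document ID. The "key:rest" ID
layout, the CRC32 hash, the shard count of 8 and the function names below are all
assumptions made purely for illustration:

```python
import zlib


def shard_key_of(doc_id: str) -> str:
    """Extract the shard key from a document ID.

    Assumption for illustration: IDs look like "<shard key>:<rest of id>".
    """
    key, sep, _rest = doc_id.partition(":")
    if not sep or not key:
        raise ValueError(f"document ID {doc_id!r} has no shard key prefix")
    return key


def shard_for(doc_id: str, num_shards: int = 8) -> int:
    """Pick a shard index from the shard key only, never from the full ID.

    Hashing just the key is what gives "same shard key -> same shard".
    CRC32 and num_shards=8 are illustrative choices.
    """
    key_bytes = shard_key_of(doc_id).encode("utf-8")
    return zlib.crc32(key_bytes) % num_shards


if __name__ == "__main__":
    # Two documents with the same shard key always land on the same shard.
    print(shard_for("user42:profile") == shard_for("user42:order-0001"))
```

Note the user never names a shard here; they only choose the key, and the mapping from
key to shard stays an internal detail.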

To elaborate on why this is useful, a couple of use cases might help.

The first example is along the lines of using a user ID as a shard key. All documents for
that user then end up on the same shard. A query can then be scoped by user ID (as it's the
shard key), which means that queries for a single user's data can be efficiently served from
a single shard rather than asking all shards. This would significantly improve performance
of an application from the point of view of that user.

Or, in an IoT use case, you might use the device ID as the shard key, enabling fast retrieval
of measurements from a single device.
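
To make the difference concrete, here is a toy in-memory model (nothing to do with the
real clustering code) that counts how many shards each kind of query has to consult.
The TinyCluster class, the "key:rest" ID layout and the CRC32 placement are hypothetical
and exist only for this illustration:

```python
import zlib


class TinyCluster:
    """Toy model: each shard is a dict of doc_id -> doc. Illustration only."""

    def __init__(self, num_shards=4):
        self.shards = [dict() for _ in range(num_shards)]

    def _shard_index(self, shard_key):
        # Assumption: placement hashes only the shard key.
        return zlib.crc32(shard_key.encode("utf-8")) % len(self.shards)

    def put(self, doc_id, doc):
        # Assumption: IDs look like "<shard key>:<rest of id>".
        shard_key = doc_id.split(":", 1)[0]
        self.shards[self._shard_index(shard_key)][doc_id] = doc

    def query_scoped(self, shard_key):
        """Query one shard key: only the owning shard is consulted."""
        shard = self.shards[self._shard_index(shard_key)]
        prefix = shard_key + ":"
        docs = [doc for doc_id, doc in shard.items() if doc_id.startswith(prefix)]
        return docs, 1  # (results, shards consulted)

    def query_unscoped(self, predicate):
        """Query without a shard key: every shard has to be asked."""
        docs = [doc for shard in self.shards for doc in shard.values() if predicate(doc)]
        return docs, len(self.shards)


if __name__ == "__main__":
    cluster = TinyCluster(num_shards=4)
    for n in range(3):
        cluster.put(f"device-a:reading-{n}", {"device": "device-a", "value": n})
        cluster.put(f"device-b:reading-{n}", {"device": "device-b", "value": n})

    scoped, touched = cluster.query_scoped("device-a")
    print(len(scoped), "docs from", touched, "shard")

    unscoped, touched = cluster.query_unscoped(lambda d: d["device"] == "device-a")
    print(len(unscoped), "docs from", touched, "shards")
```

Both queries return the same documents; the scoped one just avoids fanning out to shards
that cannot hold any of them.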

It's important to note too that a shard may store documents from many different shard keys,
so long as the above guarantee holds. In addition, the shard key needs to have high cardinality
and to spread requests effectively over the shards.

An example that doesn't work is using the date as the shard key for the IoT case: while this
has high cardinality over time, at any given moment every write carries the same date, so only
a single shard will be in the write path.
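
A quick way to see the hot-spot problem is to count how many distinct shards receive
writes during one day under each choice of key. This is only a sketch: the CRC32
placement, the 8 shards, the 1000 device names and the helper names are all made up
for the illustration:

```python
import zlib


def shard_index(shard_key, num_shards=8):
    # Assumption: placement is a hash of the shard key modulo the shard count.
    return zlib.crc32(shard_key.encode("utf-8")) % num_shards


def shards_written(shard_keys, num_shards=8):
    """Return the set of shard indexes that receive at least one write."""
    return {shard_index(key, num_shards) for key in shard_keys}


# One day of writes from 1000 hypothetical devices.
devices = [f"device-{n}" for n in range(1000)]

# Date as shard key: every write on this day carries the same key.
by_date = shards_written("2017-11-23" for _ in devices)

# Device ID as shard key: each device carries its own key.
by_device = shards_written(devices)

if __name__ == "__main__":
    print("shards taking writes, date key:  ", len(by_date))
    print("shards taking writes, device key:", len(by_device))
```

With the date as the key, the set of shards taking writes collapses to a single entry no
matter how many devices are reporting, whereas the device ID spreads the same writes
across multiple shards.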

Mike.


