couchdb-dev mailing list archives

From Johs Ensby <j...@b2w.com>
Subject Re: Shard level querying in CouchDB Proposal
Date Fri, 24 Nov 2017 05:50:38 GMT
Hi Mike and Geoff,
forgive me if I am asking a really stupid question, but
wouldn't restricting certain data to specific shards defy the very concept and core benefits
of a clustered database?
br
Johs

> On 23 Nov 2017, at 22:49, Geoffrey Cox <redgeoff@gmail.com> wrote:
> 
> Ah, yeah, this makes sense to me. I think this has great potential!
> 
> On Thu, Nov 23, 2017 at 4:56 AM Mike Rhodes <mrhodes@linux.vnet.ibm.com>
> wrote:
> 
>> 
>> 
>>> On 22 Nov 2017, at 18:39, Geoffrey Cox <redgeoff@gmail.com> wrote:
>>> 
>>> Hi Mike, this sounds like a pretty cool enhancement. Just to clarify,
>>> you're also proposing modifying the PUT/POST doc, etc... so that you can
>>> specify a shard key per doc so that the doc can be stored on a specific
>>> shard?
>> 
>> Yes, sort of. A document create request specifies a shard key as part of
>> the document ID. The guarantee with respect to document placement then is:
>> 
>> "All documents with the same shard key are stored in the same shard".
>> 
>> By way of contrast, this *isn't* a way of saying "Put document on
>> specific shard X". I don't find that ability very compelling for a user
>> (why would they care that their doc was in range 000000000-abababab or
>> whatever?), but introducing this grouping mechanism as a higher level
>> abstraction on things meaningful within a data model I think does offer
>> substantial benefit.
>> 
>> To elaborate on why this is useful, a couple of use cases might help.
>> 
>> The first example is along the lines of using a user ID as a shard key.
>> All documents for that user then end up on the same shard. A query can then
>> be scoped by user ID (as it's the shard key), which means that queries for a
>> single user's data can be efficiently served from a single shard rather
>> than asking all shards. This would significantly improve performance of an
>> application from the point of view of that user.
>> 
>> Or, in an IoT use case, you might use the device ID as the shard key
>> enabling fast retrieval of measurements from a single device.
>> 
>> It's important to note too that a shard may store documents with many
>> different shard keys, so long as the above guarantee holds. In addition,
>> the shard key needs to have high cardinality and to effectively spread
>> requests over the shards.
>> 
>> An example that doesn't work is using the date as the shard key for the
>> IoT case: while this has a high cardinality, at any given time, only a
>> single shard will be in the write path.
>> 
>> Mike.
>> 
>> 
>> 
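
A minimal sketch of the placement guarantee Mike describes ("All documents with the same shard key are stored in the same shard"), along with the user ID and date examples. Everything concrete here is an assumption made for illustration and not taken from the proposal: the `:` separator between shard key and the rest of the document ID, the shard count of 8, the use of CRC32 as the hash, and the helper names `shard_key_of`, `shard_for` and `shards_in_write_path`.

```python
import zlib

NUM_SHARDS = 8  # hypothetical shard count, chosen only for this sketch


def shard_key_of(doc_id: str) -> str:
    """Return the shard key: the part of the ID before the first ':'.

    The ':' separator is an assumption; the thread only says the shard
    key is specified "as part of the document ID".
    """
    key, sep, _rest = doc_id.partition(":")
    if not sep or not key:
        raise ValueError(f"document ID {doc_id!r} has no shard key prefix")
    return key


def shard_for(doc_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a document to a shard by hashing only its shard key.

    Because the rest of the ID is ignored, two documents with the same
    shard key always map to the same shard. CRC32 is a stand-in hash.
    """
    digest = zlib.crc32(shard_key_of(doc_id).encode("utf-8"))
    return digest % num_shards


def shards_in_write_path(doc_ids, num_shards: int = NUM_SHARDS) -> set:
    """Return the set of shards that a batch of writes would touch."""
    return {shard_for(doc_id, num_shards) for doc_id in doc_ids}


if __name__ == "__main__":
    # User ID as shard key: one user's documents share a single shard,
    # so a query scoped to that user only needs to ask that shard.
    print(shards_in_write_path(["user42:profile", "user42:order-1"]))

    # Many device IDs: writes spread over several shards.
    print(shards_in_write_path(f"device{i}:reading" for i in range(1000)))

    # Date as shard key: everything written on one day hits one shard.
    print(shards_in_write_path(f"2017-11-24:reading-{i}" for i in range(1000)))
```

Under these assumptions, a shard still holds documents for many different shard keys, since many keys hash to the same shard number. The date example touches only one shard because every document written on that day carries the same key.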

