couchdb-dev mailing list archives

From "Robert Newson" <rnew...@apache.org>
Subject Re: FoundationDB & Multi tenancy model
Date Mon, 18 Mar 2019 17:53:22 GMT

Hi,

Firstly, CouchDB today does not have multi-tenancy as a feature. Cloudant does and achieves
this by inserting the tenant's name as a prefix on the database name (so "rnewson/db1" is
a different database to "sleroux/db1"), with appropriate stripping of the prefix in various
responses. I would like to see multi-tenancy carried into CouchDB as a first-class feature,
though.
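
The prefixing scheme described above can be sketched roughly as follows. This is only an illustration of the idea, not Cloudant's actual code; the function names `to_internal` and `to_external` are hypothetical, and the sketch assumes tenant names never contain "/".

```python
def to_internal(tenant: str, db_name: str) -> str:
    """Map a user-visible database name to its tenant-prefixed internal name."""
    if "/" in tenant:
        # Assumption of this sketch: "/" is reserved as the separator.
        raise ValueError("tenant name must not contain '/'")
    return f"{tenant}/{db_name}"


def to_external(tenant: str, internal_name: str) -> str:
    """Strip the tenant prefix before the name is shown in a response."""
    prefix = tenant + "/"
    if not internal_name.startswith(prefix):
        raise PermissionError("database does not belong to this tenant")
    return internal_name[len(prefix):]


# Two tenants can both own a "db1" without clashing.
print(to_internal("rnewson", "db1"))
print(to_internal("sleroux", "db1"))
print(to_external("rnewson", "rnewson/db1"))
```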

With that preamble done, each tenant will have a unique label pretty much by definition, and
this would be included in all the keys. Running that, or other properties, through a cryptographically
secure message digest algorithm achieves nothing but obfuscation and, as you note, the possibility
(however remote) of a collision. Crypto isn't magic, even if it looks like magic.
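
To put a rough number on "however remote", here is a small sketch using the standard birthday-bound approximation. It assumes the digest output is uniformly distributed; the function name `collision_probability` is made up for this example.

```python
import math


def collision_probability(n_keys: int, hash_bits: int) -> float:
    """Approximate chance that at least two of n_keys share a digest.

    Birthday-bound approximation: p ~= 1 - exp(-n(n-1) / (2 * 2^bits)).
    """
    space = 2.0 ** hash_bits
    # -expm1(-x) computes 1 - exp(-x) without losing precision for tiny x.
    return -math.expm1(-n_keys * (n_keys - 1) / (2.0 * space))


one_billion = 10**9
print(collision_probability(one_billion, 64))   # a few percent
print(collision_probability(one_billion, 128))  # vanishingly small
```

With one billion keys, a 64-bit digest gives a collision chance of about 2.7%, while a 128-bit digest makes it negligible. A unique tenant label used directly in the key has no such risk at all.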

FDB provides the notion of a "Directory", which is a mechanism to help with very long keys,
given FDB's key size limit of 10,000 bytes.

So, instead of representing a doc of {"foo":12} in "db1" of my "rnewson" account simply as:

/couchdb/rnewson/db1/doc1/foo => 12

we could create a Directory for the prefix "/couchdb/rnewson/db1" instead:

dirspace/couchdb/rnewson/db1 => 0x01
0x01/doc1/foo => 12
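
The indirection in the two lines above can be modelled with a toy in-memory store. This is only a sketch of the idea: it does not use the FDB client API, the class name `ToyDirectory` and its methods are invented for illustration, and the sequential 4-byte prefix is an assumption of this sketch rather than how FDB's directory layer allocates prefixes.

```python
class ToyDirectory:
    """In-memory model of a directory that maps long paths to short prefixes."""

    def __init__(self):
        self.directory = {}  # long path -> short allocated prefix
        self.data = {}       # short prefix + subkey -> value
        self._next_id = 1

    def open_prefix(self, path: str) -> bytes:
        """Return the short prefix for a path, allocating one on first use."""
        if path not in self.directory:
            # Fixed-width prefixes so no prefix is a prefix of another.
            self.directory[path] = self._next_id.to_bytes(4, "big")
            self._next_id += 1
        return self.directory[path]

    def set(self, path: str, subkey: str, value) -> None:
        self.data[self.open_prefix(path) + subkey.encode()] = value

    def get(self, path: str, subkey: str):
        return self.data[self.open_prefix(path) + subkey.encode()]


store = ToyDirectory()
store.set("/couchdb/rnewson/db1", "/doc1/foo", 12)
print(store.get("/couchdb/rnewson/db1", "/doc1/foo"))
# The stored key is the short prefix plus "/doc1/foo", not the full path.
print(list(store.data.keys()))
```

The long path is stored once in the directory mapping, and every document key under it carries only the short prefix.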

We're overdue for the Document Model RFC that would make this explicit.

Finally, I think we're past the "proposition" stage as there is broad agreement (and no
disagreement) from the conversations already had. We are a little behind on writing and publishing
the RFCs that will describe the full work, though.

B.

-- 
  Robert Samuel Newson
  rnewson@apache.org

On Mon, 18 Mar 2019, at 17:32, Steven Le Roux wrote:
> Hi everyone.
> 
> I'm new here and just discovered the ongoing proposition for CouchDB to
> rely upon FDB.
> 
> With my team, we were considering providing an HTTP API over FDB in the
> form of the CouchDB API definition, so I'm very pleased to see there is
> already an ongoing effort for this (even if still a proposition). I've
> tried to catch up with all the good discussions on how you could make this
> work, mapping to the K/V model, but sorry if I could have missed a point.
> 
> I'm curious how you plan to manage multi-tenancy while
> ensuring good scalability and avoiding hotspotting.
> 
> I've read an idea from Mickael with CryptoHash to map the model this way :
> 
> {bucket_id}/{cryptohash}  : value
> 
> We currently use this CryptoHash mechanism to manage some data in a
> multi-tenancy context applied to Time Series.
> 
> Here is a simple diagram that summarizes it :
> 
> {raw_data} -> ingress component -> {hashed_metadata+data} -> HBase
>                                 -> {crypted_metadata}     -> HBase
>                                 -> {crypted_metadata}     -> Directory service
> 
> Query -> egress component -> HBase
> 
> raw_data is in the metric{tags} format, like in Prometheus/OpenTSDB/Warp10
> style.
> hashed metadata is a pair of 64-bit or 128-bit hashes: hash(metric) +
> hash(tags).
> Default is 64 bits, but it can lead to collisions in the keyspace above 1B
> unique series, where 128-bit hashes are safer.
> egress will query the Directory service to get the list of series to be
> read in the store.
> 
> During authentication, a custom "application" label is embedded into the
> labels that end up in the data model and is then hashed, which avoids
> conflicts between users. Hashed metadata are suffixed with a timestamp
> because it's convenient for Time Series data.
> What makes it very useful is :
>  - it can still use scans per series (metrics+tags)
>  - it avoids hotspotting the cluster and ensures a very good distribution
> among nodes
>  - it provides authentication through a directory service that acts as an
> indirection
>  - keys have a consistent size even though metrics or tags can be very long
> 
> I think this kind of model can perfectly apply to FDB for documents given
> that Namespace would be a user application/bucket/...  :
> 
> hash ( {NS} + {...} + {DOC_ID} ) / fields / ...
> 
> Drawbacks are that it may require a bit more storage for keys, but hashing
> could be adjusted given the use case. Moreover, managing rights at the
> document level would also require additional fields or a few bytes to manage
> this, while using a directory index (it could be in memory inside CouchDB,
> outside it relying on something like Elastic, or available directly inside
> FDB).
> I realize that just FDB as a backend is a considerable amount of work and
> pushing multi-tenancy adds even more work, maybe into CouchDB itself. For
> example, tokens could embed rights and bucket ids that would be used by
> CouchDB to authorize and build the underlying data model for storing with
> scalability and optimizations in mind. Also, did anyone consider reaching
> out to the FDB guys to try to align the CouchDB document representation
> with the Document Layer (
> https://foundationdb.github.io/fdb-document-layer/data-modeling.html )?
> This would make CouchDB MongoDB API compatible as well.
> 
> I don't know where discussions are, but maybe we could help :)
>
