couchdb-dev mailing list archives

From Reddy B. <redd...@live.fr>
Subject Re: Gridfs for CouchDb
Date Mon, 29 Jul 2019 15:10:23 GMT
Many thanks for your reply Adam, this is very interesting. So if I understand you correctly,
unfortunately even this approach wouldn't be the silver bullet to address performance issues.
I thought they were mainly due to replication; I will need to dig deeper into the archives
to understand the situation better.

The big appeal of having everything in the database is consistency (especially considering
the replication capabilities of CouchDB); it is too easy to end up with orphaned links/paths. But
even so, the truth is that what we really care about is the "blackbox" around
the blob ensuring that the application fully owns the lifecycle of the blob (and that no sysadmin
and/or cloud vendor can mess it up by moving or deleting files, changing URL structures,
or even because of billing issues).

So if talking to an external storage still enables couchdb to fetch the blob upon replication
(so that the user does not need to mess around manually transferring blobs), then this would
work even for us who care so much about this aspect. Having to install a storage engine on
every instance/server in addition to couchdb is not a big deal. Then one can create credentials
for the storage engine that only couchdb knows.

I would make the distinction between attachments and largeAttachments to convey to the user
that a normal attachment is stored in full with the document, while a largeAttachment is chunked
gridfs-style.
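
To make "chunked gridfs-style" concrete, here is a minimal sketch of how a byte-range read could be mapped onto fixed-size chunks, which is what makes range queries cheap on a chunked blob. The function names and the in-memory list of chunks are assumptions for illustration only; nothing here is an existing CouchDB or GridFS API.

```python
from typing import List, Sequence, Tuple


def chunks_for_range(start: int, end: int, chunk_size: int) -> List[Tuple[int, int, int]]:
    """Map the byte range [start, end) onto fixed-size chunks.

    Returns (chunk_index, slice_start, slice_end) triples, where the slice
    offsets are relative to the beginning of that chunk.
    """
    if chunk_size <= 0 or start < 0 or end < start:
        raise ValueError("invalid range or chunk size")
    if start == end:
        return []
    first = start // chunk_size
    last = (end - 1) // chunk_size
    plan = []
    for index in range(first, last + 1):
        chunk_start = index * chunk_size
        lo = max(start, chunk_start) - chunk_start
        hi = min(end, chunk_start + chunk_size) - chunk_start
        plan.append((index, lo, hi))
    return plan


def read_range(chunks: Sequence[bytes], start: int, end: int, chunk_size: int) -> bytes:
    """Reassemble bytes [start, end) by touching only the chunks that overlap it."""
    return b"".join(
        chunks[index][lo:hi] for index, lo, hi in chunks_for_range(start, end, chunk_size)
    )
```

Only the chunks overlapping the requested range are fetched, so a streaming client seeking into a large file never has to load the whole blob.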

So maybe attachments can be made to respect whatever limit FoundationDb has, while largeAttachments
require a storage engine to be configured.

Reddy

________________________________
From: Adam Kocoloski <kocolosk@apache.org>
Sent: Monday, 29 July 2019 16:33
To: dev@couchdb.apache.org <dev@couchdb.apache.org>
Subject: Re: Gridfs for CouchDb

Hi Reddy,

Yes, something like this is possible to build on FoundationDB. The main challenge is that
every FoundationDB transaction needs to be under 10MB, so the CouchDB layer would need to
stitch together multiple transactions in order to support larger attachments and record some
metadata at the end to make the result visible to the user.
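
The "stitch together multiple transactions, then record metadata at the end" idea can be sketched roughly as follows. This is only an illustration under stated assumptions: a plain dict stands in for the key-value store, each commit_batch() call stands in for one transaction, and the key layout, chunk size, and safety margin are all hypothetical rather than anything CouchDB actually does.

```python
# Sketch only: `store` is a dict standing in for the key-value store.
TXN_BUDGET = 9_000_000   # assumed margin under the 10MB transaction limit mentioned above
CHUNK_SIZE = 100_000     # assumed chunk size


def write_attachment(store: dict, att_id: str, data: bytes,
                     txn_budget: int = TXN_BUDGET,
                     chunk_size: int = CHUNK_SIZE) -> int:
    """Write data as chunks over several simulated transactions.

    Returns the number of simulated transactions, including the final one
    that writes the metadata record.
    """
    if chunk_size <= 0 or chunk_size > txn_budget:
        raise ValueError("chunk_size must be positive and fit in one transaction")

    txn_count = 0
    batch = {}
    batch_bytes = 0

    def commit_batch():
        # Stand-in for committing one transaction.
        nonlocal batch, batch_bytes, txn_count
        if batch:
            store.update(batch)
            txn_count += 1
            batch, batch_bytes = {}, 0

    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        if batch_bytes + len(chunk) > txn_budget:
            commit_batch()
        batch[(att_id, "chunk", offset // chunk_size)] = chunk
        batch_bytes += len(chunk)
    commit_batch()

    # The metadata record is written last: until it exists, readers treat the
    # attachment as absent, so a half-finished upload is never visible.
    store[(att_id, "meta")] = {
        "length": len(data),
        "chunk_size": chunk_size,
        "chunks": (len(data) + chunk_size - 1) // chunk_size,
    }
    return txn_count + 1


def read_attachment(store: dict, att_id: str):
    """Return the reassembled bytes, or None if no metadata record exists."""
    meta = store.get((att_id, "meta"))
    if meta is None:
        return None
    return b"".join(store[(att_id, "chunk", i)] for i in range(meta["chunks"]))
```

A real layer would also need to clean up chunks left behind by uploads that never reach the metadata step, which this sketch does not attempt.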

Personally, I’d like to see a design for attachments that allows CouchDB the option to offload
the actual binary storage for attachments to an object store purpose-built for that sort of
thing, while still maintaining the CouchDB API including replication capabilities. All the
major cloud providers have object storage services, and if you’re not running on cloud infrastructure
there are open source projects like Minio and Ceph that are far more efficient at storing
large binaries than CouchDB or FoundationDB will ever be.

Of course, I recognize that this integration is extra complexity that many administrators
do not need or want, and so we’ll require some native option for attachment storage. The
main question I have is whether we write all the extra code to support internal storage of
attachments that exceed 10 MB, knowing that we’d still deliver worse performance at higher
cost than the “object store offload” approach.

I’m curious why you proposed “attachment” vs. “largeAttachment” as a user-visible
distinction? That hadn’t occurred to me personally. Cheers,

Adam

> On Jul 29, 2019, at 1:43 AM, Reddy B. <reddy.b@live.fr> wrote:
>
> Hello,
>
> MongoDB has a driver called GridFS intended to handle large files. Since they have a
> hard limit of 16 MB per document, this driver transparently splits a file into chunks (255 kB
> by default) and then transparently reassembles it upon read. Metadata is stored so that things
> such as range queries are supported (very useful in video/audio streaming scenarios; CouchDB
> supports range queries too). More information is available on this page:
>
> https://docs.mongodb.com/manual/core/gridfs/
>
> I was wondering if something similar could be built on top of FoundationDB and if such
> an approach would solve the current issues with large attachments. In particular, it could
> make replication easier, since only small files would need to be replicated and it would be
> easier to resume replication at a particular chunk.
>
> MongoDB stores this data in a dedicated "collection", which is not the CouchDB way. My
> thinking was that this could be opt-in: in addition to a document being able to have an attachment,
> we could introduce a new entity called largeAttachment using such a driver behind the scenes,
> and the user would choose how best to store his data based on the performance characteristics
> of each storage method and his needs (field, attachment, largeAttachment).
>
> I am just wondering if the idea is broadly feasible in the next FDB-based version or
> if there is an obvious showstopper / challenge that would need to be addressed first.
>
> Thank you!
>
> Reddy

