couchdb-dev mailing list archives

From: Benoit Chesneau <bchesn...@gmail.com>
Subject: Re: NPM, CouchDB and big attachments
Date: Wed, 27 Nov 2013 13:56:20 GMT
On Wed, Nov 27, 2013 at 12:59 PM, Robert Newson <rnewson@apache.org> wrote:

> I think NPM mostly struggle with disk issues (all attachments in the
> same file, it's 100G) and replication (a document with lots of
> attachments has to be transferred fully in the same connection without
> interruption or else it starts over).
>
> Both of these are fixable without taking the extreme measure of moving
> the attachments out of couchdb entirely. That would pretty much
> eliminate the point of using CouchDB for this registry.


The backend used to store attachments could be pluggable, though. What is
important here is to have a coherent API to replicate them along with the
doc. Where they are stored should (or at least could) be an operational
decision.


> That's a
> perfectly reasonable thing for the registry owners to do but changing
> CouchDB is going too far. I've previously advocated for "external"
> attachments, whether that's a file-per-attachment or a separate .att
> file of all attachments. I've since recanted, it's not compelling
> enough to compensate for the extra failure conditions (the .couch file
> exists but the .att file is gone, say).
>

I guess it depends on how the backend works, or which kind of backend may
be used. People already have this problem when they index elsewhere (like
in ES): they have to take care of both the indexes and the docs. Adding
another path to back up is a known science.


>
> For the actual problems, the bigcouch merge will bring sharding (a
> q=10 database would consist of ten 10G files, each individually
> compactable, can be hosted on different machines, etc). CouchDB 1.5.0
> improved replication behaviour around attachments but there's
> definitely more work to be done. Particularly, we could make
> attachment replication resumable. Currently, if we replicate 99.9% of
> a large attachment, lose our connection, and resume, we'll start over
> from byte 0. This is why, elsewhere, there's a suggestion of 'one
> attachment per document'. That is a horrible and artificial constraint
> just to work around replicator deficiencies. We should encourage sane
> design (related attachments together in the same document) and fix the
> bugs that prevent heavy users from following it.
>
>
All of that is fine, but imo we should also take into consideration people
who have really large blobs (like video files), where docs are only there
to hold structured data around those blobs. They could of course just
provide links to these blobs, or we could provide them a solution to host
these large blobs. Simply sharding is not enough in that case. They
generally already have a proper solution for storing such blobs, so if we
could make attachments act as a proxy to these blobs, they would benefit
from CouchDB replication for free while keeping the blobs on their own
tools.
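
To illustrate (the "external_attachments" field and the helper below are
made up, nothing like this exists in CouchDB today), the doc would carry
only the structured data plus a pointer, and the bytes would be fetched
from the external store on demand, resumable via a plain HTTP Range
request:

    # Hypothetical doc shape: metadata in CouchDB, the blob elsewhere.
    import requests  # assumes plain HTTP access to the blob store

    doc = {
        "_id": "video-0001",
        "title": "Conference talk",
        "external_attachments": {
            "talk.mp4": {
                "url": "https://blobs.example.com/talk.mp4",
                "length": 734003200,
                "content_type": "video/mp4",
            }
        },
    }

    def fetch_blob(att, start=0):
        """Stream the blob, resuming from byte `start` if interrupted."""
        headers = {"Range": "bytes=%d-" % start} if start else {}
        return requests.get(att["url"], headers=headers, stream=True)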

- benoit
