couchdb-dev mailing list archives

From Robert Newson <>
Subject Re: NPM, CouchDB and big attachments
Date Wed, 27 Nov 2013 11:59:09 GMT
I think NPM mostly struggles with disk issues (all attachments live in
the same file, which is 100G) and replication (a document with lots of
attachments has to be transferred in full over a single connection,
without interruption, or else it starts over).

Both of these are fixable without taking the extreme measure of moving
the attachments out of couchdb entirely. That would pretty much
eliminate the point of using CouchDB for this registry. That's a
perfectly reasonable thing for the registry owners to do but changing
CouchDB is going too far. I've previously advocated for "external"
attachments, whether that's a file-per-attachment or a separate .att
file of all attachments. I've since recanted: it's not compelling
enough to compensate for the extra failure conditions (the .couch file
exists but the .att file is gone, say).

For the actual problems, the bigcouch merge will bring sharding (a
q=10 database would consist of ten 10G files, each individually
compactable and hostable on a different machine, etc). CouchDB 1.5.0
improved replication behaviour around attachments but there's
definitely more work to be done. Particularly, we could make
attachment replication resumable. Currently, if we replicate 99.9% of
a large attachment, lose our connection, and resume, we'll start over
from byte 0. This is why, elsewhere, there's a suggestion of 'one
attachment per document'. That is a horrible and artificial constraint
just to work around replicator deficiencies. We should encourage sane
design (related attachments together in the same document) and fix the
bugs that prevent heavy users from following it.
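
To make "resumable" concrete, here is a minimal sketch in Python of the
retry behaviour I mean. It is not the replicator's code; open_stream and
make_flaky_source are hypothetical names invented for this illustration,
and the source is simulated in memory rather than fetched over HTTP. The
only point is that a retry asks for data starting at the number of bytes
already stored, instead of byte 0.

```python
from typing import Callable, Iterator


def fetch_resumable(open_stream: Callable[[int], Iterator[bytes]],
                    total_size: int,
                    max_attempts: int = 5) -> bytes:
    """Collect total_size bytes, resuming from the last stored offset.

    open_stream(offset) is a hypothetical callable that yields chunks
    starting at `offset` and may raise ConnectionError part-way through.
    """
    received = bytearray()
    for _ in range(max_attempts):
        try:
            # Ask only for what is still missing, never from byte 0.
            for chunk in open_stream(len(received)):
                received.extend(chunk)
        except ConnectionError:
            continue  # keep what we have and retry from len(received)
        if len(received) >= total_size:
            return bytes(received)
    raise ConnectionError("gave up after %d attempts" % max_attempts)


def make_flaky_source(data: bytes, chunk_size: int, fail_after_chunks: int):
    """Simulated attachment source whose first connection drops."""
    state = {"calls": 0, "bytes_sent": 0}

    def open_stream(offset: int) -> Iterator[bytes]:
        state["calls"] += 1
        first_call = state["calls"] == 1
        sent_chunks = 0
        for start in range(offset, len(data), chunk_size):
            if first_call and sent_chunks == fail_after_chunks:
                raise ConnectionError("simulated drop")
            chunk = data[start:start + chunk_size]
            state["bytes_sent"] += len(chunk)
            sent_chunks += 1
            yield chunk

    return open_stream, state


if __name__ == "__main__":
    attachment = bytes(range(256)) * 4  # 1024 bytes of sample data
    stream, stats = make_flaky_source(attachment, 100, 9)
    result = fetch_resumable(stream, len(attachment))
    print(result == attachment, stats)
```

In the simulated run the first connection drops after 900 bytes and the
second one supplies only the remaining 124, so no byte is sent twice.
Over HTTP the offset could plausibly be expressed as a Range request,
but that mapping is an assumption of this sketch.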


On 27 November 2013 07:27, Benoit Chesneau <> wrote:
> On Wed, Nov 27, 2013 at 8:26 AM, Benoit Chesneau <>wrote:
>> On Wed, Nov 27, 2013 at 8:14 AM, Alexander Shorin <>wrote:
>>> > Move attachments out of CouchDB: Work has begun to move the package
>>> > tarballs out of CouchDB and into Joyent's Manta service. Additionally,
>>> > MaxCDN has generously offered to provide CDN services for npm, once the
>>> > tarballs are moved out of the registry database. This will help improve
>>> > delivery speed, while dramatically reducing the file system I/O load on
>>> > the CouchDB servers. Work is progressing slowly, because at each stage
>>> > in the plan, we are making sure that current replication users are
>>> > minimally impacted.
>>> I wonder whether this is down to CouchDB's non-optimal I/O, and/or
>>> whether issue 769 can fix it? There is an alpha patch attached. Maybe
>>> it's a good time to push it forward? What is left to do for it?
>>> --
>>> ,,,^..^,,,
>> I would say a better internal API. I am also interested in working on that.
> also +1
