couchdb-dev mailing list archives

From Jan Lehnardt <...@apache.org>
Subject Re: [DISCUSS] Attachment support in CouchDB with FDB
Date Thu, 28 Feb 2019 10:33:32 GMT
Thanks for getting this started, Bob!

At the risk of derailing this right off the bat, is there a potential 4) approach where
the CouchDB side has a way to specify “attachment backends”, one of which could be 2),
while others could be “node local file storage”*, S3-API compatible storage, etc.?

*a bunch of heavy handwaving about how to ensure consistency and fault tolerance here.

* * *

My hypothetical 4) could also be a later addition, and we’ll do one of 1-3 first.


* * *

From 1-3, I think 2 is most pragmatic in terms of keeping desirable functionality, while limiting
it so it can be useful in practice.

I feel strongly about not dropping attachment support. While not ideal in all cases, it is
an extremely useful and reasonably popular feature.

Best
Jan
—

> On 28. Feb 2019, at 11:22, Robert Newson <rnewson@apache.org> wrote:
> 
> Hi All,
> 
> We've not yet discussed attachments in terms of the foundationdb work so here's where
> we do that.
> 
> Today, CouchDB allows you to store large binary values, stored as a series of much smaller
> chunks. These "attachments" cannot be indexed, they can only be sent and received (you can
> fetch the whole thing or you can fetch arbitrary subsets of them).
> 
> On the FDB side, we have a few constraints. A transaction cannot be more than 10MB and
> cannot take more than 5 seconds.
> 
> Given that, there are a few paths to attachment support going forward:
> 
> 1) Drop native attachment support. 
> 
> I suspect this is not going to be a popular approach but it's worth hearing a range of
> views. Instead of direct attachment support, a user could store the URL to the large binary
> content and could simply fetch that URL directly.
> 
> 2) Write attachments into FDB but with limits.
> 
> The next simplest is to write the attachments into FDB as a series of key/value entries,
> where the key is {database_name, doc_id, attachment_name, 0..N} and the value is a short byte
> array (say, 16K to match current). The 0..N is just a counter such that we can do an fdb range
> get / iterator to retrieve the attachment. An embellishment would restore the HTTP Range header
> options, if we still wanted that (disclaimer: I implemented the Range thing many years ago,
> I'm happy to drop support if no one really cares for it in 2019).
> 
> This would be subject to the 10MB and 5s limit, which is less than you _can_ do today
> with attachments but not, in my opinion, any less than people actually do (with some notable
> outliers like npm in the past).
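
A minimal sketch of the key layout described in option 2, assuming an in-memory stand-in for an ordered key/value store. `FakeKV`, `write_attachment`, and `read_attachment` are hypothetical names for illustration and are not the FoundationDB or CouchDB API. The size check compares only the attachment body against the 10MB figure quoted above; a real transaction would also have to account for keys and other overhead, and the 5 second limit is not modelled at all.

```python
import sys

CHUNK_SIZE = 16 * 1024             # "say, 16K" chunk size from the proposal
MAX_TXN_BYTES = 10 * 1024 * 1024   # 10MB transaction limit quoted in the email


class FakeKV:
    """Hypothetical in-memory ordered key/value store (not the FDB API)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get_range(self, begin, end):
        """Yield (key, value) pairs with begin <= key < end, in key order."""
        for key in sorted(self._data):
            if begin <= key < end:
                yield key, self._data[key]


def write_attachment(kv, db, doc_id, name, data):
    """Store data as chunks keyed by (db, doc_id, name, 0..N)."""
    if len(data) > MAX_TXN_BYTES:
        # Simplification: only the body size is checked against the limit.
        raise ValueError("attachment exceeds the single-transaction limit")
    for n, offset in enumerate(range(0, len(data), CHUNK_SIZE)):
        kv.set((db, doc_id, name, n), data[offset:offset + CHUNK_SIZE])


def read_attachment(kv, db, doc_id, name, start=0, end=None):
    """Return bytes [start, end) of the attachment using one range read.

    With the defaults this returns the whole attachment; passing start/end
    mimics what an HTTP Range request would need.
    """
    first = start // CHUNK_SIZE
    # One past the last chunk counter that overlaps the requested bytes.
    stop = sys.maxsize if end is None else (end + CHUNK_SIZE - 1) // CHUNK_SIZE
    pairs = kv.get_range((db, doc_id, name, first), (db, doc_id, name, stop))
    blob = b"".join(value for _, value in pairs)
    skip = start - first * CHUNK_SIZE  # bytes to drop from the first chunk
    return blob[skip:] if end is None else blob[skip:skip + (end - start)]


if __name__ == "__main__":
    kv = FakeKV()
    payload = bytes(range(256)) * 200
    write_attachment(kv, "db", "doc", "a.bin", payload)
    assert read_attachment(kv, "db", "doc", "a.bin") == payload
    assert read_attachment(kv, "db", "doc", "a.bin", 100, 20000) == payload[100:20000]
```

Because the chunk counter is the last key element, a byte range maps onto a contiguous run of counters, so a partial read only has to touch the chunks that overlap the requested bytes.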
> 
> 3) Full functionality
> 
> This would be the same as today. Attachments of arbitrary size (up to the disk capacity
> of the fdb cluster). It would require some extra cleverness to work over multiple transactions
> and in such a way that an aborted upload doesn't leave partially uploaded data in fdb forever.
> I have not sat down and designed this yet, hence I would very much like to hear from the community
> as to which of these paths is sufficient.
> 
> -- 
>  Robert Samuel Newson
>  rnewson@apache.org

-- 
Professional Support for Apache CouchDB:
https://neighbourhood.ie/couchdb-support/

