From: Jan Lehnardt
To: dev@couchdb.apache.org
Subject: Re: [DISCUSS] Attachment support in CouchDB with FDB
Date: Thu, 28 Feb 2019 11:33:32 +0100
In-Reply-To: <0c9bbf4d-ca2f-434a-8816-8d0f356e0c44@www.fastmail.com>

Thanks for getting this started, Bob!

In fear of derailing this right off the bat, is there a potential 4) approach where on the CouchDB side there is a way to specify “attachment backends”, one of which could be 2), but others could be “node local file storage”*, others could be S3-API compatible, etc?

*a bunch of heavy handwaving about how to ensure consistency and fault tolerance here.

* * *

My hypothetical 4) could also be a later addition, and we’ll do one of 1-3 first.

* * *

From 1-3, I think 2 is most pragmatic in terms of keeping desirable functionality, while limiting it so it can be useful in practice.

I feel strongly about not dropping attachment support. While not ideal in all cases, it is an extremely useful and reasonably popular feature.

Best
Jan
--

> On 28. Feb 2019, at 11:22, Robert Newson wrote:
>
> Hi All,
>
> We've not yet discussed attachments in terms of the foundationdb work so here's where we do that.
>
> Today, CouchDB allows you to store large binary values, stored as a series of much smaller chunks. These "attachments" cannot be indexed, they can only be sent and received (you can fetch the whole thing or you can fetch arbitrary subsets of them).
>
> On the FDB side, we have a few constraints. A transaction cannot be more than 10MB and cannot take more than 5 seconds.
>
> Given that, there are a few paths to attachment support going forward:
>
> 1) Drop native attachment support.
>
> I suspect this is not going to be a popular approach but it's worth hearing a range of views. Instead of direct attachment support, a user could store the URL to the large binary content and could simply fetch that URL directly.
>
> 2) Write attachments into FDB but with limits.
>
> The next simplest is to write the attachments into FDB as a series of key/value entries, where the key is {database_name, doc_id, attachment_name, 0..N} and the value is a short byte array (say, 16K to match current). The 0..N is just a counter such that we can do an fdb range get / iterator to retrieve the attachment. An embellishment would restore the HTTP Range header options, if we still wanted that (disclaimer: I implemented the Range thing many years ago, I'm happy to drop support if no one really cares for it in 2019).
>
> This would be subject to the 10MB and 5s limit, which is less than you _can_ do today with attachments but not, in my opinion, any less than people actually do (with some notable outliers like npm in the past).
>
> 3) Full functionality
>
> This would be the same as today. Attachments of arbitrary size (up to the disk capacity of the fdb cluster). It would require some extra cleverness to work over multiple transactions and in such a way that an aborted upload doesn't leave partially uploaded data in fdb forever. I have not sat down and designed this yet, hence I would very much like to hear from the community as to which of these paths are sufficient.
>
> --
> Robert Samuel Newson
> rnewson@apache.org

--
Professional Support for Apache CouchDB:
https://neighbourhood.ie/couchdb-support/
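
For reference, the key layout described under option 2) could look roughly like the Python sketch below. It is only an illustration under stated assumptions: a plain dict stands in for the ordered key/value store (no FoundationDB client calls are used), the function names are hypothetical, and the 10MB transaction limit is simplified to a cap on the attachment's total byte length, ignoring key overhead.

```python
CHUNK_SIZE = 16 * 1024      # "say, 16K to match current"
TXN_LIMIT = 10_000_000      # assumption: 10MB limit applied to attachment bytes only


def attachment_chunks(db_name, doc_id, att_name, data):
    """Yield (key, value) pairs; the key ends in the 0..N chunk counter."""
    if len(data) > TXN_LIMIT:
        raise ValueError("attachment does not fit in a single transaction")
    for n, offset in enumerate(range(0, len(data), CHUNK_SIZE)):
        yield (db_name, doc_id, att_name, n), data[offset:offset + CHUNK_SIZE]


def read_attachment(store, db_name, doc_id, att_name):
    """Emulate a range read: collect every key under the prefix, in key order."""
    prefix = (db_name, doc_id, att_name)
    keys = sorted(k for k in store if k[:3] == prefix)
    return b"".join(store[k] for k in keys)


def read_range(store, db_name, doc_id, att_name, start, end):
    """Return bytes start..end inclusive, reading only the chunks that overlap."""
    first, last = start // CHUNK_SIZE, end // CHUNK_SIZE
    parts = []
    for n in range(first, last + 1):
        key = (db_name, doc_id, att_name, n)
        if key not in store:    # requested range runs past the last chunk
            break
        parts.append(store[key])
    blob = b"".join(parts)
    skip = start - first * CHUNK_SIZE
    return blob[skip:skip + (end - start + 1)]


if __name__ == "__main__":
    data = bytes(range(256)) * 200
    store = dict(attachment_chunks("db", "doc1", "photo.bin", data))
    assert read_attachment(store, "db", "doc1", "photo.bin") == data
    assert read_range(store, "db", "doc1", "photo.bin", 16380, 16390) == data[16380:16391]
```

Because the chunk counter is the last element of the key, a byte range maps directly onto a contiguous run of keys, which is what would make the HTTP Range embellishment cheap to support.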