couchdb-dev mailing list archives

From "Dave Cottlehuber (Commented) (JIRA)" <>
Subject [jira] [Commented] (COUCHDB-769) Store large attachments external to the .couch file
Date Wed, 01 Feb 2012 16:12:59 GMT


Dave Cottlehuber commented on COUCHDB-769:

This seems really useful, especially when coupled with CORS in future.

## Possible use cases

#1 I want a lean and mean couch, but attachments are on the server's FS
- external API is unchanged
- we redirect "large" attachments to file system per rnewson's approach
- couch still streams/returns the file
- should be per-db configurable as to where the bits are put
- may need some form of hashing/buckets to spread out across a directory
- sanitise path & filename to avoid security holes (uridecode, trim to basename)
- if the attachment is not present on the file system we'd need to return
  something like 417 Expectation Failed (assuming we are acting as a proxy here).
  A 404 doesn't feel right, and using 417 would make tracking these down
  in the logs very easy. 500 or 501 would be OK too.
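
A minimal Python sketch of the sanitising and bucketing steps from the list above. The function name, the choice of SHA-1 and the two-character bucket are assumptions for illustration, not taken from rnewson's patch, and `db_name` is assumed to be already validated by the caller.

```python
import hashlib
import posixpath
from pathlib import PurePosixPath
from urllib.parse import unquote


def safe_attachment_path(root: str, db_name: str, att_name: str) -> PurePosixPath:
    """Map an attachment name to a bucketed path under root/db_name (hypothetical helper)."""
    # URI-decode first so an encoded separator ("%2F") cannot survive the basename trim.
    decoded = unquote(att_name)
    # Treat backslashes as separators too, then keep only the final path component.
    base = posixpath.basename(decoded.replace("\\", "/"))
    if base in ("", ".", ".."):
        raise ValueError(f"unsafe attachment name: {att_name!r}")
    # Two hex characters of a digest spread files over 256 sub-directories.
    bucket = hashlib.sha1(base.encode("utf-8")).hexdigest()[:2]
    return PurePosixPath(root) / db_name / bucket / base
```

A name such as `..%2F..%2Fetc%2Fpasswd` is reduced to `passwd` inside the database's own directory, and a bare `..` is rejected.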

#2 Host these attachments somewhere else, just store metadata and redirect
- external request API is unchanged *however* the response would be a redirect
- POST would also need to change
- instead couch issues a redirect: 302 Found, so that future requests still go via couch
- no sanitisation of path required
- might need to be per-db or per-server configurable to ensure public couches
  don't become easy targets for spam referrers
- couch doesn't stream/return the file itself

{
   "_id" : "redirect302",
   "meta": "data",
   "_attachments" : {
      "fox.png" : {
         "content-type" : "image/png",
         "uri" : ""
      }
   }
}
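
A rough Python sketch of how the redirect decision for #2 could look, given a document shaped like the one above. The function name and the http/https allow-list are assumptions, and the example URI in the usage note is purely hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit


def redirect_location(doc: dict, name: str, redirection_handler: bool) -> Optional[str]:
    """Return the URI to send in a 302 Location header, or None to serve as usual."""
    att = doc.get("_attachments", {}).get(name)
    if att is None or not redirection_handler:
        return None
    uri = att.get("uri")
    if not uri:
        # No external URI recorded: the attachment is handled by couch itself.
        return None
    # Assumption: only web URIs may be redirected to, so that something like
    # file:// never leaks server details to the client.
    if urlsplit(uri).scheme not in ("http", "https"):
        raise ValueError(f"refusing to redirect to {uri!r}")
    return uri
```

With `"uri": "https://example.org/fox.png"` in the stub and the handler enabled, the caller would answer `302 Found` with that value as `Location`; with the handler disabled it falls back to normal handling.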

## Per-server configuration.

For #1 and #2 we would then have:

redirection_handler = true ;  <doc>._attachments.<name> redirects to ...uri
filestore_handler  = true ; enable storing large attachments on filesystem
filestore_threshold = 1048576 ; size in bytes above which attachments go to the filesystem
filestore_dir = /var/lib/couchdb/attachments/ ; each couch has a named subdir
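
To make the intent of these settings concrete, here is a small Python sketch that reads them and decides where an attachment would go. The `[attachments]` section name and the `external_dir` helper are assumptions; the proposal does not say which ini section the keys would live in.

```python
import configparser
import posixpath
from typing import Optional

SAMPLE = """
[attachments]
filestore_handler = true ; enable storing large attachments on filesystem
filestore_threshold = 1048576 ; size in bytes
filestore_dir = /var/lib/couchdb/attachments/ ; each couch has a named subdir
"""


def load(text: str) -> configparser.ConfigParser:
    # ";" starts an inline comment, as in the settings above.
    cfg = configparser.ConfigParser(inline_comment_prefixes=(";",))
    cfg.read_string(text)
    return cfg


def external_dir(cfg: configparser.ConfigParser, db_name: str, size: int) -> Optional[str]:
    """Return the per-db directory for an attachment of `size` bytes, or None to keep it in the .couch file."""
    sec = cfg["attachments"]
    if not sec.getboolean("filestore_handler", fallback=False):
        return None
    # "Above which" is read here as strictly greater than the threshold.
    if size <= sec.getint("filestore_threshold"):
        return None
    return posixpath.join(sec["filestore_dir"], db_name)
```

Under this reading a 2 MB attachment in a database called `mydb` would land under `/var/lib/couchdb/attachments/mydb`, while one of exactly 1048576 bytes would stay inside the .couch file.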

## Considerations.

I think we should preserve the current _attachments structure and potential
user-provided metadata, even if the actual attachments are stored elsewhere.
MD5 and similar checks should still be feasible using this.
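
As an illustration of the MD5 point, a short Python sketch that checks externally stored bytes against the digest kept in the attachment stub. It assumes the stub carries a `digest` field of the form `md5-<base64>`, as CouchDB attachment stubs do; the helper names are made up for this example.

```python
import base64
import hashlib


def couch_md5_digest(data: bytes) -> str:
    """Format an MD5 digest as 'md5-' followed by the base64 of the raw hash."""
    raw = hashlib.md5(data).digest()
    return "md5-" + base64.b64encode(raw).decode("ascii")


def matches_stub(stub: dict, data: bytes) -> bool:
    """True if the bytes read from the external store match the stub's digest."""
    return stub.get("digest") == couch_md5_digest(data)
```

Because the metadata stays in `_attachments`, the check works the same whether the bytes come from the file system or from a remote host.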

In #1 it should not be possible to exploit the server to expose data by fiddling
with pathnames and filenames.

I would imagine in a BigCouch scenario that #1 presents some further
challenges. Using #2 and "uri": "file://nfsmount/somefile" won't work
as it leaks server implementation and may be exploitable.

#2 might also be useful for people running their infrastructure within a cloud
provider, e.g. on AWS with S3, who might want to serve their attachments using
couch as a proxy rather than expose the external URI.

> Store large attachments external to the .couch file
> ---------------------------------------------------
>                 Key: COUCHDB-769
>                 URL:
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: Database Core
>            Reporter: Robert Newson
>         Attachments: external_attachments_alpha.patch
> For attachment-heavy applications storing the attachments in separate files significantly eases compaction problems.
