incubator-couchdb-dev mailing list archives

From "Dave Cottlehuber (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (COUCHDB-769) Store large attachments external to the .couch file
Date Wed, 01 Feb 2012 16:12:59 GMT

    [ https://issues.apache.org/jira/browse/COUCHDB-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197926#comment-13197926 ]

Dave Cottlehuber commented on COUCHDB-769:
------------------------------------------

This seems really useful, especially when coupled with CORS in the future.

## Possible use cases

#1 I want a lean and mean couch, but with attachments on the server's FS
- external API is unchanged
- we redirect "large" attachments to file system per rnewson's approach
- couch still streams/returns the file
- should be per-db configurable as to where the bits are put
- may need some form of hashing/buckets to spread out across a directory
  structure
- sanitise path & filename to avoid security holes (uridecode, trim to basename)
- if an attachment is not present we'd need to issue something like
  417 Expectation Failed (assuming we are acting as a proxy here).
  A 404 doesn't feel right, and using 417 would also make tracking these down
  in logs very easy. 500 or 501 would be OK too.
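A rough Python sketch of the path handling for #1, assuming a layout of `<filestore_dir>/<db>/<xx>/<yy>/<digest>-<name>` where `xx`/`yy` are taken from a SHA-1 of the database name, document ID and attachment name. The function name, the hashing scheme and the bucket layout are all hypothetical and not taken from rnewson's patch.

```python
import hashlib
import os
from urllib.parse import unquote


def attachment_path(filestore_dir: str, db_name: str, doc_id: str, att_name: str) -> str:
    """Map an attachment to a bucketed on-disk path (hypothetical layout)."""
    # Sanitise: URI-decode, then trim to the basename so "../../etc/passwd"
    # (or its %2F-encoded form) can't walk out of the store.
    safe_name = os.path.basename(unquote(att_name))
    if safe_name in ("", ".", "..") or "\x00" in safe_name:
        raise ValueError(f"invalid attachment name: {att_name!r}")

    # Hash db + doc + name; the first two byte pairs pick the bucket dirs,
    # and the full digest in the filename keeps entries unique per bucket.
    key = "\x00".join((db_name, doc_id, att_name)).encode("utf-8")
    digest = hashlib.sha1(key).hexdigest()
    path = os.path.join(filestore_dir, db_name, digest[:2], digest[2:4],
                        f"{digest}-{safe_name}")

    # Belt and braces: the resolved path must still sit under filestore_dir.
    root = os.path.realpath(filestore_dir)
    if os.path.commonpath([root, os.path.realpath(path)]) != root:
        raise ValueError(f"path escapes filestore_dir: {path!r}")
    return path
```

Including the document ID in the hash means two docs that both carry a `fox.png` never share a file.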

#2 Host these attachments somewhere else, just store metadata and redirect
- external request API is unchanged *however* the response would be a redirect
- POST would also need to change
- couch instead issues a 302 Found redirect, so future requests still go via couch
- no path sanitisation required
- might need to be per-db or per-server configurable to ensure public couches
  don't become easy targets for spam referrers
- couch doesn't stream/return the file itself

POST
{
   "_id" : "redirect302",
   "meta": "data",
   "_attachments" : {
      "fox.png" : {
         "content_type" : "image/png",
         "uri" : "http://your.bucket.s3.amazonaws.com/fox.png"
      }
   }
}
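For #2, the GET side could be as small as this Python sketch: look up the attachment's metadata and answer with a 302 pointing at its `uri`, without couch ever touching the bytes. `redirect_for` and its `(status, headers)` return shape are made up for illustration; `doc` mirrors the POST example above.

```python
import json
from typing import Any, Dict, Tuple


def redirect_for(doc: Dict[str, Any], att_name: str) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for GET /<db>/<docid>/<att_name> under #2."""
    att = doc.get("_attachments", {}).get(att_name)
    if att is None:
        return 404, {}                  # no such attachment at all
    uri = att.get("uri")
    if uri is None:
        return 200, {}                  # ordinary attachment: couch streams it
    # 302 Found rather than 301, so clients keep asking couch first.
    return 302, {"Location": uri}


doc = json.loads("""{
  "_id": "redirect302",
  "_attachments": {
    "fox.png": {"content_type": "image/png",
                "uri": "http://your.bucket.s3.amazonaws.com/fox.png"}
  }
}""")
print(redirect_for(doc, "fox.png"))
```

Using 302 rather than a permanent redirect is what keeps couch in the loop if the external location later changes.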

## Per-server configuration.

For #1 and #2 we would then have:

[attachments]
redirection_handler = true ;  <doc>._attachments.<name> redirects to ...uri
filestore_handler  = true ; enable storing large attachments on filesystem
filestore_threshold = 1048576 ; size in bytes above which attachments go to filestore_dir
filestore_dir = /var/lib/couchdb/attachments/ ; each couch has a named subdir
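Reading that section and applying the threshold might look like this with Python's `configparser`. The `storage_for` helper and its return labels are hypothetical; note that inline `;` comments are only stripped when `inline_comment_prefixes` is set.

```python
import configparser

EXAMPLE_INI = """
[attachments]
redirection_handler = true ; redirect to _attachments.<name>.uri
filestore_handler  = true ; store large attachments on the filesystem
filestore_threshold = 1048576 ; bytes
filestore_dir = /var/lib/couchdb/attachments/ ; one named subdir per couch
"""


def load_config(text: str) -> configparser.ConfigParser:
    # Inline "; ..." comments are only stripped when asked for explicitly.
    cfg = configparser.ConfigParser(inline_comment_prefixes=(";",))
    cfg.read_string(text)
    return cfg


def storage_for(cfg: configparser.ConfigParser, size: int) -> str:
    """Where would an attachment of `size` bytes go? (hypothetical labels)"""
    sec = cfg["attachments"]
    enabled = sec.getboolean("filestore_handler", fallback=False)
    threshold = sec.getint("filestore_threshold")
    # Strictly "above" the threshold goes to disk; everything else stays inline.
    return "filestore" if enabled and size > threshold else "couch_file"


cfg = load_config(EXAMPLE_INI)
print(storage_for(cfg, 4096), storage_for(cfg, 5 * 1024 * 1024))
```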

## Considerations.

I think we should preserve the current _attachments structure and potential
user-provided metadata, even if the actual attachments are stored elsewhere.
MD5 and similar checks should still be feasible using this.
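On the MD5 point: as far as I know, CouchDB already records a `digest` on each attachment stub in the form `md5-<base64 of the raw MD5>`, so a file kept outside the .couch file could be checked against it by streaming it back in. A Python sketch; `couch_md5_digest`, `verify_external` and the chunk size are illustrative only.

```python
import base64
import hashlib


def couch_md5_digest(data: bytes) -> str:
    """Format an MD5 the way attachment stubs show it: "md5-<base64>"."""
    return "md5-" + base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def verify_external(path: str, expected: str, chunk_size: int = 64 * 1024) -> bool:
    """Stream a file stored outside the .couch file and compare its digest."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large attachments never sit fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    actual = "md5-" + base64.b64encode(h.digest()).decode("ascii")
    return actual == expected
```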

In #1 it should not be possible to exploit the server to expose data by fiddling
with pathnames and filenames.

I would imagine that #1 presents some further challenges in a BigCouch
scenario. Using #2 with "uri": "file://nfsmount/somefile" won't work, as it
leaks server implementation details and may be exploitable.

Also, #2 might be useful for people running their infrastructure within a cloud
provider like AWS S3 who want to serve their attachments using couch as a
proxy, rather than expose the external URI.

                
> Store large attachments external to the .couch file
> ---------------------------------------------------
>
>                 Key: COUCHDB-769
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-769
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: Database Core
>            Reporter: Robert Newson
>         Attachments: external_attachments_alpha.patch
>
>
> For attachment-heavy applications, storing the attachments in separate files significantly eases compaction problems.
