couchdb-dev mailing list archives

From Stéphane Alnet (JIRA) <j...@apache.org>
Subject [jira] [Commented] (COUCHDB-1893) Allow replication filters to meaningfully apply to deleted documents
Date Mon, 23 Sep 2013 09:24:09 GMT

    [ https://issues.apache.org/jira/browse/COUCHDB-1893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13774411#comment-13774411 ]

Stéphane Alnet commented on COUCHDB-1893:
-----------------------------------------

> `old_doc` may not exist after compaction

Argh, I mostly use continuous replication so didn't think about that.

> Instead of this, have you thought about adding very trivial logic to your filter functions
> to accept or discard documents with `_deleted:true` if some special query argument or header
> exists?

Since the query arguments/headers are shared by all documents, I don't see how this can be
used to filter some deleted documents and not others. (The HTTP client could pre-compute a
list of IDs that need to be replicated, but then the replication filter is almost useless.)

Typically I provide query arguments to replication filters and match those arguments to fields
in the document body. So my filter function might look like this:

  [ Quoting from https://github.com/shimaore/ccnq3/blob/master/applications/host/couchapps/main.coffee#L46 ]

  ddoc.filters.local_rules = p_fun (doc,req) ->
    # Always replicate deletions
    if doc._deleted? and doc._deleted
      return true
    # Only replicate provisioning documents.
    if not doc.type?
      return false
    if doc.type is 'rule'
      return doc.sip_domain_name is req.query.sip_domain_name
    if doc._id.match /^_design/
      return false
    return true

(In this example all deletions get replicated out of a very large database, while the normal
record set is pretty small.)
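
To make the problem concrete, here is a rough plain-JavaScript rendering of the filter above, called by hand. This is only a sketch: the function name `localRules`, the sample IDs, the `_rev` value and the domain names are made up for illustration, and the `p_fun` wrapper from the original is left out.

```javascript
// Plain-JavaScript rendering of the local_rules filter quoted above.
function localRules(doc, req) {
  // Always replicate deletions.
  if (doc._deleted) return true;
  // Only replicate provisioning documents.
  if (doc.type == null) return false;
  if (doc.type === 'rule') {
    return doc.sip_domain_name === req.query.sip_domain_name;
  }
  if (/^_design/.test(doc._id)) return false;
  return true;
}

// Hypothetical replication request for one domain.
const req = { query: { sip_domain_name: 'a.example.net' } };

// A live rule that belongs to another domain (illustrative values).
const otherRule = {
  _id: 'rule:b.example.net:1',
  type: 'rule',
  sip_domain_name: 'b.example.net'
};

// The same document after DELETE: the body is gone, only _deleted remains.
const tombstone = { _id: 'rule:b.example.net:1', _rev: '2-abc', _deleted: true };

console.log(localRules(otherRule, req)); // false
console.log(localRules(tombstone, req)); // true
```

The live rule is filtered out, but its tombstone is let through, because the filter has nothing left to match `req.query.sip_domain_name` against.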


Trying to summarize:
- We could expand the filter API as I mentioned, but this would break on compaction.
- We could expand compaction to keep N revisions of a document instead of only the last one;
but then we're asking for changes to two APIs.
- Use PUT+`_deleted:true` and keep simple filters in place; deletions are not replicated if
the fields the filter checks are not present in the deleted document, so the DELETE command
should not be used in those cases.
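
A minimal sketch of the third option, assuming the client builds the deletion body itself and sends it with PUT. The helper `makeTombstone`, the filter name `localRulesStrict` and the sample values are hypothetical and not part of CouchDB or of the quoted code; the filter is the same as before minus the "always replicate deletions" shortcut.

```javascript
// Hypothetical helper: body for a PUT that deletes `doc` while keeping
// the fields the replication filter needs to look at.
function makeTombstone(doc, keep) {
  const body = { _id: doc._id, _rev: doc._rev, _deleted: true };
  for (const field of keep) {
    if (field in doc) body[field] = doc[field];
  }
  return body;
}

// Same rules as the original filter, applied to live documents and
// tombstones alike. A bare tombstone (as left by DELETE) has no `type`
// and is therefore not replicated.
function localRulesStrict(doc, req) {
  if (doc.type == null) return false;
  if (doc.type === 'rule') {
    return doc.sip_domain_name === req.query.sip_domain_name;
  }
  if (/^_design/.test(doc._id)) return false;
  return true;
}

// Illustrative document and request.
const rule = {
  _id: 'rule:a.example.net:1',
  _rev: '1-abc',
  type: 'rule',
  sip_domain_name: 'a.example.net',
  notes: 'not needed by the filter'
};
const kept = makeTombstone(rule, ['type', 'sip_domain_name']);

console.log(localRulesStrict(kept, { query: { sip_domain_name: 'a.example.net' } })); // true
console.log(localRulesStrict(kept, { query: { sip_domain_name: 'b.example.net' } })); // false
```

Only the fields listed in `keep` survive in the tombstone, so the deleted document stays small while the filter can still decide per domain.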

For now I'll document the workaround and why it is our current best option on the Wiki.
                
> Allow replication filters to meaningfully apply to deleted documents
> --------------------------------------------------------------------
>
>                 Key: COUCHDB-1893
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1893
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: JavaScript View Server
>            Reporter: Stéphane Alnet
>
> A document that is deleted using the DELETE command will be presented to a replication
> filter as an empty record with only a `_deleted:true` field. A replication filter can then
> only use the document ID to decide whether or not to propagate the deletion; in most cases
> this is not sufficient, and one may have to pass along deletion documents for IDs that would
> not have been replicated by the filter.
> This might lead to document IDs being leaked to the target database, which might be undesirable;
> more importantly if the goal of filtering was to build a smaller subset of the source database
> (for example to replicate a very large database to a device that has smaller storage space),
> those deletion documents might overfill the database (they never get compacted).
> I had somewhat documented this issue on the Wiki (http://wiki.apache.org/couchdb/Replication#Filtered_Replication)
> a while back but never got to add it to JIRA.
> Dave Cottlehuber on the PouchDB list suggested to use PUT with a `_deleted:true` field
> to work around the problem (the PUT body can then contain data sufficient to enable the filter
> to work). However we're still stuck in case DELETE was used instead.
> My suggestion is to expand the replication filter API to add an optional third argument
>     filter(doc,req,old_doc)
> where old_doc if present references the version of the document that will get deleted.
> It is then up to the filter to use the _deleted flag in `doc` and the values in `old_doc`.
> (It might be useful/meaningful/easier to add old_doc in all cases; at this point I'm
> only suggesting to add it in the case doc contains a _deleted field.)
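
For illustration only, a filter written against the `filter(doc,req,old_doc)` signature proposed in the issue description might look like the sketch below. The third argument does not exist in CouchDB; the function name and the fallback for a missing `old_doc` (for example after compaction) are assumptions made for this example.

```javascript
// Hypothetical three-argument filter; `old_doc` is the proposed (not
// existing) argument holding the version that is being deleted.
function localRulesWithOldDoc(doc, req, old_doc) {
  if (doc._deleted) {
    // Assumption: with no previous version available (e.g. after
    // compaction), fall back to replicating the deletion.
    if (!old_doc) return true;
    // Otherwise apply the normal rules to the version being deleted.
    return localRulesWithOldDoc(old_doc, req);
  }
  if (doc.type == null) return false;
  if (doc.type === 'rule') {
    return doc.sip_domain_name === req.query.sip_domain_name;
  }
  if (/^_design/.test(doc._id)) return false;
  return true;
}
```

With this shape, a deletion is replicated only if the version it removes would itself have passed the filter, unless that version is no longer available.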

