couchdb-dev mailing list archives

From davisp <...@git.apache.org>
Subject [GitHub] couchdb-couch pull request: 2516 deduplicate attachements on compa...
Date Tue, 16 Dec 2014 21:53:46 GMT
Github user davisp commented on the pull request:

    https://github.com/apache/couchdb-couch/pull/24#issuecomment-67239293
  
    That looks pretty good but there's still an issue with the counting of active size. Unfortunately
I think the fix is going to require us to start using @strmpnk's new attachments module which
appears to introduce some new issues once we start using extended attributes.
    
    The issue is that when we update documents we don't know if the doc on disk is de-duplicated
or not. So if we go to edit a doc that has a de-duplicated attachment, we will suddenly start
double counting the bytes on disk, which breaks our active size information. A test case would
be something like this:
    
        1. Create doc A with attachment
        2. Create doc B with same attachment that will be de-duped
        3. Compact the database
        4. Check our active size
        5. Modify doc B (in a way that keeps the de-duped attachment without copying it)
        6. Check that active size isn't increased by more than the attachment size
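
    The steps above can be sketched against a toy in-memory model. This is Python, not
CouchDB code: ToyDb, Doc, counted_size, the deduped flag and growth_after_update are all
hypothetical names, and the accounting rule is a simplified assumption about how the double
count arises, not the real updater logic.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Doc:
    body: bytes
    att_digest: str
    att_len: int
    counted_size: int      # bytes this doc currently contributes to active size
    deduped: bool = False  # hypothetical flag, set by compaction


class ToyDb:
    """Toy model of active-size accounting; an assumption, not CouchDB's code."""

    def __init__(self, track_dedup):
        # When False, the writer ignores the deduped flag (the suspected bug).
        self.track_dedup = track_dedup
        self.docs = {}
        self.active_size = 0

    def create(self, doc_id, body, att):
        size = len(body) + len(att)
        digest = hashlib.sha256(att).hexdigest()
        self.docs[doc_id] = Doc(body, digest, len(att), size)
        self.active_size += size

    def compact(self):
        # Keep one copy of each distinct attachment and recount from scratch.
        seen = set()
        self.active_size = 0
        for doc in self.docs.values():
            doc.counted_size = len(doc.body)
            if doc.att_digest in seen:
                doc.deduped = True
            else:
                seen.add(doc.att_digest)
                doc.counted_size += doc.att_len
            self.active_size += doc.counted_size

    def update_body(self, doc_id, new_body):
        # Edit the doc while keeping its attachment (no bytes are copied).
        doc = self.docs[doc_id]
        new_size = len(new_body)
        if not (self.track_dedup and doc.deduped):
            # The writer assumes this doc owns its attachment bytes.
            new_size += doc.att_len
        self.active_size += new_size - doc.counted_size
        doc.body = new_body
        doc.counted_size = new_size


def growth_after_update(track_dedup):
    att = b"x" * 64 * 1024             # 64K so double counting is obvious
    db = ToyDb(track_dedup)
    db.create("A", b'{"n":1}', att)    # step 1
    db.create("B", b'{"n":1}', att)    # step 2
    db.compact()                       # step 3
    before = db.active_size            # step 4
    db.update_body("B", b'{"n":22}')   # step 5
    return db.active_size - before     # step 6: compare with the attachment size
```

    In this model, growth_after_update(False) grows by at least the attachment size, while
growth_after_update(True) grows only by the change in the doc body.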
    
    I'd suggest making sure that the attachment is something like 64K so that it's obvious
if we accidentally started double counting (modifying doc B should increase active size slightly
by definition).
    
    The fix here, I think, is to use the couch_att module's ability to store extended attributes
to record whether the attachment was de-duplicated. Theoretically, wrapping up some of the
compaction code will clean up that logic anyway, so that's a good thing.
    
    Unfortunately it looks like we haven't yet started accounting for attachments with extended
attributes in the compactor. Our current attachment terms are those unwieldy 8-tuples. Anything
with extended attributes will turn that into a 2-tuple with the first element being the 8-tuple,
and the second element being a proplist of key/value pairs for the attachment. Theoretically
we should be able to use this to store whether the attachment was de-duplicated at compaction
time. We'll have to make sure it works with compactions and upgrades, though, and it appears
that hasn't been completely vetted yet.
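
    To make the two term shapes concrete, here is a small sketch that modifies nothing in
couch_att. It models the terms as Python tuples, treats the eight fields as opaque, and
uses a hypothetical "deduped" key; the helper names are illustrative only.

```python
def split_att_term(term):
    """Return (base_8_tuple, props) for either term shape."""
    if isinstance(term, tuple) and len(term) == 8:
        # Legacy shape: a bare 8-tuple with no extended attributes.
        return term, []
    if (isinstance(term, tuple) and len(term) == 2
            and isinstance(term[0], tuple) and len(term[0]) == 8):
        # Extended shape: (8-tuple, list of key/value pairs).
        base, props = term
        return base, list(props)
    raise ValueError("unrecognized attachment term shape")


def mark_deduped(term):
    """Return the extended shape with the hypothetical flag set."""
    base, props = split_att_term(term)
    props = [(k, v) for k, v in props if k != "deduped"]
    props.append(("deduped", True))
    return (base, props)


def is_deduped(term):
    """Terms without the flag (including legacy 8-tuples) read as not de-duplicated."""
    _, props = split_att_term(term)
    return dict(props).get("deduped", False)
```

    Reading a missing flag as "not de-duplicated" is one possible upgrade-friendly default
for terms written before the flag existed; whether that default is safe for the size
accounting would still need the vetting mentioned above.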

