incubator-couchdb-dev mailing list archives

From "Robert Newson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (COUCHDB-1243) Compact and copy feature that resets changes
Date Tue, 09 Aug 2011 08:10:27 GMT

    [ https://issues.apache.org/jira/browse/COUCHDB-1243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081508#comment-13081508 ]

Robert Newson commented on COUCHDB-1243:
----------------------------------------

_purge is really for the "oops, I just put my admin password in a document" scenario. It's
not well tested, has known and unresolved bugs, and obviously ruins eventual consistency.
I'd rather see it removed than encouraged, but I think it's important for the narrow use case
I just mentioned.

We only remember the _revs for the last 1000 updates to a document, so there is a cap (albeit
a generous one) on how much is retained. When you say '6+ million changes', are these updates
to existing documents, or are you deleting documents and making new ones?
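
To make the cap concrete, here is a toy model of a bounded revision history. It is only
an illustration: the deque and the name record_updates are assumptions made for this
sketch, not how CouchDB stores revisions internally. The 1000 is the limit mentioned above.

```python
from collections import deque

REVS_LIMIT = 1000  # the per-document cap mentioned above


def record_updates(history, count, start=1):
    """Append `count` revision numbers; a full deque drops its oldest entry."""
    for n in range(start, start + count):
        history.append(n)
    return history


# Hypothetical document that has been updated 1500 times.
history = record_updates(deque(maxlen=REVS_LIMIT), 1500)
```

However many updates the document receives, the remembered history never grows past the
limit; only the oldest entries fall away.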

If the latter, then you could consider the temporal database idea, which is often suggested
when using CouchDB as a message queue: use a database per time interval (say, weekly). When
the database is empty (i.e., only has deleted documents), you can delete the db entirely.
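
A minimal sketch of the per-interval idea, assuming weekly buckets keyed by ISO week. The
helper names and the naming scheme ("prefix_year_wNN") are hypothetical; db_info stands
for the JSON returned by GET /{db}, which reports doc_count (live documents) and
doc_del_count (deleted ones).

```python
from datetime import date


def weekly_db_name(prefix, day):
    """Name of the database that holds documents created during `day`'s ISO week."""
    year, week, _ = day.isocalendar()
    return f"{prefix}_{year}_w{week:02d}"


def can_drop(db_info):
    """True when the database info reports no live documents at all."""
    # A missing doc_count is treated as "unknown", so the database is kept.
    return db_info.get("doc_count") == 0


name = weekly_db_name("queue", date(2011, 8, 9))
```

Once can_drop returns True for an old week, a DELETE on that database removes it along
with all of its deletion history. Note that design documents count towards doc_count, so
a database that still carries views will not look empty under this check.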

I'll finish by saying that CouchDB's retention of information about deleted documents and
old revisions is central to CouchDB. If it's working so strongly against you, then I don't
think it's the right database solution for your problem.



> Compact and copy feature that resets changes
> --------------------------------------------
>
>                 Key: COUCHDB-1243
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1243
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: Database Core
>    Affects Versions: 1.0.1, 1.1
>         Environment: Ubuntu, but not important
>            Reporter: Henrik Hofmeister
>              Labels: cleanup, compaction
>         Attachments: dump_load.php
>
>
> After running db and view compaction on a 70K doc db with 6+ million changes, it takes
> up 0.8 GB. If copying the same documents to a new db (get and bulk insert), the same data
> with 70K changes (only the inserts) takes up 40 MB. That is a huge difference. It has been
> verified on 2 dbs that the difference is more than 65 times the size of the data.
> A "Compact and copy" feature that copies only documents and resets the changes for a
> db would be very nice to try and limit the disk usage a little bit. (Our current test environment
> takes up nearly 100 GB...)
> I've attached the dump load php script for your convenience.
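
The "get and bulk insert" copy described above could look roughly like the sketch below.
It is not a reconstruction of the attached dump_load.php, whose contents are not shown
here. The function name bulk_payload is hypothetical; it assumes the input is the JSON
from GET /{db}/_all_docs?include_docs=true and builds a body for POST /{newdb}/_bulk_docs.
Attachments are not handled.

```python
def bulk_payload(all_docs_response):
    """Turn an _all_docs (include_docs=true) response into a _bulk_docs body."""
    docs = []
    for row in all_docs_response.get("rows", []):
        doc = row.get("doc")
        if doc is None or doc.get("_deleted"):
            continue  # nothing to copy for this row
        # Dropping _rev makes the target database start each document at revision 1.
        docs.append({k: v for k, v in doc.items() if k != "_rev"})
    return {"docs": docs}


# Hypothetical single-row response, shaped like _all_docs output.
response = {"rows": [{"id": "a", "key": "a", "doc": {"_id": "a", "_rev": "7-abc", "n": 1}}]}
payload = bulk_payload(response)
```

Because the copy carries no revision history and no deleted documents, the new database
ends up with one change per document, which matches the 70K changes reported above.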

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
