couchdb-user mailing list archives

From CGS <>
Subject Re: Database size seems off even after compaction runs.
Date Sat, 24 Dec 2011 01:39:17 GMT
1. Think of the service as a quantized stream of data, not a
continuous one. Switching from one db to another just means diverting
the flow from one db to the other between two data transmission
sequences.
The actual implementation depends on your project. I don't know about
your project, but for the sake of argument, let's consider two
databases: B (at the back end) and F (at the front end). Let's also say
F sits behind another HTTP server (maybe it's just me, but I don't
rely on CouchDB alone to respond to all HTTP requests). First, let's
reclaim the space from B. I create a database BT and start
transferring all the available documents to it (a delete event for a
document just makes it unavailable). Once that finishes, I "cut the
pipe" between B and F (stopping replication, or whatever mechanism you
use to connect B with F) and "redirect the pipe" toward BT (starting
replication, or whatever other mechanism you use; for replication I
would add a filter, but that's another story). You can also do it in
the reverse order (redirecting first, then cutting). Once the data
flow is redirected, delete B and re-create it. That deletes the file
from the hard disk and creates a new one. Second, to reclaim F, follow
the same procedure, except that the switch is handled by the HTTP
server (the redirection can be done even with a simple JavaScript
command; all one needs to do is switch the old page to a temporary new
one). If programmed correctly, the user wouldn't notice anything
except a slight delay in loading the page (the redirection).
Maybe I have worked too much with YAWS and Erlang, but I usually
create a simple application that checks the correctness of the data
before injecting it into the database. The delay is negligible (I use
bulk operations, whose peak throughput is higher than the volume of
documents YAWS can send), and the switch can be done with a simple
command sent to the TCP server within the Erlang application. That
covers the back-end database. For the front end, the redirection is
just a matter of replacing the web page (no service interruption for
YAWS; it is a bit more complex if a file cache is used). That would be
my design for this particular example.
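
The back-end steps above can be sketched roughly as follows. This is
only an illustration in Python (not the Erlang application mentioned
above), under these assumptions: the documents are read from B with
GET /B/_all_docs?include_docs=true and written to BT with
POST /BT/_bulk_docs; the helper build_bulk_payload and the class
ActiveDb are hypothetical names; documents with attachments are not
handled; and the HTTP calls themselves are left out.

```python
import threading


def build_bulk_payload(all_docs_response):
    """Build a body for POST /BT/_bulk_docs from the JSON body of
    GET /B/_all_docs?include_docs=true (hypothetical helper)."""
    docs = []
    for row in all_docs_response.get("rows", []):
        doc = row.get("doc")
        # Skip rows that carry no document body or are flagged deleted.
        if not doc or doc.get("_deleted"):
            continue
        # Drop _rev so BT stores each document as a new first revision.
        docs.append({k: v for k, v in doc.items() if k != "_rev"})
    return {"docs": docs}


class ActiveDb:
    """Hypothetical holder for the database name that writers target.

    Writers call get() before each bulk write, so switch() takes
    effect between two transmission sequences.
    """

    def __init__(self, name):
        self._name = name
        self._lock = threading.Lock()

    def get(self):
        with self._lock:
            return self._name

    def switch(self, new_name):
        # Swap the target and return the old name so the caller can
        # delete and re-create that database afterwards.
        with self._lock:
            old, self._name = self._name, new_name
        return old
```

A caller would post build_bulk_payload(...) to BT, call
switch("BT"), and only then delete and re-create the database whose
name switch() returned.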

2. Would it? If you transfer only the available documents from B to BT
or from F to FT (from the example above), BT/FT use only the space of
the documents you want to keep (the transfer is done not through
CouchDB replication but with a bit of manual work, or maybe with
filtered replication, but I am not sure about that). Once B/F is
deleted, the file containing the database is removed from the hard
disk (the physical space the file occupied is freed, meaning the OS
can reuse it), so no history is kept when the database is created
again. That definitely reclaims the space.
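
For the filtered-replication variant mentioned above, a minimal sketch
of the two JSON bodies involved might look like this. The design
document name "keep" and the filter name "live" are made up for the
illustration; the filter itself is a JavaScript function stored as a
string, which is how CouchDB keeps filter functions in a design
document. Note that replication carries revision identifiers over to
the target, so the result is not necessarily identical to the manual
copy described above.

```python
import json


def filter_design_doc():
    """Design document to store in the source database (B).

    The names "keep" and "live" are hypothetical.
    """
    return {
        "_id": "_design/keep",
        "filters": {
            # Let a document through only if it is not a deletion.
            "live": "function(doc, req) { return !doc._deleted; }"
        },
    }


def replication_request(source, target):
    """Body for POST /_replicate using the filter defined above."""
    return {"source": source, "target": target, "filter": "keep/live"}


if __name__ == "__main__":
    print(json.dumps(filter_design_doc(), indent=2))
    print(json.dumps(replication_request("B", "BT"), indent=2))
```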

Of course, even for this example, such a design has its limitations.
But it can be a starting point for designing your project. If you want
something simpler, then maybe you should ask the developers to add a
"no history" option to CouchDB (it wouldn't be a bad idea, and I am
not being ironic here).

But, as I mentioned before, the design depends entirely on your
project, and there is no general solution.

I hope this opinion will help you in your project.


On 12/24/2011 01:09 AM, Mark Hahn wrote:
>>   That means, you move the data from one to the other, filtering out the
> deleted documents, and when it's over, you switch to the newly constructed
> database, while the other gets emptied (deleted and re-created).
> 1) How exactly could you make this switch without interrupting service?
> 2) Wouldn't this procedure create the exact same eventual consistency
> problems that deleting documents in a db would?
