couchdb-user mailing list archives

From Stefan Klein <>
Subject Re: Using CouchDB for transferring user logs via the internet
Date Tue, 20 Jun 2017 21:12:47 GMT

Commenting on individual topics inline, though I only have experience with
CouchDB 1.6; CouchDB 2.0 may behave a bit differently.

2017-06-20 20:34 GMT+02:00 Vladimir Kuznetsov <>:

> Now, I know in couchdb documents are not being really deleted, just marked
> as 'deleted', so the database will grow permanently. I have an option either
> to use periodic _purge (which I heard may not be safe, especially in a
> clustered environment) or implement this as a monthly rotating database
> (which is more complex, and I don't really want to follow this route).

I think rotating databases are not that much more complex, but see below.

> My questions are:
> - Is this a valid use case for CouchDB? I want to use it primarily because
> of its good replication capabilities, especially in not reliable
> environments with some periods of being offline etc. Otherwise I'll have to
> write the whole set of data sync APIs with buffering, retries etc myself.

In our case, with mobile devices replicating to and from CouchDB, replication
has proven to be very reliable; it just works.
If I had to implement that on my own, the result would have been much worse.

> - Is this recommended practice to set up a chain of replication? Due to
> security considerations I want customer devices to replicate each to its
> own database in the cloud. Then I want those databases to replicate to the
> single central log database I'd subscribe for _changes. The reason is that
> it's easier for me to have a single source of _changes feed rather than
> multiple databases.

We do this: each user gets his own database, which we consider "outside"; we
monitor these databases for changes and take appropriate actions. :)
On our server, continuous replications from thousands of customer databases
to a central database occupied too many connections[1], and overall
performance degraded, even when only "some" users were actually active. We
now listen to the _db_updates feed, start a replication for the database in
question, and stop it again after a certain timeout; activity obviously
resets the timeout.
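The start/stop-on-timeout logic described above can be sketched roughly as
follows. This is only an illustration of the bookkeeping, with an assumed
idle timeout of 60 seconds; start_replication/stop_replication are
placeholders for the actual POSTs to CouchDB's _replicate endpoint.

```python
import time

IDLE_TIMEOUT = 60  # assumed: seconds without activity before stopping


class ReplicationManager:
    """Tracks per-database activity from the _db_updates feed and
    decides when replications should be started or stopped."""

    def __init__(self, now=time.monotonic):
        self.now = now
        self.last_activity = {}  # db name -> timestamp of last change

    def on_db_update(self, db):
        """Called for every event seen on the _db_updates feed."""
        if db not in self.last_activity:
            self.start_replication(db)
        self.last_activity[db] = self.now()  # activity resets the timeout

    def reap_idle(self):
        """Stop replications for databases idle longer than IDLE_TIMEOUT."""
        cutoff = self.now() - IDLE_TIMEOUT
        for db, seen in list(self.last_activity.items()):
            if seen < cutoff:
                self.stop_replication(db)
                del self.last_activity[db]

    # Placeholders; in reality these would call the _replicate endpoint.
    def start_replication(self, db):
        print("start replication for", db)

    def stop_replication(self, db):
        print("stop replication for", db)
```

Calling reap_idle() periodically (e.g. from the same loop that reads the
changes feed) keeps the number of concurrent replications bounded by the
number of recently active users rather than the total number of users.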

If you are considering the single central log database only so you have a
single changes feed (i.e. you don't need centralized views etc.), I would
skip the central database and just process _all_docs (or a view containing
only unprocessed log entries) of any database an "updated" event was
triggered on.
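A rough sketch of that per-database processing step: on each "updated"
event, fetch the rows of a view (or _all_docs with include_docs=true) and
handle only the entries not yet seen. Note the "processed" flag and the
fetch_rows callback are illustrative assumptions of this sketch, not part
of CouchDB's API.

```python
def process_db_update(db_name, fetch_rows, handle_entry):
    """fetch_rows(db_name) stands in for a GET against a view of the
    user's database; handle_entry receives each log document once.
    Returns the ids of the documents handled in this pass."""
    handled = []
    for row in fetch_rows(db_name):
        doc = row["doc"]
        if doc.get("processed"):  # skip entries handled in an earlier pass
            continue
        handle_entry(doc)
        doc["processed"] = True   # would be saved back to CouchDB via PUT
        handled.append(doc["_id"])
    return handled
```

With a view that only emits unprocessed entries, the in-code flag check
becomes redundant, but it is a cheap safety net against reprocessing.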

If you go that route, you are halfway to rotating databases already: your
backend no longer cares which database a change is triggered on.
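For the rotation itself, one hypothetical naming scheme (not from the
original discussion) is one database per calendar month; old months can
then be deleted wholesale instead of purging individual documents:

```python
from datetime import date


def rotated_db_name(prefix, day):
    """Build a per-month database name, e.g. logs_2017_06, for a
    hypothetical monthly rotation scheme."""
    return "%s_%04d_%02d" % (prefix, day.year, day.month)
```

Deleting a whole database (DELETE /logs_2017_06) reclaims its space
immediately, which sidesteps the permanently-growing-database problem
without touching _purge.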

> - Is using _purge safe in my case? From the official doc I read "In
> clustered or replicated environments it is very difficult to guarantee that
> a particular purged document has been removed from all replicas". I don't
> think this is a problem for me as I primarily care about database size so
> it shouldn't be critical if some documents fail to delete.

_purge is the wrong tool for this job.
From my understanding, it's there as a last resort to get sensitive data out
of a DB.


[1] I think the main reason for this was actually the operating system, but
it was faster, easier and more future-proof to implement the described
solution than to tune the OS to handle the connections, at least for me.
