incubator-couchdb-user mailing list archives

From Leo Simons <>
Subject Re: CouchDB and Census software
Date Tue, 27 Oct 2009 22:36:34 GMT
Hey Fidel,

On Tue, Oct 27, 2009 at 5:04 PM, Fidel Viegas <> wrote:
> [snip explanation of use case of replacing lots of spreadsheets with CouchDB and replication]
> 1) Power isn't stable across the whole country. Even in the capitol we
> experience power cuts.
> 2) Communications are really bad. We have mobile communications, but
> sometimes they don't work. Even the normal Cable and DSL ones don't
> work properly. The most stable ones are VSAT Internet, which is very
> expensive, but will have to be used in some sites.
> The replication is uni-directional. That is, from district to province
> and from province to nationwide db.
> Would you suggest using CouchDB for a system like this?

Yes, it's an excellent fit.

Besides the use of HTTP and replication, the free-form document format
mirrors the evolvability of Excel spreadsheets pretty well. You should
be able to support adding and removing columns without much trouble.

> And if yes, how would you tackle it? What would you suggest?

Depending on the size of these spreadsheets, you may want to make a
separate CouchDB database per province or per district. Writing the
map/reduce jobs will be a bit more tedious, but the data will be
easier to manage and replicate around.
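If you go the one-database-per-district route, it helps to derive the
database names mechanically. Here is a rough sketch of how that could
look. The function name, the "census" prefix and the example place
names are my own inventions; the only hard requirement is CouchDB's
rule that a database name starts with a lowercase letter and otherwise
uses only a-z, 0-9 and a few punctuation characters.

```python
import re
import unicodedata

# CouchDB database names: lowercase letter first, then a-z, 0-9, _ $ ( ) + - /
VALID_DB_NAME = re.compile(r"^[a-z][a-z0-9_$()+/-]*$")


def district_db_name(province, district, prefix="census"):
    """Build a database name such as census_<province>_<district>.

    Hypothetical naming scheme, for illustration only.
    """
    parts = []
    for name in (prefix, province, district):
        # Strip accents so that e.g. "São" becomes "Sao".
        ascii_name = (
            unicodedata.normalize("NFKD", name)
            .encode("ascii", "ignore")
            .decode("ascii")
        )
        # Collapse anything that is not a letter or digit into "_".
        slug = re.sub(r"[^a-z0-9]+", "_", ascii_name.lower()).strip("_")
        if not slug:
            raise ValueError(f"cannot derive a name part from {name!r}")
        parts.append(slug)
    db_name = "_".join(parts)
    if not VALID_DB_NAME.match(db_name):
        raise ValueError(f"{db_name!r} is not a valid CouchDB database name")
    return db_name
```

With this, district_db_name("Northern Province", "São Pedro") yields a
name that both the district server and the central server can use, so
the replication target is always predictable.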

For your first version I would install a single CouchDB on what will
eventually be your main/central server; it can serve all your
databases. Get the best possible connectivity to that server. Then,
write a little (web-based?) UI on top that allows submitting a
spreadsheet directly into this database. Then, add a function to
export the data out again.

Importing/exporting Excel can be a bit tricky to get right in a
webapp; the easiest and most robust approach is actually to have
users select all the data in the spreadsheet, copy it to the
clipboard, and then paste it into a <textarea> on a web page. In
particular, that will help a lot with character encoding conversions if
your user is on Windows, the spreadsheet is in Excel, and the browser
is IE or Firefox.
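To make the paste-into-a-textarea idea concrete: cells copied from a
spreadsheet normally arrive as tab-separated text, one row per line.
Below is a minimal sketch of the server side, assuming the first pasted
row holds the column headers. The function name and the "district"
field are hypothetical; adapt them to whatever your documents look like.

```python
import csv
import io


def pasted_rows_to_docs(pasted_text, district):
    """Turn tab-separated text pasted from a spreadsheet into a list of
    dicts, one per row, ready to be stored as CouchDB documents."""
    reader = csv.DictReader(io.StringIO(pasted_text), delimiter="\t")
    docs = []
    for row in reader:
        # Keep only named columns that actually have a value, so each
        # document carries just the fields that row uses.
        doc = {key: value for key, value in row.items() if key and value}
        if not doc:
            continue  # skip rows that were entirely empty
        # Hypothetical extra field recording where the row came from.
        doc["district"] = district
        docs.append(doc)
    return docs
```

Because empty cells are simply left out, a spreadsheet that gains or
loses a column later on still imports without any schema change.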

This first system could then go into production. It will give you
a smooth migration path for those people who do have enough internet
access to connect to your central server using their web browser.

Obviously you can make prettier UIs than an Excel import/export once
you have the data in CouchDB, and people will soon stop using Excel :).

You can then start to set up the "slave" installations in those places
that have the bad connectivity. I would suggest using push-based
replication from the slave to the master. You can use a cron job to
trigger this replication. Set up this way, it is the most robust
against bad connections.
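The cron-triggered push can be a very small script. CouchDB starts a
replication when you POST a JSON body with "source" and "target" to
/_replicate on the local server. The sketch below assumes a local
CouchDB on port 5984; the host names, database name and function names
are placeholders of mine, not anything from your setup.

```python
import json
import urllib.request


def build_push_request(local_couch, db_name, central_couch):
    """Build the POST that asks the local CouchDB to push db_name to the
    database of the same name on the central server."""
    body = {
        "source": db_name,
        "target": f"{central_couch.rstrip('/')}/{db_name}",
    }
    return urllib.request.Request(
        url=f"{local_couch.rstrip('/')}/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def push_once(request, timeout=300):
    """Send the request. Returns True if CouchDB reports success and
    False if the connection failed or the reply was not usable."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return bool(json.load(response).get("ok", False))
    except (OSError, ValueError):
        # Network errors and timeouts are OSError subclasses; a garbled
        # reply raises ValueError. Either way, cron will try again later.
        return False
```

A failed run costs nothing: replication picks up where it left off, so
the next cron invocation only sends what is still missing. A crontab
entry along the lines of "*/30 * * * * python3 push_census.py" (script
name made up) would retry every half hour.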

Think about bi-directional replication too, though. You could add
pull-based replication from the master site back to the slave. If
bandwidth allows it, the local sites could have a full copy of the
data, which may be nice when the network connection is down. Or, you
might clean up the data on the central server and then push an update
back to the slave site. Due to the nature of your data structures, it's
not likely that you will ever really have to deal with document conflicts.

For locations where there is _really_ bad connectivity, e-mail
sometimes still works ok after HTTP breaks down (at least that was the
case in parts of Africa a few years ago). If you find you have that
problem, building a simple e-mail interface that allows submitting and
retrieving CSV data can work well.

Hope this helps :)


