incubator-couchdb-user mailing list archives

From Fidel Viegas <>
Subject Re: CouchDB and Census software
Date Wed, 28 Oct 2009 08:01:43 GMT
Hi Leo,

First and foremost, thanks for your reply.

> Yes, its an excellent fit.
> Besides the use of HTTP and replication, the free-form document format
> mirrors the evolvability of excel spreadsheets pretty well. You should
> be able to supporting adding and removing of columns pretty well.

I was reading more about it and yes, I think it is an excellent fit.
Each survey is a self-contained document, which sort of fits the
CouchDB document model.

There is one thing, though, that I think takes up considerable
space. Every document we insert stores its own field names, which
makes the database grow faster than an RDBMS counterpart, for
instance. Nonetheless, I really like the way CouchDB works and I am
pretty excited to work with it on this project.
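
To get a rough feel for that overhead, here is a minimal Python sketch
that counts how many bytes of a document's JSON text are field names.
The survey fields are made up for illustration, and this measures the
JSON representation only, not CouchDB's actual on-disk size.

```python
import json


def field_name_overhead(doc):
    """Return (total_bytes, key_bytes) for the JSON text of a flat document."""
    total = len(json.dumps(doc).encode("utf-8"))
    # Each key is serialized as a quoted string, so the quotes are counted too.
    keys = sum(len(json.dumps(k).encode("utf-8")) for k in doc)
    return total, keys


# Hypothetical survey document; the field names are assumptions.
survey = {"district": "D01", "household_size": 5, "has_electricity": False}
total, keys = field_name_overhead(survey)
print(f"{keys} of {total} bytes are field names")
```

Shorter field names would reduce this share, at the cost of less
readable documents.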

> Depending on the size of these spreadsheets, you may want to make a
> separate CouchDB database per province or per district. Writing the
> map/reduce jobs will be a bit more tedious, but the data will be
> easier to manage and replicate around.

After reading a few more chapters of CouchDB: The Definitive Guide, I
came to the same conclusion. The provinces are independent of one
another, and so are the districts. All they need is to be able to
generate reports district-wise, province-wise and nationwide.
A district does not need to know about other districts, and the same
goes for the provinces. But the provinces will gather data from the
districts, and the central db from the provinces.

> For your first version I would install a single CouchDB on what will
> eventually be your main/central server; it can serve all your
> databases. Get the best possible connectivity to that server. Then,
> write a little (web-based?) UI on top that allows submitting a
> spreadsheet directly into this database. Then, add a function to
> export the data out again.
> Importing/exporting excel can be a bit tricky to get right in a
> webapp; the easiest and most robust approach actually is if you have
> users select all the data in the spreadsheet, copy it to the
> clipboard, and then paste it into a <textarea> on a web page. In
> particular that will help a lot with character encoding conversions if
> your user is on windows, the spreadsheet is in excel, and the browser
> is IE or firefox.

This is actually a good idea. I think I will start with a main
central db. Luckily for me, I will not need to feed the application
with spreadsheets. This is a project that was implemented in another
African country, and we will start from scratch. They normally use
Excel and are all excited about it, thinking that Excel is a panacea.
Usually, these are old guys who haven't really had contact with
Clipper or recent RDBMSs. They are usually skeptical about moving away
from Excel, and they aren't very technology-aware. The consultant I
am working with has given me these spreadsheets to analyse, but
we will build something from scratch, which is good, as I can design
it around the CouchDB data model.

> This first system could then go into production. It will give you with
> a smooth migration path for those people that do have enough internet
> access to connect to your central server using their web browser.

Now, this is where the problem comes in. They don't use anything right
now. We will implement something for them. For gathering data in the
townships and villages, which will then be fed into the district
database, I am thinking of using mobile units, perhaps PDAs or
netbooks. These will need an application to gather the data and then
feed it into the district database. The district database then
feeds the province db and the national db. I was saying that the
feeding would go from district to province to national db, but after
some thought, they will not gather any data at the province level. All
the data will be gathered district-wise, and this in turn will feed
the province for analysis. Unless, that is, we summarize the data at
the province and feed that to the national db.

This is the first time I have worked on a census-like application, so
I don't really know if that makes any sense. Should we summarize the
data to be fed to the national db, or should we feed the main data?
Perhaps feeding the main data will give the national analysis more
data to work on. Does this make any sense?
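
To make the trade-off concrete, here is a small Python sketch of
rolling raw surveys up into per-district and per-province summaries.
The field names and figures are invented for illustration; a real
setup would more likely do this with CouchDB views.

```python
from collections import defaultdict


def summarize(surveys, key):
    """Group survey documents by `key`, counting households and people."""
    totals = defaultdict(lambda: {"households": 0, "people": 0})
    for s in surveys:
        t = totals[s[key]]
        t["households"] += 1
        t["people"] += s["household_size"]
    return dict(totals)


# Hypothetical raw survey documents, one per household.
surveys = [
    {"province": "P1", "district": "D01", "household_size": 5},
    {"province": "P1", "district": "D02", "household_size": 3},
    {"province": "P2", "district": "D03", "household_size": 4},
]

by_district = summarize(surveys, "district")
by_province = summarize(surveys, "province")

# The national total can be rebuilt from the province summaries alone...
national_from_summaries = sum(t["people"] for t in by_province.values())
print(by_province, national_from_summaries)
# ...but the per-district breakdown is no longer recoverable from them.
```

If only the summaries travel upward, the national db can answer just
the questions the summaries were designed for; feeding the raw
documents keeps every later analysis possible, at the cost of more
data to transfer.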

> Obviously you can make prettier UIs than an excel import/export once
> you have the data in CouchDB and people will soon stop using excel :).

That's the idea: to make a prettier UI.

> You can then start to set up the "slave" installations in those places
> that have the bad connectivity. I would suggest using push-based
> replication from the slave to the master. You can use a cron job to
> trigger this replication. Set up this way, it is the most robust
> against bad connections.

Thanks for the tip. It makes sense.
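
For my own notes, a minimal sketch of what such a cron-triggered push
could look like in Python, using CouchDB's `_replicate` HTTP endpoint
with a `source` and a `target`. The host names and database names
below are placeholders, not our real setup, and authentication is
left out.

```python
import json
import urllib.request


def build_push_replication(local_couch, source_db, target_url):
    """Build a POST to the local CouchDB asking it to push source_db to target_url."""
    body = json.dumps({"source": source_db, "target": target_url}).encode("utf-8")
    return urllib.request.Request(
        f"{local_couch}/_replicate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def push(req, timeout=60):
    """Send the request; on a network failure, report it instead of crashing."""
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.load(resp)
    except OSError as exc:  # URLError and socket timeouts are OSError subclasses
        return {"ok": False, "error": str(exc)}


# Placeholder URLs and database names (assumptions for illustration):
# req = build_push_replication(
#     "http://localhost:5984",
#     "district_d01",
#     "http://central.example.org:5984/district_d01",
# )
# print(push(req))
```

Because a failed run only returns an error, the next cron run can
simply try again; replication picks up the changes that have not yet
been transferred.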

> Think about bi-directional replication too though. You could add
> pull-based replication from the master site back to the slave. if
> bandwidth allows it, the local sites could have a full copy of the
> data which may be nice when the network connection is down. Or, you
> might clean up the data on the central server and then push an update
> back to the slave site. Due to the nature of your data structures, its
> not likely that you ever really have to deal with document conflicts.

Like I said above, the districts don't need data from other districts.
They only need to feed the provincial database for analysis, and they
will do analysis based on their district. Maybe I am not getting the

> For locations where there is _really_ bad connectivity, e-mail
> sometimes still works ok after HTTP breaks down (at least that was the
> case in parts Africa a few years ago). If you find you have that
> problem, building a simple e-mail interface that allows submitting and
> retrieving CSV data can work well.

It is still the case. Connectivity is really bad across the country.
You can work, but we have been experiencing quite a few problems over
the last 4 months. The most reliable one is VSAT, which is very, very

> Hope this helps :)

Yes, it did help quite a lot. I now have a different picture from what
I had in mind before. I think I will start with your first suggestion
of creating a central db first, and then work from there.

Thanks a lot for your input.

