couchdb-dev mailing list archives

From Brian Candler <B.Cand...@pobox.com>
Subject Re: Fail on a simple case on replication
Date Tue, 24 Feb 2009 12:39:56 GMT
On Tue, Feb 24, 2009 at 09:06:09AM +0100, Patrick Antivackis wrote:
> Oh and by the way, in a use case where there is only one database and you
> don't use compaction because you want to keep everything, well _rev is a
> revision that can be used to see the history of the document.

This is a good point. If you follow "accountants don't use erasers" then you
will never compact (and maybe you want a flag which prevents compaction).

However, you must then be prepared for your database to be a single file
which grows without bounds. If CouchDB wants to support this model, it would
be helpful if the data were stored in chunks which can be backed up
separately.

"Compaction" for saving space could be achieved by rewriting the database,
but keeping diffs for earlier revisions. At this point you would end up with
something roughly like git.
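
A rough sketch of what "keep diffs for earlier revisions" could look like,
assuming flat JSON documents and a store that keeps only the newest revision
whole. The names reverseDelta, applyDelta, compact and revisionsBack are
made up for illustration; this is not CouchDB's or git's actual storage
format.

```javascript
// Hypothetical sketch: newest revision stored whole, older ones as reverse
// deltas. Only handles flat objects with JSON-compatible values.

// Records what must change in `newer` to get back to `older`.
function reverseDelta(newer, older) {
  const set = {};
  const remove = [];
  for (const key of Object.keys(older)) {
    if (JSON.stringify(older[key]) !== JSON.stringify(newer[key])) {
      set[key] = older[key];
    }
  }
  for (const key of Object.keys(newer)) {
    if (!(key in older)) remove.push(key);
  }
  return { set, remove };
}

// Applies one reverse delta, returning a new object.
function applyDelta(doc, delta) {
  const result = Object.assign({}, doc);
  for (const key of delta.remove) delete result[key];
  Object.assign(result, delta.set);
  return result;
}

// `revisions` is ordered oldest to newest.
function compact(revisions) {
  const head = revisions[revisions.length - 1];
  const deltas = [];
  for (let i = revisions.length - 1; i > 0; i--) {
    deltas.push(reverseDelta(revisions[i], revisions[i - 1]));
  }
  return { head, deltas };
}

// Rebuilds the revision `n` steps before the head (0 is the head itself).
function revisionsBack(store, n) {
  let doc = store.head;
  for (let i = 0; i < n; i++) doc = applyDelta(doc, store.deltas[i]);
  return doc;
}

const history = [{ a: 1 }, { a: 1, b: 2 }, { a: 3, b: 2 }];
const compacted = compact(history);
console.log(revisionsBack(compacted, 2));
```

Reading the head stays cheap, while older revisions cost one delta
application per step back. That seems like the right trade-off if history is
consulted rarely.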

On a random tangent: has anyone considered a CouchDB-like system where
documents are raw blobs, rather than JSON? ISTM that:

- it would save a lot of conversion between Erlang terms and JSON
- it would remove the second-class nature of attachments
- it would allow structured data to be stored in arbitrary formats (e.g. XML)
- it would allow map/reduce to work on binary data (e.g. use a map function
  to make thumbnails of all your jpegs)
- you could still use JSON quite happily, e.g.

  function map(type, data) {
    if (type == "application/json") {
      var doc = evalcx(data);
      // ... continue as normal
    }
  }
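
To make the idea a bit more concrete, here is a minimal sketch of a view
built over raw blobs, assuming Node.js (for Buffer) and an in-memory list of
documents. The document shape, buildView, and passing emit as a third
argument are all assumptions for the example; it uses JSON.parse in place of
evalcx, and byte counting stands in for real binary work such as
thumbnailing.

```javascript
// Hypothetical blob store: each document is a content type plus raw bytes.
const blobDocs = [
  { id: "a", type: "application/json",
    data: Buffer.from('{"kind":"post","title":"Hello"}') },
  { id: "b", type: "image/jpeg", data: Buffer.from([0xff, 0xd8, 0xff, 0xe0]) },
  { id: "c", type: "application/xml", data: Buffer.from("<post/>") },
];

// Map function in the proposed (type, data) style. `emit` is passed in
// explicitly here; in CouchDB it is supplied by the view server.
function blobMap(type, data, emit) {
  if (type === "application/json") {
    const doc = JSON.parse(data.toString("utf8"));
    emit(doc.kind, doc.title);
  } else if (type === "image/jpeg") {
    // Stand-in for real binary processing.
    emit("jpeg-bytes", data.length);
  }
  // Other content types are simply ignored by this view.
}

function buildView(documents, mapFn) {
  const rows = [];
  for (const d of documents) {
    mapFn(d.type, d.data, (key, value) => rows.push({ id: d.id, key, value }));
  }
  // Views are ordered by key; plain string comparison is a simplification.
  rows.sort((x, y) => (x.key < y.key ? -1 : x.key > y.key ? 1 : 0));
  return rows;
}

const viewRows = buildView(blobDocs, blobMap);
console.log(viewRows);
```

The XML document produces no rows, which shows that each view opts in to
the content types it understands.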

I guess some of the APIs would become a bit more awkward though. For
example, bulk document insert would probably become MIME multipart.
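
As an illustration of that awkwardness, a bulk insert body might be
assembled roughly like this. This is only a sketch, assuming Node.js and a
multipart/mixed layout; buildMultipart and the X-Doc-Id header are invented
for the example and are not part of any CouchDB API.

```javascript
// Hypothetical sketch: pack several blobs into one multipart/mixed body.
function buildMultipart(parts, boundary) {
  const delimiter = Buffer.from(`--${boundary}`);
  const chunks = [];
  for (const part of parts) {
    // The delimiter must not occur inside any part's data.
    if (part.data.includes(delimiter)) {
      throw new Error(`boundary collides with data of part ${part.id}`);
    }
    chunks.push(Buffer.from(
      `--${boundary}\r\n` +
      `Content-Type: ${part.type}\r\n` +
      `X-Doc-Id: ${part.id}\r\n` + // made-up header carrying the document id
      `\r\n`
    ));
    chunks.push(part.data);
    chunks.push(Buffer.from("\r\n"));
  }
  // Closing delimiter ends the multipart body.
  chunks.push(Buffer.from(`--${boundary}--\r\n`));
  return Buffer.concat(chunks);
}

const bulkBody = buildMultipart(
  [
    { id: "a", type: "application/json", data: Buffer.from('{"x":1}') },
    { id: "b", type: "text/plain", data: Buffer.from("hi") },
  ],
  "sep"
);
console.log(bulkBody.toString("latin1"));
```

Compared with a single JSON array, the client has to pick a boundary that
does not collide with any payload, and the server has to parse headers per
part.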

In principle, I think you could get today's CouchDB as a thin layer on top
of this. However, "attachments" do have interesting special semantics (e.g.
deleting a document deletes all its attachments) which might need some
parent/child relationship between documents to maintain. Having that
relationship between documents in a more general form could also be useful.
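
One way to picture that parent/child relationship is the in-memory sketch
below, where deleting a document also deletes everything beneath it. The
BlobStore class and its methods are hypothetical and only meant to show the
cascading-delete semantics.

```javascript
// Hypothetical sketch: documents may name a parent; deletes cascade down.
class BlobStore {
  constructor() {
    this.docs = new Map(); // id -> { type, data, parent }
  }

  put(id, type, data, parent = null) {
    if (parent !== null && !this.docs.has(parent)) {
      throw new Error(`unknown parent: ${parent}`);
    }
    this.docs.set(id, { type, data, parent });
  }

  // Ids of the documents whose parent is `id`.
  children(id) {
    return [...this.docs]
      .filter(([, doc]) => doc.parent === id)
      .map(([childId]) => childId);
  }

  // Removes the document first, then recurses, so a cycle cannot loop forever.
  delete(id) {
    const childIds = this.children(id);
    this.docs.delete(id);
    for (const childId of childIds) this.delete(childId);
  }
}

const blobStore = new BlobStore();
blobStore.put("report", "application/json", '{"title":"Q1"}');
blobStore.put("chart", "image/png", "<binary>", "report");
blobStore.put("chart-thumb", "image/png", "<binary>", "chart");
blobStore.put("unrelated", "text/plain", "hello");
blobStore.delete("report");
console.log([...blobStore.docs.keys()]);
```

Today's attachments would then be the special case of a tree that is only
one level deep.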

Just thinking out loud.

Regards,

Brian.
