couchdb-user mailing list archives

From Paul Davis <paul.joseph.da...@gmail.com>
Subject Re: The Blog
Date Tue, 10 Feb 2009 06:03:54 GMT
You're doing a great job distilling the greater discussion directly
into a good overview on using CouchDB. Keep up the good work.

On Mon, Feb 9, 2009 at 11:50 PM, Mister Donut <lady.donut@gmail.com> wrote:
> Wouks, too many replies!
>
> I learned a lot by just reading. I will just reply to a few comments.
>
>> To be honest, I think saying RDBMS and CouchDB are for different
>> solutions is just you guys being nice. I think that any application
>> would benefit from using the CouchDB model and only in very specific,
>> very demanding cases an RDBMS would be better. I can't think of any
>> examples though.
>
> Yes, see, this is what I started to believe as well. But this thread
> showed me that this idea is wrong. RDBMS are there and were written
> for a reason. CouchDB solves a different problem. It's not just a new
> storage layer that can be plugged into any existing (RDBMS) database
> abstraction layer and then "it just works".
>
> http://couchdb.apache.org/docs/intro.html
>

My position is that most people using an RDBMS don't actually need
one. My general divining rod is, "If you're using a 'web framework'
and an 'Object Relational Mapper', then chances are you're Doing It
Wrong™." But your general outline is correct.

> "An object-oriented database. Or more specifically, meant to function
> as a seamless persistence layer for an OO programming language."
>

I just wanted to point out that this quote is pulled from the very
prominent "What CouchDB is Not" section. The context here made that a
bit hard to follow.

>> I'll help you: I think it would be easier to create a wiki with
>> CouchDB than with an RDBMS. It is possible in both but CouchDB just
>> makes it easier. I suppose we'd have to ask the http://couch.it guys
>> to know if that's true.
>
> Well. How does CouchDB make it easier? I think I'd be easier on some
> parts, and harder on other parts. As I said, I don't think (anymore)
> that CouchDB is supposed to replace a RDBMS, but instead solve a
> different problem.
>
> As soon as you need to scale horizontally, replication comes into
> play, think Wikipedia. Because of the eventual consistency, you might
> have many different versions of pages "live". Just think what happens
> when users start to edit and save old versions. This is a very
> interesting read
>
> http://www.facebook.com/note.php?note_id=23844338919

That article is definitely a good read for anyone thinking about
issues in replication. But make sure you understand the differences
between CouchDB style replication and MySQL style replication. MySQL
replication is (AFAIK, I've only read about it) a log replay style
system. Interruptions in the log replay are very disruptive. OTOH,
CouchDB style replication is incremental and doesn't require a
constant un-interruptible connection.
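
To make "incremental" concrete, here is a minimal sketch in Python. It
is a toy in-memory model, not CouchDB's actual replicator: the Database
class, the replicate() function and the max_changes parameter are all
hypothetical names made up for illustration. The only idea it tries to
show is that the target remembers how far it got (a checkpoint), so an
interrupted run can simply be picked up again later.

```python
# Toy model of checkpoint-based incremental replication.
# All names here are hypothetical; this is not CouchDB's implementation.
class Database:
    def __init__(self):
        self.docs = {}      # doc id -> document body
        self.changes = []   # append-only list of (seq, doc_id)
        self.seq = 0

    def put(self, doc_id, body):
        self.seq += 1
        self.docs[doc_id] = body
        self.changes.append((self.seq, doc_id))


def replicate(source, target, checkpoint, max_changes=None):
    """Copy changes made after `checkpoint`; return the new checkpoint."""
    pending = [(seq, doc_id) for seq, doc_id in source.changes
               if seq > checkpoint]
    if max_changes is not None:
        pending = pending[:max_changes]   # simulate a dropped connection
    for seq, doc_id in pending:
        target.docs[doc_id] = source.docs[doc_id]
        checkpoint = seq
    return checkpoint


source, target = Database(), Database()
for i in range(5):
    source.put(f"doc{i}", {"n": i})

checkpoint = replicate(source, target, 0, max_changes=2)  # interrupted early
checkpoint = replicate(source, target, checkpoint)        # resumed later
```

After the second call the target holds the same documents as the
source, even though the first run stopped partway through.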

>
> About cache invalidation. I just don't think that, as soon as you are
> forced to use replication, which is the whole point of CouchDB?
> Clouds? Scale horizontally? you can actually build a typical web
> application (wiki, forum, blog) that doesn't give the user a
> consistent experience.
>

If I understand correctly, you're saying that replication doesn't
allow building a webapp that provides a consistent experience.
Assuming that's what you meant, I don't think that's quite right. Just
as in the Facebook article, there are strategies you can use for sticky
sessions etc to make sure that users are reading from the server that
accepted their write. Any time you end up with noticeable propagation
delays you'll run into interesting problem spaces like this. And by no
means is replication 'the whole point' of CouchDB.
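
As a rough illustration of the sticky session idea, here is a small
Python sketch. The Router class, its method names and the 30 second
window are assumptions made up for this example; they are not part of
CouchDB or of the Facebook setup. The point is only that a user's reads
are pinned to the server that accepted their last write until
replication has had time to catch up.

```python
# Hypothetical request router; names and the 30 s window are assumptions.
class Router:
    def __init__(self, servers, sticky_seconds=30.0):
        self.servers = servers
        self.sticky_seconds = sticky_seconds
        self.last_write = {}   # user -> (server, time of last write)
        self._next = 0

    def _round_robin(self):
        server = self.servers[self._next % len(self.servers)]
        self._next += 1
        return server

    def route_write(self, user, now):
        server = self._round_robin()
        self.last_write[user] = (server, now)   # remember who took the write
        return server

    def route_read(self, user, now):
        pinned = self.last_write.get(user)
        if pinned and now - pinned[1] < self.sticky_seconds:
            return pinned[0]                    # read your own write
        return self._round_robin()              # any replica will do


router = Router(["couch-a", "couch-b", "couch-c"])
wrote_to = router.route_write("alice", now=0.0)
read_soon = router.route_read("alice", now=5.0)
read_later = router.route_read("alice", now=120.0)
```

Within the window alice always reads from the server she wrote to.
After it expires, her reads go back into the normal rotation.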

> Now, if you build something like Antony Blakey (#9 in this thread),
> that seems like a really great idea on how to use CouchDB.
>

Having lots of desktop CouchDB installs is definitely an exciting mode
of operation.

>> I know it would be fairly simple to have an "accounts" array field on a JSON
>> user-account document - that way no single "enities" account could be
>> changed by more than one write at the same time... seems rediculously simple
>> - but is there a case where this could fail?
>
> Well, isn't the standard example:
>
> Person A only has $500.
>
> 1 Check A's account: $500
> 2 Set A's account: $0
> 3 Check B's account: $1000
> 4 Set B's account: $1500
>
> Now, if at any time between 1 and 2 or 3 and 4 you modify A's or B's
> account, you have lost.
>
> This is where it could fail? Assuming the four actions are not sent in
> the same request.
>
> Again. This is RDBMS thinking. In CouchDB, the balance should probably
> be a view. But there is still no way to enforce that you have enough
> money on your account before you can withdraw. Is there?
>
> Check Balance
> <- Other Instance Withdraws
> Withdraw
>
>> Things that CouchDB is better at:
>> The interweb.
>> Things that an RDBMS is better at:
>> Huge amounts of business logic. As in the Oracle install running your
>> favorite hospital. Think along the lines of 10's and 100's of
>> thousands of lines of app logic in the DB itself.
>
> You know, I am trying really hard, but these comments just contribute
> absolutely nothing to the discussion.
>

I apologize that my humor overshadowed the message. Distilling a bit I
would rephrase it as, "If you require consistency above
availability and partition tolerance, CouchDB may not be the right
hammer for your nail."

More thoroughly, CouchDB is good in terms of the model of the web
itself. The World Wide Web is not consistent. Documents are not always
valid. Hyperlinks can and do break regularly. There are a huge number
of errors in the system. And yet it chugs on merrily.

On the opposite end of the spectrum, we have extremely large RDBMS
installs on huge iron. IIRC, I read an article that the 37signals crew
just bought a 32 GiB machine to scale up Basecamp. Single machine
running a highly available and highly consistent RDBMS. Now, if you
tried splitting that single database over multiple physical nodes
(which is what the article I read was arguing against) the whole
system would require many man hours of systems engineering or a huge
rewrite of the base application logic.

Another thought that just occurred to me. Another way of describing
the difference is that in CouchDB the data is important. In an RDBMS,
it's the relations that are important (or the focus at least).

>> You can do that with Map/Reduce.
>> Create a view that gets all the comments and get them with limit=0,
>> there's your counter.
>
> No, you cannot. *Variable* criteria. A Map/Reduce is a fixed criteria.
> Also, a counter in the most abstract meaning. The only way to count
> something in CouchDB is to add every item to the database and then use
> a view. There is no +=. And there is no way to aggregate the count
> into a single key.

Yes, you can. Just not in the way you're used to thinking. A
Map/Reduce view is a fixed *mapping* from documents to a sorted
key/value space. The important part to note is the word 'sorted'. It's
precisely this sorting of your emitted key that allows you to select
specific records based on variable criteria. In CouchDB queries, you
spend time thinking in advance about how to get things to sort into an
order from which you can select a contiguous slice.
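
Here is a small Python sketch of that idea. It is a stand-in for a
view, not CouchDB code: build_view(), query() and the document fields
(type, post, date, text) are hypothetical. The map function is fixed,
but because the rows are sorted by the emitted key, a startkey/endkey
pair chosen at query time picks out a contiguous slice.

```python
import bisect

# Stand-in for a view: apply a fixed map function, then sort by key.
def build_view(docs, map_fn):
    rows = []
    for doc in docs:
        rows.extend(map_fn(doc))
    rows.sort(key=lambda row: row[0])
    return rows

# Select the contiguous slice of rows with startkey <= key <= endkey.
def query(view, startkey, endkey):
    keys = [key for key, _ in view]
    lo = bisect.bisect_left(keys, startkey)
    hi = bisect.bisect_right(keys, endkey)
    return view[lo:hi]

# Fixed mapping: emit [post, date] as the key for every comment.
def comments_by_post_and_date(doc):
    if doc.get("type") == "comment":
        yield ([doc["post"], doc["date"]], doc["text"])

docs = [
    {"type": "comment", "post": "post-1", "date": "2009-02-01", "text": "first"},
    {"type": "comment", "post": "post-2", "date": "2009-02-03", "text": "other"},
    {"type": "comment", "post": "post-1", "date": "2009-02-08", "text": "second"},
    {"type": "comment", "post": "post-1", "date": "2009-02-10", "text": "late"},
    {"type": "post", "title": "not a comment"},
]

view = build_view(docs, comments_by_post_and_date)

# Variable criteria at query time: comments on post-1 up to Feb 9.
rows = query(view, ["post-1", "2009-02-01"], ["post-1", "2009-02-09"])
texts = [text for _, text in rows]   # ["first", "second"]
```

Changing the startkey and endkey gives a different slice without
touching the map function.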

Also, you may count things. Patrick's original example gave you a
quick way to count a type of document. The reduce implementation
*defaults* to producing a single output value. If you emitted the
values [-1, 10, 3, -5] for any set of keys, and your reduce function
was simply "return sum(values)" you would get an output value of 7.
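
The same arithmetic as a Python sketch. The reduce_sum() name is made
up for this example, and the (keys, values, rereduce) argument order is
only meant to mirror the shape of a CouchDB reduce function. The second
half assumes the values were reduced in two chunks first, to show that
summing the partial results gives the same answer.

```python
# Sketch of a summing reduce; the function name is hypothetical.
def reduce_sum(keys, values, rereduce=False):
    # Summing works the same whether `values` are emitted values
    # or partial sums from an earlier reduce pass.
    return sum(values)

values = [-1, 10, 3, -5]
total = reduce_sum(None, values)                      # 7

# Reduce two chunks separately, then combine the partial results.
partials = [reduce_sum(None, values[:2]),             # 9
            reduce_sum(None, values[2:])]             # -2
combined = reduce_sum(None, partials, rereduce=True)  # 7
```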

> Patrick, if you aren't trying, then don't. There are enough people who
> actually try.
>

Patrick was trying to help and was correct. Being rude after
misunderstanding his comments will not garner you much good will.

HTH,
Paul Davis
