couchdb-user mailing list archives

From Jan Lehnardt <...@apache.org>
Subject Re: The Blog
Date Mon, 09 Feb 2009 13:20:44 GMT

On 9 Feb 2009, at 13:49, Mister Donut wrote:

> I'll jump right in.
>
>> CouchDB won't allow you to "jump to page X", but if you look at
>> e.g. Google, it doesn't work either. [...]
>> But surrogate keys are considered harmful and I'd say (but that
>> really depends on the application), not very helpful.
>
> I guess I was assuming that CouchDB, due to its different nature, has
> a sophisticated solution for this. But apparently pagination is a
> problem that is really hard to solve.

CouchDB in its current form is very bare bones. Many of us are not yet
as experienced with CouchDB as we are with RDBMSs, simply because CouchDB
hasn't been around that long. We've come a long way in defining standard
patterns for solving common problems in CouchDB, but there's
a lot more to do.

Pagination lives and dies by being able to calculate which row lives on
which page. This works best with a surrogate index that is a sequence
over the rows. You can build a sequence on a distributed system, but
you are introducing a global queue that all requests to your system would
need to go through. You'd need to make that global queue fault tolerant
and able to handle all your load. This is tricky. Or you give up on that
and accept that sequences in a distributed system are not feasible.
This is where pagination indeed gets hard. But if you work backwards
from the user experience, you can make a decent trade-off, as Google does.
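A minimal sketch of that trade-off, in Python and purely illustrative:
instead of computing an offset from a page number, each page remembers
where the next one starts. The in-memory list of rows stands in for a
CouchDB view; against a real view the same idea maps onto the startkey,
startkey_docid and limit query parameters, asking for one row more than
you display. The function name fetch_page is hypothetical.

```python
from bisect import bisect_left


def fetch_page(rows, page_size, start=None):
    """Keyset pagination over rows sorted by (key, doc_id).

    rows stands in for a view result (hypothetical, in memory); start is
    the (key, doc_id) of the first row to show, or None for the first page.
    Returns the page and the (key, doc_id) where the next page starts,
    or None when there are no more rows.
    """
    i = 0 if start is None else bisect_left(rows, start)
    chunk = rows[i:i + page_size + 1]  # one extra row, like limit=page_size+1
    page = chunk[:page_size]
    next_start = chunk[page_size] if len(chunk) > page_size else None
    return page, next_start


if __name__ == "__main__":
    rows = sorted((f"2009-02-{d:02d}", f"post{d}") for d in range(1, 8))
    start = None
    while True:
        page, start = fetch_page(rows, 3, start)
        print([doc_id for _, doc_id in page])
        if start is None:
            break
```

Walking forward with next_start never needs a global row count or a
global sequence, which is why it works on a distributed system. The price
is that you can offer "next" links but not "jump to page 17".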


>> Can you elaborate on that? I don't quite get the "or duplicate data,
>> basically anything that needs to be the same as something else" bit.
>
> Well. Let's say you have a list of documents. You want to store some
> information about the newest document in a separate key (instead of a
> view, which might be slow? if you have too many).

Too many what? Views or documents? Views are not really slow once
the index is built, and thanks to incremental updates they are not slow
in production either. Having many views is no problem either, as they
are evaluated on read, not on document write (unlike traditional RDBMS
column indexes).
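To illustrate what "evaluated on read, with incremental updates" means,
here is a toy model in Python, not CouchDB's actual implementation:
writes only append to a change log, and the view runs its map function
over the changes it hasn't seen yet when it is queried. The Database and
View classes, the by_date map function and the map_calls counter are all
hypothetical; deletions and CouchDB's on-disk B-trees are left out.

```python
class Database:
    """Hypothetical document store that only appends writes to a change log."""

    def __init__(self):
        self.seq = 0
        self.changes = []  # (seq, doc_id, doc), in write order

    def put(self, doc_id, doc):
        # Writing does no view work at all.
        self.seq += 1
        self.changes.append((self.seq, doc_id, doc))


class View:
    """Hypothetical view index brought up to date lazily, on query."""

    def __init__(self, db, map_fn):
        self.db = db
        self.map_fn = map_fn
        self.indexed_seq = 0  # last change already reflected in the index
        self.by_doc = {}      # doc_id -> rows emitted for its latest version
        self.map_calls = 0    # how many times map_fn has run

    def query(self):
        # changes[i] has seq i + 1, so this slice is exactly the unseen changes.
        for seq, doc_id, doc in self.db.changes[self.indexed_seq:]:
            self.by_doc[doc_id] = [(key, doc_id, value) for key, value in self.map_fn(doc)]
            self.map_calls += 1
            self.indexed_seq = seq
        rows = [row for emitted in self.by_doc.values() for row in emitted]
        # Sort by key, then doc_id, as view rows are ordered.
        return sorted(rows, key=lambda row: (row[0], row[1]))


def by_date(doc):
    # Hypothetical map function: emit (date, title) for documents with a date.
    if "date" in doc:
        yield doc["date"], doc.get("title")
```

Each query only pays for documents changed since the previous query, so
an index that has already been built stays cheap to read.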


> That isn't possible.
> Or let's say you have documents, and categories. And many, many, many
> of them. Again, the view to show the latest document might be too
> slow, so you want to save that information in a separate key. Not
> possible.

Once a view is built, it is rather quick to look things up. I suspect
that you assume views are created on demand, based on user input.
That would not work except for very few documents, and you're advised
to find a solution with predefined views.
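For the "latest document per category" case, a predefined view keyed by
[category, date] answers the question with a single lookup. The Python
below models that view as a sorted list and is only a sketch; the map
function, the class and the method names are hypothetical. Against a real
CouchDB view, the equivalent query would be roughly descending=true,
startkey=["couchdb",{}], endkey=["couchdb"], limit=1, relying on objects
collating after strings.

```python
from bisect import bisect_right


def by_category_and_date(doc):
    # Hypothetical map function: one row per post, keyed like the
    # CouchDB key ["category", "date"].
    if doc.get("type") == "post":
        yield (doc["category"], doc["date"])


class PredefinedView:
    """Hypothetical view, built ahead of time rather than per user request."""

    def __init__(self, docs, map_fn):
        # Rows sorted by (category, date), then doc_id.
        self.rows = sorted(
            (key, doc_id) for doc_id, doc in docs.items() for key in map_fn(doc)
        )
        # Parallel list of categories so one binary search finds the range end.
        self.categories = [key[0] for key, _ in self.rows]

    def latest_in(self, category):
        # Index just past the last row of this category.
        hi = bisect_right(self.categories, category)
        if hi == 0 or self.categories[hi - 1] != category:
            return None  # no posts in this category
        (_, date), doc_id = self.rows[hi - 1]
        return doc_id, date
```

The work of sorting happens once, when the view is built; each "latest in
category" lookup afterwards is a binary search, no matter how many
documents and categories there are.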


>> A couple of things you can do with CouchDB replication (again, not
>> saying that you can't do some of those with an RDBMS, but it is getting
>> harder the further you move down the list): [...]
>
> Thank you for that list. I think that I, like many other users
> (judging by what I have read in blogs), expected something else
> from CouchDB. I am not so sure where this is coming from.
>
> Check the Ruby thing a few mails down. How exactly is that
> implementation going to work without immediate consistency?

Paul's Stuffing is for people who want to get going with CouchDB quickly
in their Rails environment. It is specifically not designed around all
the concepts of CouchDB.


> Everyone
> seems to be going on about it being schema free, but you can just add
> a "param" field to any database and transparently (un)serialize and
> there you have it, schema-free.

Your alter-table statement locks your table (in MySQL). If you normalize
that out into a separate table, you add a JOIN which might end up not
being as fast as you like. Totally generic object behaviour abstractions
in SQL need something like 8 tables; there's no way this flies :)
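A small sqlite3 sketch of the "normalize it out into a separate table"
approach, using a hypothetical entity-attribute-value layout: adding a
field needs no ALTER TABLE, but reading fields back costs one JOIN per
attribute. Table and function names are made up for illustration, and
the sketch says nothing about MySQL performance specifically.

```python
import sqlite3


def make_eav_store():
    # Hypothetical generic schema: one row per (entity, attribute) pair.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE entity (id INTEGER PRIMARY KEY);
        CREATE TABLE attribute (
            entity_id INTEGER REFERENCES entity(id),
            name TEXT,
            value TEXT
        );
        """
    )
    return conn


def save_eav(conn, entity_id, fields):
    # New field names need no schema change, they are just new rows.
    conn.execute("INSERT INTO entity (id) VALUES (?)", (entity_id,))
    conn.executemany(
        "INSERT INTO attribute (entity_id, name, value) VALUES (?, ?, ?)",
        [(entity_id, name, value) for name, value in fields.items()],
    )


def load_title_and_author(conn, entity_id):
    # Reassembling a "row" costs one join per attribute you want back.
    return conn.execute(
        """
        SELECT t.value, a.value
        FROM entity e
        JOIN attribute t ON t.entity_id = e.id AND t.name = 'title'
        JOIN attribute a ON a.entity_id = e.id AND a.name = 'author'
        WHERE e.id = ?
        """,
        (entity_id,),
    ).fetchone()
```

In a document store the same post is a single JSON object, e.g.
{"title": "The Blog", "author": "Jan", "tags": ["couchdb"]}, stored and
read back as a whole.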


> If you actually have a few nodes (with that implementation), it will
> break big, big time.

How? (Assuming you have a use-case in mind, can you explain that?)


> I think, possibly,
> with the "Cloud Hype", that I got into believing, that it will "just
> work". With anything that you throw at it. Like what Amazon SimpleDB
> tells you it would.

There is no magic bullet. Distributed programming is hard. :)


> Yes, Key/Value pairs are incredibly easy. MapReduce is amazing and
> intriguing. But handling the replication, won't it be so difficult
> that you end up with a Quasi-Mini-RDBMS anyway?

What is a quasi-mini-RDBMS? Of course, concepts and behaviour
will likely overlap, but there are a number of properties that draw
people to CouchDB. The REST API is one. JSON is another,
replication yet another, and the Erlang core yet another.

Speaking of which, Erlang is pretty cool for multi-core systems
that are rather hard to program in other languages (yet again,
no silver bullet).


> Now I got far away from my original questions, but I guess that
> happens often in discussions.
>
> Basically, now: "Is it possible to handle the replication in such a
> way that you don't end up with a Mini-RDBMS anyway in the end?"

Again, can you put that into a concrete example? I don't quite get what
that mini-RDBMS is and how your understanding of replication ties
into it :)


> I would just, really really really, like to see an example that goes
> beyond schema-free. That handles replication. I think that would show
> where CouchDB shines, and where you'd fail with an RDBMS.

See the last three items on the list in the last mail. They are
traditionally not easy to build on top of an RDBMS in a practical or
scalable manner.

Cheers
Jan
--

