couchdb-user mailing list archives

From James Marca <>
Subject Re: Can I guarantee uniqueness in a field without using _id?
Date Tue, 13 Jan 2009 01:12:56 GMT
On Mon, Jan 12, 2009 at 03:05:32PM -0800, Sunny Hirai wrote:
> >
> > You have just identified the problem with using a traditional RDBMS
> > in a distributed environment, which is the problem CouchDB sets out
> > to solve. Check out the discussion on the CAP hypothesis (others call
> > it a theorem; I'm not convinced, but the basic idea is there).
> >
> > Or more succinctly, guaranteed uniqueness requires global state,
> > global state does not scale.
> >
> > HTH,
> > Paul Davis
> >
> Yeah, that's what I thought but I was hoping for something in between.
> That is, all I want to do is guarantee uniqueness on one or a set of fields.
> It seems to me that it would not require locks on reads and most of eventual
> consistency would be okay except for when dealing with an INSERT or an
> UPDATE that changes the unique field. All other cases could work just like
> before. It would only require a special locked lookup on the unique field or
> field set which could be optimized.
> I know that CouchDB is not a solution to all problems (nor do I think it
> should be) but it seems that this would be common for most web applications
> which is a major CouchDB target. For example, almost any website needs
> unique user_names. They also need unique names for pages in a wiki. Blogs
> that use named pages would also have this problem.

> In other words, this seems like an issue that happens so often in the web
> computing world that I wonder if it should be supported in CouchDB.

But ... you pointed out that CouchDB can already provide uniqueness,
as long as you use _id for this.  In my application wherever I need to
guarantee a unique document field (in my case, data from a particular
site at a particular time) I just concatenate those fields with some
known field separator to make that the _id of the document.
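
A minimal sketch of that concatenation approach, in Python.  The
separator, the field names ("site", "time"), and the helper names
make_doc_id/split_doc_id are all assumptions for illustration, not
anything CouchDB prescribes.  Each field is percent-encoded so that the
separator can never appear inside a value, which keeps the composite
_id unambiguous and reversible.

```python
from urllib.parse import quote, unquote

SEPARATOR = ":"  # assumed separator; any character works once fields are encoded


def make_doc_id(*fields):
    """Join field values into one string to use as the document _id."""
    # Percent-encode every field so SEPARATOR cannot occur inside a value.
    return SEPARATOR.join(quote(str(field), safe="") for field in fields)


def split_doc_id(doc_id):
    """Recover the original field values (as strings) from a composite _id."""
    return [unquote(part) for part in doc_id.split(SEPARATOR)]


# Hypothetical document: data from a particular site at a particular time.
doc = {"site": "station-12", "time": "2009-01-13T01:12:56Z", "value": 42}
doc["_id"] = make_doc_id(doc["site"], doc["time"])
```

Saving a second document with the same site and time would then reuse
an existing _id, and CouchDB rejects that write with a conflict instead
of silently creating a duplicate.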

You are restricting the use of _id for this "feature" due to your
design decision of using just integer ids.  But I don't think you can
also complain that there is no way to guarantee the uniqueness of an
arbitrary (set of) field(s).  There is, you're just not using it.
Again, I haven't been using CouchDB for more than two months, but I'd
guess the developers are not going to see a need to allow for
*another* guaranteed unique field.

But before you think this post is just a flame or something, I *agree*
with you that some way of asking for uniqueness is a perfectly good
feature to have.  On the data upload side, I can see flagging unique
data using something trivial like '_unique/foo':{'key1':value,
'key2':[...],...}  to indicate unique data to the engine in the input
JSON (following the design document convention).  It always strikes me
as a bit of a hack to overload the _id just to use its uniqueness
features.  But given that one *can* get unique fields, I'd guess there
are other things for the developers to solve first.

Back to the problem at hand, perhaps you could translate whatever you
need to be guaranteed unique into integers in a one to one, reversible
mapping function.  The function would only have to be more efficient
than writing wrapper code to handle non-integer _id values.  Use
character codes or something.
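
One possible way to build such a mapping, sketched in Python under the
assumption that arbitrarily large integers are acceptable as ids.  The
function names text_to_int and int_to_text are made up for this
example.  The string's UTF-8 bytes are read as one big-endian integer,
with a leading 0x01 byte added so that the empty string and any leading
zero bytes survive the round trip.

```python
def text_to_int(text):
    """Map a string to a unique non-negative integer."""
    # The 0x01 prefix keeps leading zero bytes from being lost.
    data = b"\x01" + text.encode("utf-8")
    return int.from_bytes(data, "big")


def int_to_text(number):
    """Invert text_to_int: recover the original string from the integer."""
    length = (number.bit_length() + 7) // 8
    data = number.to_bytes(length, "big")
    # Drop the 0x01 prefix byte added by text_to_int.
    return data[1:].decode("utf-8")


user_id = text_to_int("sunny")
assert int_to_text(user_id) == "sunny"
```

The integers grow by eight bits per byte of input, so this only suits
short keys such as user names; whether it beats supporting string _id
values directly depends on the application.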

Just my two cents,

> Of course, the argument is to push that global state into the application
> layer and store it elsewhere; however, the question is whether that
> complexity would better service the developer if it was (a) in CouchDB or
> (b) in the application itself.
> Since it is a common case (as opposed to an edge case), perhaps there could
> be some discussion of putting it into CouchDB.
> Seriously though, would this not affect 100% of websites with a user base?
> One solution, I suppose, is to store the user data in an SQL database, but I
> prefer having as much as possible in CouchDB due to its flexible schemas.
