incubator-couchdb-user mailing list archives

From Owen Marshall <omarsh...@facilityone.com>
Subject Re: Document Timestamp On Replication
Date Wed, 04 May 2011 17:41:27 GMT
On 05/04/2011 12:36 PM, Jim Klo wrote:
> We need to be able to paginate through that range and be guaranteed that it's not going
> to be modified via some update happening in another thread.
>
> [...] I request a range of objects at 12:00pm... I need to be able to paginate through
> that range probably until 12:05pm, potentially, without any updates between 12:00 and 12:05
> affecting the result set.

Ah, *now* I see what you are going for.

First, pretend we are working only with one node. This is important :)

Let's assume that we are only concerned about filtering out new writes
-- that is, we want to be able to run a query and page around from 12:00
to 12:05 without seeing a document added at 12:01.

So, include the document update time inside the doc, make that part (or
all!) of your view's key, and pass the same endkey of 12:00 on every page
request so that you only see documents updated at or before 12:00.
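
Here is a minimal sketch of that idea, under stated assumptions rather than a definitive implementation. It assumes `updatetime` is stored as an ISO 8601 UTC string (so keys sort chronologically); the map function has the shape CouchDB expects, while `emit` and `queryByUpdateTime` are hypothetical stand-ins for the view engine so the example runs outside CouchDB.

```javascript
// Map function in the shape CouchDB expects for a design document.
// Assumption: doc.updatetime is an ISO 8601 UTC string.
function mapByUpdateTime(doc) {
  if (doc.updatetime) {
    emit(doc.updatetime, null);
  }
}

// Hypothetical stand-in for the view engine's emit(), collecting rows in memory.
let emitted = [];
function emit(key, value) {
  emitted.push({ key: key, value: value });
}

// Simulates a view query with endkey (inclusive), skip and limit.
function queryByUpdateTime(docs, endkey, skip, limit) {
  emitted = [];
  for (const doc of docs) {
    const first = emitted.length;
    mapByUpdateTime(doc);
    // Tag each new row with the id of the document that produced it.
    for (let i = first; i < emitted.length; i++) emitted[i].id = doc._id;
  }
  const sorted = emitted.slice().sort(function (a, b) {
    if (a.key !== b.key) return a.key < b.key ? -1 : 1;
    return a.id < b.id ? -1 : a.id > b.id ? 1 : 0;
  });
  return sorted
    .filter(function (row) { return row.key <= endkey; })
    .slice(skip, skip + limit);
}

const docsAtNoon = [
  { _id: "A", updatetime: "2011-05-04T11:58:00Z", foo: "bar" },
  { _id: "B", updatetime: "2011-05-04T11:59:00Z", foo: "baz" },
];
const snapshot = "2011-05-04T12:00:00Z";
const pageBefore = queryByUpdateTime(docsAtNoon, snapshot, 0, 10);

// A write lands at 12:01 while the client is still paging.
const docsLater = docsAtNoon.concat([
  { _id: "C", updatetime: "2011-05-04T12:01:00Z", foo: "qux" },
]);
const pageAfter = queryByUpdateTime(docsLater, snapshot, 0, 10);

console.log(pageBefore.map((r) => r.id), pageAfter.map((r) => r.id));
```

Both pages contain only A and B, because C's key sorts after the endkey. Against a real database the equivalent request would look something like GET /db/_design/app/_view/by_updatetime?endkey="2011-05-04T12:00:00Z"&limit=10, where `db`, `app` and `by_updatetime` are made-up names.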

Handling updates becomes a bit more complex, but it's all based on how
your application needs to work.

One idea would be to store the old document inline on every update. So
you'd go from:

{id: A, updatetime: 12:00, foo: bar}

to:

{id: A, updatetime: 12:01, foo: baz,
 history: {id: A, updatetime: 12:00, foo: bar}}

Then your view just emits all updatetimes for documents *AND* the
updatetimes for all history values. You can use the same endkey filter
as before.
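
A rough sketch of the history approach, assuming the nested `history` shape from the example above and ISO 8601 strings for `updatetime`. The helper names (`updateWithHistory`, `mapVersions`, `versionsUpTo`) are hypothetical, and `emit` is passed in as a parameter only so the code runs standalone; in a real CouchDB map function it is a global.

```javascript
// Hypothetical helper: build the updated document, keeping the old one inline.
function updateWithHistory(oldDoc, changes, now) {
  return Object.assign({}, oldDoc, changes, { updatetime: now, history: oldDoc });
}

// Map logic: emit the updatetime of the current version and of every
// version in the history chain.
function mapVersions(doc, emit) {
  for (let version = doc; version; version = version.history) {
    if (version.updatetime) {
      emit(version.updatetime, { foo: version.foo });
    }
  }
}

// Simulated view query with an inclusive endkey.
function versionsUpTo(docs, endkey) {
  const found = [];
  for (const doc of docs) {
    mapVersions(doc, function (key, value) {
      found.push({ id: doc._id, key: key, value: value });
    });
  }
  found.sort(function (a, b) {
    return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
  });
  return found.filter(function (row) { return row.key <= endkey; });
}

const original = { _id: "A", updatetime: "2011-05-04T12:00:00Z", foo: "bar" };
const updated = updateWithHistory(original, { foo: "baz" }, "2011-05-04T12:01:00Z");
const visibleAtNoon = versionsUpTo([updated], "2011-05-04T12:00:00Z");
console.log(JSON.stringify(visibleAtNoon));
```

With an endkey of 12:00 the query returns only the old version of A (foo: bar), even though the stored document now says baz. If a document was updated several times before the cutoff, every one of those versions is returned, so a reader wanting one row per document would keep the last row for each id.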

(That's assuming this level of tracking is needed for your program. It
may not be -- but that's up to you. And you could also decouple history
from individual documents. There are plenty of ways to skin this
particular cat.)

OK, now... this works fine for one node -- why won't it work for 2 or more?

Remember that replication is _not special_; the act of replicating a
document from N1->N2 is equivalent to PUTting/POSTing that document on
N1, then on N2.

So, if a user runs this view on N1 at 12:00, and changes/adds are
replicated in from N2 at 12:01, **it doesn't matter!** Those documents
will have an updatetime > 12:00 (assuming they were written after 12:00
and the nodes' clocks agree), so they won't be seen.

Now, there are some possible issues that can come up with conflicts, but
you've got to handle those anyway if you want to use replication -- and
keeping track of when a document was replicated in won't help you with
that. In fact, tracking replication time brings more pain points than
handling well-understood conflicts does.

Replication is just another insert/update stream. If you write your view
to work properly on one node (in your case, show only updates before a
given time), it's going to work when you replicate with other nodes.
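
To make the "just another insert stream" point concrete, here is a toy model, not CouchDB's actual replicator. A node is a plain Map, `replicate` and `idsUpTo` are hypothetical helpers, conflicts are ignored, and the replicated document is assumed to have been written on N2 after 12:00 with clocks in agreement.

```javascript
// Toy model: a node is a Map from document id to document.
// Replicating copies documents unchanged; no replication time is recorded.
// Conflicts are ignored here: a copy simply overwrites the target's entry.
function replicate(source, target) {
  for (const [id, doc] of source) {
    target.set(id, doc);
  }
}

// Ids of documents whose updatetime is at or before the endkey, in key order.
function idsUpTo(node, endkey) {
  return Array.from(node.values())
    .filter(function (doc) { return doc.updatetime <= endkey; })
    .sort(function (a, b) {
      return a.updatetime < b.updatetime ? -1 : a.updatetime > b.updatetime ? 1 : 0;
    })
    .map(function (doc) { return doc._id; });
}

const n1 = new Map([["A", { _id: "A", updatetime: "2011-05-04T11:58:00Z" }]]);
const n2 = new Map([["D", { _id: "D", updatetime: "2011-05-04T12:01:00Z" }]]);
const noon = "2011-05-04T12:00:00Z";

const beforeReplication = idsUpTo(n1, noon);
replicate(n2, n1);
const afterReplication = idsUpTo(n1, noon);
console.log(beforeReplication, afterReplication);
```

N1 holds both documents after replication, but the query bounded at 12:00 returns only A both times, because D carries its own updatetime with it.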

See:
http://wiki.apache.org/couchdb/HTTP_view_API
http://wiki.apache.org/couchdb/View_collation

for more.

-- 
Owen Marshall
FacilityONE
omarshall@facilityone.com | (502) 805-2126

