incubator-couchdb-user mailing list archives

From Jim Klo <jim....@sri.com>
Subject Re: Document Timestamp On Replication
Date Wed, 04 May 2011 16:36:47 GMT
I might be able to shed some light here, as I'm working w/ John on this project.

Essentially, I believe what we ultimately need is to build a view that is similar
to _changes, but filtered.

Basically, we need a way to maintain 'local' transactional integrity.  Assume our Couch
is busy receiving updates from other Couch nodes, and a user is querying a range
against Couch that, let's say, would have 100,000 results.  We need to be able to paginate through
that range and be guaranteed that it's not going to be modified by some update happening
in another thread.

If you are familiar with OAI-PMH, essentially we need to build flow control into the application:
if I request a range of objects at 12:00pm, I need to be able to paginate through that
range, potentially until 12:05pm, without any updates between 12:00 and 12:05 affecting
the result set.

The idea behind giving each document a local timestamp is that we would be able to create
a view keyed by timestamp and query it in this manner.
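
To make the idea concrete, here is a minimal sketch in Python. It is an in-memory simulation, not CouchDB code: the field name `local_timestamp`, the helper names, and the integer timestamps are all assumptions made for illustration. It models a view keyed on (timestamp, doc id) and pages through it with a fixed upper bound taken at request time, so documents stamped after the request started never enter the result set.

```python
from bisect import bisect_left, bisect_right

# Sentinel that sorts after any realistic document id (assumption for this sketch).
HIGH = "\U0010ffff"

def build_timestamp_index(docs):
    """Return sorted (timestamp, doc_id) rows, like a view keyed on a local timestamp."""
    return sorted((doc["local_timestamp"], doc["_id"]) for doc in docs)

def fetch_page(index, start_key, end_key, limit):
    """Return up to `limit` rows with start_key <= row <= end_key,
    plus the key to resume from (None when the range is exhausted)."""
    lo = bisect_left(index, start_key)
    hi = bisect_right(index, end_key)
    rows = index[lo:min(lo + limit, hi)]
    next_key = index[lo + limit] if lo + limit < hi else None
    return rows, next_key

# Usage: the cutoff is fixed once, when the client first asks for the range.
docs = [{"_id": f"doc{i}", "local_timestamp": 100 + i} for i in range(5)]
cutoff = 104
end_key = (cutoff, HIGH)

page1, next_key = fetch_page(build_timestamp_index(docs), (100, ""), end_key, 2)

# A document arrives via replication while the client is still paginating.
docs.append({"_id": "late", "local_timestamp": 105})

# Later pages reuse the same end_key, so the late arrival is never returned.
page2, next_key = fetch_page(build_timestamp_index(docs), next_key, end_key, 2)
page3, next_key = fetch_page(build_timestamp_index(docs), next_key, end_key, 2)
```

One caveat with this approach: if an existing document is updated and re-stamped after the cutoff, it moves out of the window, so this gives a stable upper bound rather than a true snapshot of the range.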

If you guys have alternative ideas on how we might achieve this, I think we'd be open to discussion.


Jim Klo
Senior Software Engineer
Center for Software Engineering
SRI International




On May 4, 2011, at 9:14 AM, Owen Marshall wrote:

> On 05/04/2011 11:29 AM, Poyau, John wrote:
>> -We want to keep track of the time that a document is added/updated in a source database
> 
> Then you definitely want an updated field per-document.
> 
> Implementing this varies with your needs. You could use a single
> timestamp that gets clobbered each time, if you don't need a huge
> auditing trail. You could also do a list of timestamps if it would prove
> helpful.
> 
> One other technique that I'm especially fond of is to store changes as
> attachments to each document. This gives you great audit trails -- who
> made what change when. You could go so far as to store the full document
> state before the change.
> 
> But if you don't need that level of auditing, a timestamp field is the
> way to go.
> 
>> -We want to keep track of the time that a document gets replicated to a target database on replication.
> 
> Don't. Don't don't don't.
> 
> But because I hate it when the answer is "you're doing it wrong" and
> nothing else, some notes:
> 
> * You will definitely want to separate the replication time from the
> update time (as they clearly aren't the same thing.)
> 
> * Further, that *cannot* go in the document, clearly.
> 
> * You'd need at a minimum filtered/named replication to send the
> documents you want, and an update handler to put the "replicated time"
> in some other document.
> 
> Again though, you never answered the simple question of *why* you want
> to know this. Let me be clear: what you are trying to do adds a bunch of
> complexity to your documents, your replication, and your program. And
> I'm not sure why you want to do it so badly.
> 
> What problem do you think you are solving by storing the replicated time?
> 
> -- 
> Owen Marshall
> FacilityONE
> omarshall@facilityone.com | (502) 805-2126
> 

