incubator-couchdb-user mailing list archives

From Mike Leddy <m...@loop.com.br>
Subject Re: Document Timestamp On Replication
Date Wed, 04 May 2011 17:35:08 GMT
What you are describing could be resolved with a feature (that I believe
does not exist).

If you could supply a database sequence number when querying a view,
i.e. return the results from the view as they were when the database
sequence number was "x", then the MVCC guarantees of CouchDB would give
you exactly what you want.

Because databases and views are append-only, that pretty much resolves
everything (unless a compaction completes while you are still holding
on to the sequence number).
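
To make the idea concrete, here is a minimal Python sketch of what a read
pinned to a sequence number could look like on an append-only store. It is
a toy model, not CouchDB code: the `AppendOnlyStore` class and its `put`,
`snapshot` and `compact` methods are hypothetical names invented for
illustration, and the query-by-sequence feature itself is, as far as I
know, not something CouchDB offers.

```python
class AppendOnlyStore:
    """Toy append-only store (hypothetical, not the CouchDB API)."""

    def __init__(self):
        self._log = []        # entries are (seq, doc_id, value), oldest first
        self.update_seq = 0   # sequence number of the most recent write

    def put(self, doc_id, value):
        """Append a new revision; nothing is overwritten in place."""
        self.update_seq += 1
        self._log.append((self.update_seq, doc_id, value))
        return self.update_seq

    def snapshot(self, at_seq):
        """Replay the log up to at_seq and return the state at that point."""
        state = {}
        for seq, doc_id, value in self._log:
            if seq > at_seq:
                break
            state[doc_id] = value
        return state

    def compact(self):
        """Keep only the newest entry per document, dropping old revisions."""
        latest = {}
        for seq, doc_id, value in self._log:
            latest[doc_id] = (seq, doc_id, value)
        self._log = sorted(latest.values())  # seq is unique, so this orders by seq


store = AppendOnlyStore()
store.put("a", 1)
store.put("b", 2)
pinned = store.update_seq            # the reader remembers this number
store.put("a", 10)                   # a later write from another client

before = store.snapshot(pinned)      # still sees "a" as 1
store.compact()
after_compact = store.snapshot(pinned)  # old revision of "a" is gone
```

The last two lines show the caveat: once compaction has discarded the old
revision of "a", the pinned sequence number can no longer reproduce the
state the reader started with.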

Regards,

Mike


On Wed, 2011-05-04 at 09:36 -0700, Jim Klo wrote:
> I might be able to shed some light here, as I'm working w/ John on this project.
> 
> Essentially, I believe what we ultimately need to be able to do is
> build a view that is similar to _changes, but filtered.
> 
> Basically we need a way to maintain 'local' transactional integrity.
> Assume our Couch is busy receiving updates from other Couch nodes, and
> we have a user querying a range against Couch that, let's say, would
> have 100,000 results. We need to be able to paginate through that range
> and be guaranteed that it's not going to be modified by some update
> happening in another thread.
> 
> If you are familiar with OAI-PMH, essentially we need to build flow
> control into the application, but if I request a range of objects at
> 12:00pm... I need to be able to paginate through that range, potentially
> until 12:05pm, without any updates between 12:00 and 12:05 affecting
> the result set.
> 
> The idea about having the document with the local timestamp is that we
> would be able to create a view by timestamp to query in this manner.
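
A rough Python sketch of that approach, under stated assumptions: the
view is modelled as a sorted list of (timestamp, doc_id) keys, and the
`page` helper is a hypothetical stand-in for a view query using CouchDB's
`startkey`, `endkey` and `limit` parameters. The upper bound is fixed when
the first page is requested, so documents stamped later never enter the
range.

```python
import bisect

HIGH = "\uffff"  # sorts after any ordinary doc id used in this sketch


def page(rows, start_after, end_key, limit):
    """Return up to `limit` keys with start_after < key <= end_key.

    rows is a sorted list of (timestamp, doc_id) tuples standing in for
    a view keyed by local timestamp.
    """
    lo = 0 if start_after is None else bisect.bisect_right(rows, start_after)
    hi = bisect.bisect_right(rows, end_key)
    return rows[lo:min(lo + limit, hi)]


rows = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
end_key = (4, HIGH)                  # cutoff captured at request time

first = page(rows, None, end_key, 2)
bisect.insort(rows, (5, "e"))        # a write arrives mid-pagination
second = page(rows, first[-1], end_key, 2)
third = page(rows, second[-1], end_key, 2)   # empty: range exhausted
```

One limitation is worth stating: this only shields the range from new
documents. If a document already inside the range is updated and its
timestamp is overwritten, it moves past the cutoff and can drop out of
pages that have not been fetched yet.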
> 
> If you guys have alternate ideas on how we might achieve this - I think
> we'd be open to discussion.
> 
> 
> Jim Klo
> Senior Software Engineer
> Center for Software Engineering
> SRI International
> 
> 
> 
> 
> On May 4, 2011, at 9:14 AM, Owen Marshall wrote:
> 
> > On 05/04/2011 11:29 AM, Poyau, John wrote:
> >> -We want to keep track of the time that a document is
> >> added/updated in a source database
> > 
> > Then you definitely want an updated field per-document.
> > 
> > Implementing this varies with your needs. You could use a single
> > timestamp that gets clobbered each time, if you don't need a huge
> > auditing trail. You could also do a list of timestamps if it would prove
> > helpful.
> > 
> > One other technique that I'm especially fond of is to store changes as
> > attachments to each document. This gives you great audit trails -- who
> > made what change when. You could go so far as to store the full document
> > state before the change.
> > 
> > But if you don't need that level of auditing, a timestamp field is the
> > way to go.
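
A minimal Python sketch of the two timestamp variants mentioned above (a
single field that gets overwritten, or a growing list). The `touch` helper
and the field names `updated_at` and `update_history` are hypothetical,
chosen only for illustration; nothing in CouchDB mandates them.

```python
from datetime import datetime, timezone


def touch(doc, now=None, keep_history=False):
    """Return a copy of doc stamped with the update time.

    updated_at is overwritten on every call. With keep_history=True the
    same stamp is also appended to update_history as a light audit trail.
    """
    now = now or datetime.now(timezone.utc)
    stamp = now.isoformat()
    updated = dict(doc)                      # leave the caller's dict alone
    updated["updated_at"] = stamp
    if keep_history:
        history = list(doc.get("update_history", []))
        history.append(stamp)
        updated["update_history"] = history
    return updated


t1 = datetime(2011, 5, 4, 16, 14, 0, tzinfo=timezone.utc)
t2 = datetime(2011, 5, 4, 17, 35, 8, tzinfo=timezone.utc)

doc = {"_id": "example"}
v1 = touch(doc, now=t1, keep_history=True)
v2 = touch(v1, now=t2, keep_history=True)
```

The stamp here is set by whoever writes the document, so it records the
update time at the source, not the time the document reached a replica.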
> > 
> >> -We want to keep track of the time that a document gets
> >> replicated to a target database on replication.
> > 
> > Don't. Don't don't don't.
> > 
> > But because I hate it when the answer is "you're doing it wrong" and
> > nothing else, some notes:
> > 
> > * You will definitely want to separate the replication time from the
> > update time (as they clearly aren't the same thing.)
> > 
> > * Further, that *cannot* go in the document, clearly.
> > 
> > * You'd need at a minimum filtered/named replication to send the
> > documents you want, and an update handler to put the "replicated time"
> > in some other document.
> > 
> > Again though, you never answered the simple question of *why* you want
> > to know this. Let me be clear: what you are trying to do adds a bunch of
> > complexity to your documents, your replication, and your program. And
> > I'm not sure why you want to do it so badly.
> > 
> > What problem do you think you are solving by storing the replicated time?
> > 
> > -- 
> > Owen Marshall
> > FacilityONE
> > omarshall@facilityone.com | (502) 805-2126
> > 
> 


