couchdb-user mailing list archives

From "Ronny Hanssen" <super.ro...@gmail.com>
Subject Re: Bulk Load
Date Sun, 14 Sep 2008 09:45:00 GMT
I hear. But:

   - Compacting isn't an automatic cleanup process. And, from what I gathered
   earlier, it is likely that it won't become one either(?). So, if the developer
   can control compacting, then compacting isn't a problem.
   - CouchDB not storing deltas for its revisions is not an argument either.
   Saving the full record as a new document isn't storing deltas either and
   won't save space. So, both are using the same amount of space, but, when
   storing a new document the dev needs to custom-make the revision control
   system. I agree that saving deltas would be "nice", but then it would not
   have been good for conflict-handling. I'd also vote for a revision-control
   that has complete copies for simplicity. It is so easy to rollback to any
   given revision when all fields are intact in the older revisions too.
   Granted, sometimes you may need delta versions. In those cases I would
   implement the rev-control myself indeed.
   - And, saving the same doc with an extra field for id certainly cannot
   change anything. If that was all that was missing for CouchDB to handle
   rev-control it should be easy to add to CouchDB as a full-blown feature. As
   I have understood it, adding a field will not change anything. The
   recommended way is still to make a new document.
   - I know it is not "intended for revision control", or rather, I know
   now. When I started to follow CouchDB with interest this was one of the
   features I really loved. And from reading the buzz and the project pages I
   certainly got the impression that this was a feature.


Given that compacting is supposed to be dev-controlled I can't see the
problem? It would actually be "perfect" the way it was stated earlier that
the dev would be able to write a compacting-handler that decided which docs
and revs to purge. For instance:

if (docTypesToCompact.contains(doc.type)) purge();

Storing in a new record will also have the same usage pattern as internal
rev-control. And, adding an extra field will not change anything, because
CouchDB already knows the order of the revisions. An extra field surely
cannot change anything.

And another thing: if we have to make new docs every time we need revision
control, then we hit the scenario where two users change the same document
almost simultaneously. When both users save to a new document, then the
built-in conflict management in CouchDB is suddenly void, and will not catch
it. Meaning, we have to make logic to handle this scenario ourselves, extra
checks or whatever needed. The point was for CouchDB to catch update race
conditions. Deletes and creates will be the same, but updates will be
affected, as well as future reads, when there are suddenly two copies of the
old document around. Since CouchDB uses optimistic locking this can happen.
It can be detected, yes, but it will need some extra code to handle this
(i.e. flagging the old doc as "old", so that the last process to update
is informed that there has been action since the last change). But, during
this time, we could have a case of double docs, or a moment of no doc,
depending on when the "old" flag is updated. If the rev-control itself was
used the conflict would be detected the way it was designed to be detected
in CouchDB. There would then always be only one actual version of the
document in the database.
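
To make the race concrete, here is a rough sketch in Python. It is a toy
in-memory model of optimistic locking, not CouchDB's actual API: ToyStore,
Conflict, race_in_place and race_with_new_docs are all made-up names, and the
integer revisions only stand in for CouchDB's revision ids. It shows that an
in-place update with a stale revision is rejected, while two "save as new
document" writes both succeed silently.

```python
# Toy in-memory model of optimistic locking. NOT CouchDB's real API;
# every name here is hypothetical and exists only for illustration.
import itertools


class Conflict(Exception):
    """Raised when a writer presents a stale revision."""


class ToyStore:
    def __init__(self):
        self.docs = {}                  # doc id -> (rev, body)
        self._ids = itertools.count(1)

    def create(self, body):
        """Store body under a fresh id; creates never collide with each other."""
        doc_id = "doc-%d" % next(self._ids)
        self.docs[doc_id] = (1, dict(body))
        return doc_id, 1

    def update(self, doc_id, rev, body):
        """Replace a doc only if the caller read the current revision."""
        current_rev, _ = self.docs[doc_id]
        if rev != current_rev:
            raise Conflict("%s is at rev %d, not %d" % (doc_id, current_rev, rev))
        self.docs[doc_id] = (current_rev + 1, dict(body))
        return current_rev + 1


def race_in_place(store):
    """Two users read rev 1 and both try to update the same doc."""
    doc_id, rev = store.create({"name": "bond", "price": 100})
    store.update(doc_id, rev, {"name": "bond", "price": 101})      # user A wins
    try:
        store.update(doc_id, rev, {"name": "bond", "price": 99})   # user B is stale
    except Conflict:
        return True     # the store caught the race
    return False


def race_with_new_docs(store):
    """Two users read the same doc and each saves a brand-new doc instead."""
    store.create({"name": "bond", "price": 100})    # the original
    before = len(store.docs)
    store.create({"name": "bond", "price": 101})    # user A
    store.create({"name": "bond", "price": 99})     # user B, no error raised
    return len(store.docs) - before                 # competing "latest" copies
```

In this model race_in_place reports that the second writer was rejected, while
race_with_new_docs leaves two competing "latest" copies that the application
has to reconcile itself.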

Or have I seriously missed out on some vital information?  Because, based on
the above I still feel very confused about why we cannot use the built-in
rev-control mechanism.

~Ronny

2008/9/14 Jeremy Wall <jwall@google.com>

> Two reasons.
>  * First as I understand it the revisions are not changes between
> documents.
> They are actual full copies of the document.
>  * Second revisions get blown away when doing a database compact. Something
> you will more than likely want to do since it eats up database space fairly
> quickly. (see above for the reason why)
>
> That said there is nothing preventing you from storing revisions in
> CouchDB.
> You could store a changeset for each document revision in a separate
> revision document that accompanies your main document. It would be really
> easy and designing views to take advantage of them to show a revision
> history for your document would be really easy.
>
> I suppose you could use the revisions that CouchDB stores but that wouldn't
> be very efficient since each one is a complete copy of the document. And you
> couldn't depend on that "feature" not changing behaviour on you in later
> versions since it's not intended for revision history as a feature.
>
> On Sat, Sep 13, 2008 at 7:24 PM, Ronny Hanssen <super.ronny@gmail.com> wrote:
>
> > Why is the revision control system in couchdb inadequate for, well,
> > revision
> > control? I thought that this feature indeed was a feature, not just an
> > internal mechanism for resolving conflicts?
> > Ronny
> >
> > 2008/9/14 Calum Miller <calum_miller@yahoo.com>
> >
> > > Hi Chris,
> > >
> > > Many thanks for your prompt response.
> > >
> > > Storing a complete new version of each bond/instrument every day seems a
> > > tad excessive. You can imagine how fast the database will grow over time if a
> > > unique version of each instrument must be saved, rather than just the
> > > individual changes. This must be a common pattern, not confined to
> > > investment banking. Any ideas how this pattern can be accommodated within
> > > CouchDB?
> > >
> > > Calum Miller
> > >
> > >
> > >
> > >
> > >
> > > Chris Anderson wrote:
> > >
> > >> Calum,
> > >>
> > >> CouchDB should be easily able to handle this load.
> > >>
> > >> Please note that the built-in revision system is not designed for
> > >> document history. Its sole purpose is to manage conflicting documents
> > >> that result from edits done in separate copies of the DB, which are
> > >> subsequently replicated into a single DB.
> > >>
> > >> If you allow CouchDB to create a new document for each daily import of
> > >> each security, and create a view which makes these documents available
> > >> by security and date, you should be able to access securities history
> > >> fairly simply.
> > >>
> > >> Chris
> > >>
> > > >> On Sat, Sep 13, 2008 at 12:31 PM, Calum Miller <calum_miller@yahoo.com>
> > > >> wrote:
> > >>
> > >>
> > >>> Hi,
> > >>>
> > > >>> I am trying to evaluate CouchDB for use within investment banking, yes some of
> > > >>> these banks still exist. I want to load 500,000 bonds into the database with
> > > >>> each bond containing around 100 fields. I would be looking to bulk load a
> > > >>> similar amount of these bonds every day whilst maintaining a history via the
> > > >>> revision feature. Are there any bulk load features available for CouchDB and
> > > >>> any tips on how to manage regular loads of this volume?
> > >>>
> > >>> Many thanks in advance and best of luck with this project.
> > >>>
> > >>> Calum Miller
> > >>>
> > >>>
> > >>>
> > >>>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >
> > >
> > >
> >
>
