couchdb-user mailing list archives

From Jan Lehnardt <...@apache.org>
Subject Re: Bulk Load
Date Wed, 17 Sep 2008 15:11:41 GMT
Hi Ronny,

sorry, late reply.

One way to re-introduce optimistic locking is saving the new revision
over the latest one and then copying the previous revision of the doc
into a new doc. You can't run compaction in between, but since you
control it, JUST DON'T CALL IT ;-).
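
A minimal sketch of that save-then-copy step, using a hypothetical
in-memory stand-in rather than the real CouchDB HTTP API. TinyStore,
save_with_history and the ":rev:" id scheme are made up for illustration;
only the idea (stale _rev is rejected, old revisions survive until
compaction) comes from the discussion above.

```python
import copy
import uuid


class ConflictError(Exception):
    """Raised when a write carries a stale or missing _rev."""


class TinyStore:
    """Hypothetical single-node store that keeps old revisions until compact()."""

    def __init__(self):
        self.latest = {}  # doc_id -> current revision string
        self.bodies = {}  # (doc_id, rev) -> stored document body

    def get(self, doc_id, rev=None):
        # Without rev, return the latest revision; with rev, that exact one.
        rev = rev or self.latest[doc_id]
        return copy.deepcopy(self.bodies[(doc_id, rev)])

    def put(self, doc):
        doc_id = doc["_id"]
        current = self.latest.get(doc_id)
        # Optimistic locking: the writer must name the revision it read.
        if doc.get("_rev") != current:
            raise ConflictError(doc_id)
        generation = int(current.split("-")[0]) + 1 if current else 1
        rev = f"{generation}-{uuid.uuid4().hex[:8]}"
        body = copy.deepcopy(doc)
        body["_rev"] = rev
        self.bodies[(doc_id, rev)] = body
        self.latest[doc_id] = rev
        return rev

    def compact(self):
        # Drop every revision that is not the latest one for its document.
        self.bodies = {
            key: body
            for key, body in self.bodies.items()
            if self.latest[key[0]] == key[1]
        }


def save_with_history(store, doc):
    """Save doc over the latest revision, then copy the previous one to a new doc."""
    old_rev = doc.get("_rev")
    new_rev = store.put(doc)  # raises ConflictError if old_rev is stale
    if old_rev is not None:
        # Fails with KeyError if compact() ran between the put and this read.
        previous = store.get(doc["_id"], rev=old_rev)
        archive = {k: v for k, v in previous.items() if k not in ("_id", "_rev")}
        archive["_id"] = f"{doc['_id']}:rev:{old_rev}"
        archive["original_id"] = doc["_id"]
        store.put(archive)
    return new_rev
```

In this sketch the history copy is an ordinary document, so it survives
compact(); only the window between the save and the copy depends on the
old revision still being on disk.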

Cheers
Jan
--
On Sep 14, 2008, at 23:49, Ronny Hanssen wrote:

> Thanks for your reply, Jan.
>
> I do remember the discussion on the mailing list, but at the time I
> didn't understand the argumentation. Maybe because I really didn't have
> time to dive into the matter back then. But it has seriously puzzled me
> since. Then this post appears and I jump at the chance to get this
> cleared up (sorry for being slow - which makes me the opposite of
> arrogant I guess :D).
>
> But, I don't have a solution. I guess you are right in that sense. I just
> fail to see how making new docs makes life easier. I believe it makes the
> single node case worse and probably equally difficult (or worse) for the
> distributed multiple node architecture. Reading from what you say, there
> is "evil" lurking in the replication process no matter which way we handle
> this. I mean, for multiple nodes, replication would probably be slower
> than the reply sent back to users changing the same doc on two different
> nodes. This would result in multiple versions of the same doc being
> around, at least until replication - when couchdb would find out that two
> competing versions exist. I might be wrong about this, but the users can't
> be left waiting for an "ok-saved" reply from couchdb "forever", right? So,
> couchdb would have to decide which version "wins" during replication,
> right?
>
>
> Considering the effects you are hinting at, I'd personally want a single
> node couchdb for writes, with extra nodes for reading and serving views...
> Maybe additional write-nodes for different doc-types (one write-node per
> doc-type)... Just to "ensure" that there cannot be two+ docs updated at
> two+ nodes simultaneously. That is, in the beginning I'd really rather go
> for a single node, with a replicated backup/failover. As (if) system
> stress increases I'd opt for splitting writes and reads across nodes
> and/or creating write-nodes designated for different doc-types. This is
> still not perfect, but distributed never will be, really.
>
> Unless... If the couchdb data was stored in a distributed file-system
> (NAS or SAN), each copy of the couchdb process would be operating on the
> same disk. This doesn't mean more data-reliability and also imposes delays
> in reads and writes. But, it would mean that couchdb would be scalable
> (multiple "virtual" nodes work on the same physical disk). Other
> "physical" nodes could be created that would replicate as couchdb is set
> up to do already. So, allowing "virtual" nodes could work out as a nice
> addition I think.
>
> But, then again, my knowledge of distributed file-systems (NAS or SAN) is
> really limited... And, I might have missed out on a lot more than that -
> so all this might of course just be stupid :)
>
> Thanks for reading.
>
> ~Ronny
>
> 2008/9/14 Jan Lehnardt <jan@apache.org>
>
>> Hi Ronny,
>> On Sep 14, 2008, at 11:45, Ronny Hanssen wrote:
>>
>>> Or have I seriously missed out on some vital information? Because,
>>> based on the above I still feel very confused about why we cannot use
>>> the built-in rev-control mechanism.
>>>
>>
>> You correctly identify that adding revision control to a single node
>> instance of CouchDB is not that hard (a quick search through the archives
>> would have told you, too :-) Making all that work in a distributed
>> environment with replication conflict detection and all is mighty hard.
>> If you can come up with a nice and clean solution to make proper revision
>> control work with CouchDB's replication including all the weird edge
>> cases I don't even know about (aren't I arrogant this morning? :), we are
>> happy to hear about it.
>>
>> Cheers
>> Jan
>> --
>>
>>
>>
>>
>>
>>>
>>> ~Ronny
>>>
>>> 2008/9/14 Jeremy Wall <jwall@google.com>
>>>
>>>> Two reasons.
>>>> * First, as I understand it the revisions are not changes between
>>>> documents. They are actual full copies of the document.
>>>> * Second, revisions get blown away when doing a database compact.
>>>> Something you will more than likely want to do since it eats up
>>>> database space fairly quickly. (see above for the reason why)
>>>>
>>>> That said there is nothing preventing you from storing revisions in
>>>> CouchDB. You could store a changeset for each document revision in a
>>>> separate revision document that accompanies your main document. It
>>>> would be really easy, and designing views to take advantage of them to
>>>> show a revision history for your document would be really easy.
>>>>
>>>> I suppose you could use the revisions that CouchDB stores but that
>>>> wouldn't be very efficient since each one is a complete copy of the
>>>> document. And you couldn't depend on that "feature" not changing
>>>> behaviour on you in later versions since it's not intended for
>>>> revision history as a feature.
>>>>
>>>> On Sat, Sep 13, 2008 at 7:24 PM, Ronny Hanssen
>>>> <super.ronny@gmail.com> wrote:
>>>>
>>>>> Why is the revision control system in couchdb inadequate for, well,
>>>>> revision control? I thought that this feature indeed was a feature,
>>>>> not just an internal mechanism for resolving conflicts?
>>>>> Ronny
>>>>>
>>>>> 2008/9/14 Calum Miller <calum_miller@yahoo.com>
>>>>>
>>>>>> Hi Chris,
>>>>>>
>>>>>> Many thanks for your prompt response.
>>>>>>
>>>>>> Storing a complete new version of each bond/instrument every day
>>>>>> seems a tad excessive. You can imagine how fast the database will
>>>>>> grow over time if a unique version of each instrument must be saved,
>>>>>> rather than just the individual changes. This must be a common
>>>>>> pattern, not confined to investment banking. Any ideas how this
>>>>>> pattern can be accommodated within CouchDB?
>>>>>>
>>>>>> Calum Miller
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Chris Anderson wrote:
>>>>>>
>>>>>>> Calum,
>>>>>>>
>>>>>>> CouchDB should be easily able to handle this load.
>>>>>>>
>>>>>>> Please note that the built-in revision system is not designed for
>>>>>>> document history. Its sole purpose is to manage conflicting
>>>>>>> documents that result from edits done in separate copies of the DB,
>>>>>>> which are subsequently replicated into a single DB.
>>>>>>>
>>>>>>> If you allow CouchDB to create a new document for each daily import
>>>>>>> of each security, and create a view which makes these documents
>>>>>>> available by security and date, you should be able to access
>>>>>>> securities history fairly simply.
>>>>>>>
>>>>>>> Chris
>>>>>>>
>>>>>>> On Sat, Sep 13, 2008 at 12:31 PM, Calum Miller
>>>>>>> <calum_miller@yahoo.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I am trying to evaluate CouchDB for use within investment banking,
>>>>>>>> yes some of these banks still exist. I want to load 500,000 bonds
>>>>>>>> into the database with each bond containing around 100 fields. I
>>>>>>>> would be looking to bulk load a similar amount of these bonds
>>>>>>>> every day whilst maintaining a history via the revision feature.
>>>>>>>> Are there any bulk load features available for CouchDB and any
>>>>>>>> tips on how to manage regular loads of this volume?
>>>>>>>>
>>>>>>>> Many thanks in advance and best of luck with this project.
>>>>>>>>
>>>>>>>> Calum Miller

