incubator-couchdb-user mailing list archives

From Kevin Coombes <kevin.r.coom...@gmail.com>
Subject Initial Bulk Upload (was Re: Exist test?)
Date Tue, 06 Nov 2012 11:13:53 GMT
Hi Dave,

Special thanks for your suggestion on initial bulk upload.  Point [2] 
explains why I always had to compact immediately afterwards, and reduced 
disk space usage ten-fold....

(And the subject change is so that I and others can maybe find this 
advice again in the future.)

     Kevin

On 11/6/2012 2:15 AM, Dave Cottlehuber wrote:
> On 5 November 2012 19:22, Kevin Burton <rkevinburton@charter.net> wrote:
>> [SNIP]
>>
> Hi Kevin,
>
> [SNIP]
> If you're initially bulk uploading data, I would do 3 things
> differently to what you're currently doing.
>
> 1. assign UUIDs myself
> This is the only enforced unique indexed attribute in a DB, so use it
> well. Put something you want in it. It's basically free text ** within
> reason.
>
> 2. insert them in sorted UUID order
> CouchDB is a database and sorting matters. Couch uses a B~tree ** and
> so if you insert randomly you spend a lot of time forcing the re-write
> of intermediate nodes for no gain. As Couch is an append-only
> datastore this means several things -
> - wasted space until you compact
> - slower insert performance as you have multiple writes instead of one
> http://horicky.blogspot.co.at/2008/10/couchdb-implementation.html
>
> 3. try inserting the first few docs by hand with curl. And read up on
> the _bulk_docs API; it is much, much faster.
>
> Re your drivers, there are several but I personally don't use any of
> them. There are more popular ones (based on my dodgy recollection)
> here: http://wiki.apache.org/couchdb/Related_Projects
> Hopefully some of the other Windows folk will pipe up.
>
> A+
> Dave
>
> ** handwavey
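
For anyone finding this thread later, Dave's three points can be sketched roughly as below. This is only an illustration, not anything Dave posted: the helper names, the "sku" field, and the sample records are all made up, and the IDs are kept to zero-padded ASCII so that a plain string sort is unambiguous. The one CouchDB-specific piece is the request itself, a POST of {"docs": [...]} to the database's _bulk_docs endpoint.

```python
import json
import urllib.request


def build_bulk_payload(records, id_field):
    """Return a _bulk_docs request body with self-assigned, sorted _id values."""
    docs = []
    for record in records:
        doc = dict(record)               # copy so the caller's data is untouched
        doc["_id"] = str(doc[id_field])  # the ID is chosen by us, not by the server
        docs.append(doc)
    docs.sort(key=lambda d: d["_id"])    # ascending ID order before upload
    return {"docs": docs}


def post_bulk(db_url, payload):
    """POST the payload to <db_url>/_bulk_docs and return the decoded JSON reply."""
    request = urllib.request.Request(
        db_url.rstrip("/") + "/_bulk_docs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical sample data; "sku" is an invented field used as the document ID.
records = [
    {"sku": "item-0003", "name": "gamma"},
    {"sku": "item-0001", "name": "alpha"},
    {"sku": "item-0002", "name": "beta"},
]
payload = build_bulk_payload(records, "sku")
print([doc["_id"] for doc in payload["docs"]])
```

To actually upload, something like post_bulk("http://localhost:5984/mydb", payload) would do, where the URL is an assumed local database. For a large data set it is probably worth sending the sorted documents in several batches rather than one huge request.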

