couchdb-user mailing list archives

From Paul Davis <paul.joseph.da...@gmail.com>
Subject Re: Re: data loading
Date Wed, 04 Feb 2009 14:09:24 GMT
On Wed, Feb 4, 2009 at 3:51 AM, <rhettg@gmail.com> wrote:
> So I've got it running at about 30 megs a minute now, which I think is
> going to work fine. Should take about an hour per day of data.
>
> The Python process and CouchDB process seem to be using about 100% of a
> single CPU.
>
> In terms of getting as much data in as fast as I can, how should I go
> about parallelizing this process?
> How well does CouchDB (and Erlang, I suppose) make use of multiple CPUs
> on Linux?
>
> Is it better to:
> 1. Run multiple importers against the same DB?
> 2. Run multiple importers against different DBs and merge (replicate)
> them together on the same box?
> 3. Run multiple importers on different DBs on different machines and
> replicate them together?
>

First off, check the version of Erlang you're using. If you happened
to install it with `sudo apt-get install erlang`, chances are you got
5.5.5, which is dog slow due to a VM bug.
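
Since your importer is already Python, a quick way to check (just a
sketch; assumes `erl` is on your PATH) is to ask the VM for its ERTS
version directly:

    import subprocess

    # Prints the ERTS version the running VM reports; 5.5.5 is the slow one.
    subprocess.call(["erl", "-noshell", "-eval",
                     'io:format("~s~n", [erlang:system_info(version)]), halt().'])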

Second, the quickest method to get data into CouchDB is via
_bulk_docs. You're basically going to trade RAM for speed at this
point: the bigger you can make each batch, the better, across the
board. I've done single updates with 1M (smallish) docs before.
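
With couchdb-python that looks something like this (untested sketch;
the server URL, db name, batch size, and generate_docs() are all
placeholders for your setup):

    import couchdb

    server = couchdb.Server("http://localhost:5984/")  # placeholder URL
    db = server["mydb"]                                # placeholder db name

    BATCH = 10000  # bigger batches trade RAM for fewer round trips
    batch = []
    for doc in generate_docs():  # placeholder: wherever your docs come from
        batch.append(doc)
        if len(batch) >= BATCH:
            db.update(batch)  # one _bulk_docs POST per batch
            batch = []
    if batch:
        db.update(batch)

Crank BATCH up until you stop seeing gains or run out of RAM.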

Third, if you have a good method for generating sorted document IDs,
inserting docs in sorted ID order *should* give you better write
performance. Chris Anderson had some luck with this from directly
within the Erlang VM. There's no reason it shouldn't apply to the HTTP
API as well, but I haven't personally tested it to make sure.
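
One sketch of such a scheme (the timestamp-plus-counter format here is
arbitrary; anything that sorts lexicographically in insert order works):

    import itertools
    import time

    def sorted_ids():
        # IDs like "1233756564-0000000001": lexicographic order matches
        # insert order, so writes append to the end of the id btree
        # instead of scattering across it the way random UUIDs do.
        prefix = str(int(time.time()))
        for n in itertools.count(1):
            yield "%s-%010d" % (prefix, n)

Assign each doc's _id from this generator before it goes into a batch.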

HTH,
Paul Davis

> I'm going to experiment with some of these setups (if they're even
> possible; I'm a total newb here), but any insight from the experienced
> would be great.
>
> Thanks,
>
> Rhett
>
> On Feb 4, 2009 12:13am, Rhett Garber <rhettg@gmail.com> wrote:
>>
>> Oh awesome. That's much better. Getting about 15 megs a minute now.
>>
>> Rhett
>>
>> On Wed, Feb 4, 2009 at 12:07 AM, Ulises <ulises.cervino@gmail.com> wrote:
>> >> Loading in the couchdb, I've only got 30 megs in the last hour. That
>> >> 30 megs has turned into 389 megs in the couchdb data file. That
>> >> doesn't seem like enough disk IO to cause this sort of delay...
>> >> where is the time going? Network?
>> >
>> > Are you uploading one document at a time or using bulk updates? You do
>> > this using update([doc1, doc2,...]) in couchdb-python.
>> >
>> > HTH,
>> >
>> > U
>
