couchdb-user mailing list archives

From Jan Lehnardt <...@apache.org>
Subject Re: data loading
Date Wed, 04 Feb 2009 11:25:18 GMT

On 4 Feb 2009, at 09:51, rhettg@gmail.com wrote:

> So I've got it running at about 30 megs a minute now, which I
> think is going to work fine.
> Should take about an hour per day of data.
>
> The python process and couchdb process seem to be using about 100%  
> of a single CPU.

That could be the JSON conversion.

> In terms of getting as much data in as fast as I can, how should I
> go about parallelizing this process?
> How well does CouchDB (and Erlang, I suppose) make use of multiple
> CPUs in Linux?
>
> Is it better to:
> 1. Run multiple importers against the same db
> 2. Run multiple importers against different db's and merge  
> (replicate) together on the same box
> 3. Run multiple importers on different db's on different machines  
> and replicate them together?

It all depends on your data and hardware. All writes to a single db get
serialized. If you have a single writer that can fill all the bandwidth
for your single disk, that's all you need. But usually it can't, and
adding more writers can help.
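
A single writer doing bulk inserts might look roughly like this (just a
sketch, assuming couchdb-python against a local server; the db name and
the make_docs() generator are placeholders for your own code):

    import couchdb

    def import_serial(make_docs, batch_size=1000):
        # connect to a local CouchDB and the target database (placeholder name)
        server = couchdb.Server('http://localhost:5984/')
        db = server['import_target']

        batch = []
        for doc in make_docs():
            batch.append(doc)
            if len(batch) >= batch_size:
                # one _bulk_docs request instead of one PUT per document
                db.update(batch)
                batch = []
        if batch:
            db.update(batch)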

Splitting writes over multiple databases only helps if you can generate
more writes than a single disk can handle and you have multiple disks.
Replication uses bulk inserts, so the final migration step is a
bottleneck again. If you need to sustain a higher write rate, you need
to keep your data in multiple databases and merge on read.
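
Merging on read could be as simple as walking each database and
combining the rows yourself. A rough sketch, again with couchdb-python
and placeholder shard names:

    import couchdb

    def read_all(shard_names, server_url='http://localhost:5984/'):
        # walk every shard's _all_docs and yield (docid, doc) pairs;
        # the caller decides how to merge or sort them
        server = couchdb.Server(server_url)
        for name in shard_names:
            db = server[name]
            for row in db.view('_all_docs', include_docs=True):
                yield row.id, row.doc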

For simple data import, try 2-N writers into the same DB. Everything  
else is way too complicated :)
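
If you want to try that, something along these lines should do it (only
a sketch: it assumes couchdb-python, the multiprocessing module, and a
placeholder load_chunk() that reads one JSON document per line; swap in
your own parser):

    import couchdb
    import json
    from multiprocessing import Process

    def load_chunk(path):
        # placeholder parser: one JSON document per line of the input file
        with open(path) as f:
            for line in f:
                yield json.loads(line)

    def writer(chunk_path, batch_size=1000):
        # each worker opens its own connection and bulk-inserts its chunk
        db = couchdb.Server('http://localhost:5984/')['import_target']
        batch = []
        for doc in load_chunk(chunk_path):
            batch.append(doc)
            if len(batch) >= batch_size:
                db.update(batch)
                batch = []
        if batch:
            db.update(batch)

    def run_parallel(chunk_paths):
        # one process per input chunk, all writing to the same database
        workers = [Process(target=writer, args=(p,)) for p in chunk_paths]
        for w in workers:
            w.start()
        for w in workers:
            w.join()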

Cheers
Jan
--


>
>
> I'm going to experiment with some of these setups (if they're even
> possible, I'm a total newb here) but any
> insight from the experienced would be great.
>
> Thanks,
>
> Rhett
>
> On Feb 4, 2009 12:13am, Rhett Garber <rhettg@gmail.com> wrote:
>> Oh awesome. That's much better. Getting about 15 megs a minute now.
>>
>> Rhett
>>
>> On Wed, Feb 4, 2009 at 12:07 AM, Ulises <ulises.cervino@gmail.com>
>> wrote:
>>
>> >> Loading in the couchdb, I've only got 30 megs in the last hour. That
>> >> 30 megs has turned into 389 megs in the couchdb data file. That
>> >> doesn't seem like enough disk IO to cause this sort of delay.....
>> >> where is the time going? Network?
>> >
>> > Are you uploading one document at a time or using bulk updates? You do
>> > this using update([doc1, doc2,...]) in couchdb-python.
>> >
>> > HTH,
>> >
>> > U

