incubator-couchdb-user mailing list archives

From: rhe...@gmail.com
Subject: Re: Re: data loading
Date: Wed, 04 Feb 2009 08:51:36 GMT
So I've got it running at about 30 megs a minute now, which I think is
going to work fine. It should take about an hour per day of data.

The Python process and the CouchDB process seem to be using about 100% of a
single CPU.

In terms of getting as much data in as fast as I can, how should I go about
parallelizing this process?
How well does CouchDB (and Erlang, I suppose) make use of multiple CPUs on
Linux?

Is it better to:
1. Run multiple importers against the same db
2. Run multiple importers against different db's and merge (replicate) them
together on the same box
3. Run multiple importers against different db's on different machines and
replicate them together?

I'm going to experiment with some of these setups (if they're even possible;
I'm a total newb here), but any insight from the experienced would be great.
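
To make option 1 concrete, here's a rough sketch of what I have in mind,
using couchdb-python and Python's multiprocessing. The database name, the
input files, the batch size, and the parse_docs() helper are just
placeholders, not a working importer:

from multiprocessing import Pool

import couchdb

COUCH_URL = 'http://localhost:5984/'
BATCH_SIZE = 1000  # worth tuning experimentally


def parse_docs(filename):
    # Placeholder parser: yield one dict per document found in the input file.
    with open(filename) as f:
        for line in f:
            yield {'raw': line.rstrip('\n')}


def import_file(filename):
    # Each worker process opens its own connection to the same database.
    db = couchdb.Server(COUCH_URL)['mydb']
    batch = []
    for doc in parse_docs(filename):
        batch.append(doc)
        if len(batch) >= BATCH_SIZE:
            db.update(batch)  # one _bulk_docs POST instead of one PUT per doc
            batch = []
    if batch:
        db.update(batch)


if __name__ == '__main__':
    files = ['day1.log', 'day2.log', 'day3.log', 'day4.log']  # placeholders
    Pool(processes=4).map(import_file, files)

Options 2 and 3 would presumably look the same, except each worker writes to
its own database and a replication step (a POST to CouchDB's _replicate
endpoint) merges them at the end.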

Thanks,

Rhett

On Feb 4, 2009 12:13am, Rhett Garber <rhettg@gmail.com> wrote:
> Oh awesome. That's much better. Getting about 15 megs a minute now.
>
> Rhett
>
> On Wed, Feb 4, 2009 at 12:07 AM, Ulises <ulises.cervino@gmail.com> wrote:
> >> Loading in the couchdb, I've only got 30 megs in the last hour. That
> >> 30 megs has turned into 389 megs in the couchdb data file. That
> >> doesn't seem like enough disk IO to cause this sort of delay...
> >> where is the time going? Network?
> >
> > Are you uploading one document at a time or using bulk updates? You do
> > this using update([doc1, doc2, ...]) in couchdb-python.
> >
> > HTH,
> >
> > U
