From: rhettg@gmail.com
To: user@couchdb.apache.org
Date: Wed, 04 Feb 2009 08:51:36 +0000
Subject: Re: Re: data loading
Message-ID: <00163646d64e341714046213e58e@google.com>
In-Reply-To: <41b0fe890902040013t3ed7844dn5e7485496626325c@mail.gmail.com>

So I've got it running at about 30 megs a minute now, which I think is
going to work fine. Should take about an hour per day of data.

The Python process and the CouchDB process seem to be using about 100%
of a single CPU. In terms of getting as much data in as fast as I can,
how should I go about parallelizing this process? How well does CouchDB
(and Erlang, I suppose) make use of multiple CPUs on Linux?

Is it better to:

1. Run multiple importers against the same db
2. Run multiple importers against different dbs and merge (replicate)
   them together on the same box
3. Run multiple importers against different dbs on different machines
   and replicate them together?

I'm going to experiment with some of these setups (if they're even
possible, I'm a total newbie here), but any insight from the experienced
would be great.

Thanks,

Rhett

On Feb 4, 2009 12:13am, Rhett Garber wrote:
> Oh awesome. That's much better. Getting about 15 megs a minute now.
>
> Rhett
>
> On Wed, Feb 4, 2009 at 12:07 AM, Ulises <ulises.cervino@gmail.com> wrote:
> >> Loading in the couchdb, I've only got 30 megs in the last hour. That
> >> 30 megs has turned into 389 megs in the couchdb data file. That
> >> doesn't seem like enough disk IO to cause this sort of delay...
> >> where is the time going? Network?
> >
> > Are you uploading one document at a time or using bulk updates? You do
> > this using update([doc1, doc2,...]) in couchdb-python.
> >
> > HTH,
> >
> > U
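
For anyone following along, here is a rough sketch of the bulk-update pattern Ulises describes, plus one way to try option 1 (several importers against the same db) from a single script. It is only an illustration under assumptions: the helper names (`batched`, `load`, `load_parallel`), the batch size of 1000, and the worker count of 4 are all made up for the example, and `bulk_update` stands in for whatever callable actually sends a list of documents (with couchdb-python that would presumably be the `update` method of a database object). Whether the threaded version is any faster is exactly the thing that would need measuring.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def batched(docs, size):
    """Yield lists of up to `size` documents from any iterable."""
    it = iter(docs)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def load(docs, bulk_update, batch_size=1000):
    """Send documents one batch at a time; return how many were sent.

    `bulk_update` is a hypothetical stand-in for a bulk call such as
    update([doc1, doc2, ...]); it receives one list per call.
    """
    sent = 0
    for batch in batched(docs, batch_size):
        bulk_update(batch)
        sent += len(batch)
    return sent


def load_parallel(docs, bulk_update, batch_size=1000, workers=4):
    """Same as load(), but batches are sent from several threads.

    Note: pool.map submits every batch up front, so all batches are
    held in memory at once; fine for a sketch, not for huge inputs.
    """
    def send(batch):
        bulk_update(batch)
        return len(batch)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(send, batched(docs, batch_size)))
```

With couchdb-python, usage might look something like `load(docs, db.update)`, where `db` is a database object obtained from the server; that part is not shown since it needs a running CouchDB.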