couchdb-user mailing list archives

From Josh Bryan <jbr...@cashnetusa.com>
Subject Re: Write Performance
Date Thu, 08 Jan 2009 01:47:25 GMT
Thanks for all the replies. I'll upgrade couch and Erlang to the latest
versions and retest.  Yes, this is a one-time import, but 70 million records
at 50-60 writes a second doesn't mean a day; it means 2 weeks or
more.  I don't mind throwing extra hardware at the problem, but I
want to make sure I'm throwing it in the right place and
using the existing hardware as well as I can.  If writes to all DBs are
serialized in a single thread, then if I partition the data into two DBs
and fire up two copies of couch, shouldn't I be able to make use of another
processor on the same machine?  I'll test this tomorrow along with the
newer versions.
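The "2 weeks or more" figure follows from simple arithmetic. A minimal sketch, assuming a constant write rate for the whole import and ignoring batching, retries, or any slowdown as the database grows:

```python
# Rough import-time estimate for the figures in this message:
# about 70 million documents at a sustained 50-60 writes per second.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def import_days(total_docs: int, writes_per_second: float) -> float:
    """Days needed to write total_docs at a constant writes_per_second."""
    return total_docs / writes_per_second / SECONDS_PER_DAY


for rate in (50, 60):
    print(f"{rate} writes/s -> {import_days(70_000_000, rate):.1f} days")
```

Under the same constant-rate assumption, finishing in a single day would take roughly 810 sustained writes per second (70,000,000 / 86,400 ≈ 810).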

Thanks,
Josh
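The partitioning idea in the message above can be sketched as a deterministic split of documents across databases by hashing each document ID. This is only a sketch: the database names `archive_0` and `archive_1` and the `_id`-keyed dict format are hypothetical, and whether two CouchDB instances actually spread the load across more cores is exactly what Josh planned to test, not something this code establishes.

```python
import hashlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Map a document ID to a shard index in [0, num_shards) deterministically.

    hashlib is used instead of the built-in hash(), whose value for str
    changes between Python processes unless PYTHONHASHSEED is fixed.
    """
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(docs, db_names):
    """Group documents (dicts with an "_id" key) by target database name."""
    buckets = {name: [] for name in db_names}
    for doc in docs:
        name = db_names[shard_for(doc["_id"], len(db_names))]
        buckets[name].append(doc)
    return buckets


# Hypothetical database names; each bucket could then be loaded by its own
# client process against its own CouchDB instance or database.
buckets = partition([{"_id": f"record-{i}"} for i in range(10)],
                    ["archive_0", "archive_1"])
print({name: len(batch) for name, batch in buckets.items()})
```

Hashing the ID, rather than simply alternating documents, means a reader can later recompute which database holds a given record from its ID alone.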

Paul Davis wrote:
> Erlang 5.5.5 is borked. 5.6.x should be ok.
>
> Also, yes, writes to the database are serialized in a single thread.
> For reference, when storing data, are you using the _bulk_docs
> interface?
>
> Also, in trunk the fsync calls are now turned off by default, so you
> should see more of a speedup there.
>
> Also, if these are archived records, wouldn't this be a one-time
> cost? Faster is always better, but if it takes a day, is that a big
> deal?
>
> HTH
> Paul
>
> On Wed, Jan 7, 2009 at 2:55 PM, Josh Bryan <jbryan@cashnetusa.com> wrote:
>   
>> Chris Anderson wrote:
>>     
>>> On Wed, Jan 7, 2009 at 4:37 PM, Josh Bryan <jbryan@cashnetusa.com> wrote:
>>>
>>>       
>>>> Hi,
>>>>
>>>> I am looking into CouchDB as a solution for storing a bunch (approx.
>>>> 70 million) of archived documents.  While planning the import process, I
>>>> did some benchmarking to figure out how long the import will take.  I
>>>> get about 50-70 inserts per second on average.  However, when I looked
>>>> for the bottleneck, I couldn't find it.  I am connected to the
>>>> database via a fast LAN and can verify that the network is not
>>>> saturated.  I can also verify that disk I/O is not saturated.  The only
>>>> clue is that of the 4 CPUs on the server, only one seems to be
>>>> fully loaded.  Also, of the 5 Erlang processes I can see
>>>> running, only one seems to be getting most of the CPU time.  I
>>>> know that Erlang is built with SMP enabled, so if it is CPU-bound, why
>>>> can't it make use of the other 3 processors?
>>>>
>>>> I thought that perhaps there was some internal write lock issue per
>>>> database that allowed only one thread to write to a db at a time, so I
>>>> tried running the benchmarks while hitting multiple databases, but still
>>>> got the same write rate across the databases.  Is there some globally
>>>> shared resource in couchdb that limits all writes to a single thread?
>>>>
>>>> Thanks,
>>>> Josh
>>>>
>>>>
>>>>         
>>> Before we can help you diagnose the performance you're seeing, could
>>> you tell us the version of CouchDB and the version of Erlang that you
>>> are using? It wouldn't hurt to describe the hardware in more detail
>>> either.
>>>
>>>
>>>       
>> I am seeing similar results on two systems.
>>
>> System 1:
>> Quad core Intel(R) Xeon(R) CPU 5160  @ 3.00GHz
>> 2 GB ram
>> Linux 2.6.18-4  -- Debian Lenny
>> Erlang (BEAM) emulator version 5.6.3 [source] [64-bit] [smp:4]
>> [async-threads:0] [kernel-poll:false]
>> couchdb - Apache CouchDB 0.8.0-incubating
>>
>> System 2:
>> Intel(R) Pentium(R) D CPU 3.00GHz
>> 3 GB ram
>> Erlang (BEAM) emulator version 5.5.5 [source] [async-threads:0]
>> [kernel-poll:false]
>> couchdb - Apache CouchDB 0.9.0a724455-incubating
>>
>> Thanks
>>
>>
>>
>>     
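Paul's question about the `_bulk_docs` interface points at batching: instead of one HTTP request per document, many documents are posted to `/{db}/_bulk_docs` in a single request whose JSON body is `{"docs": [...]}`. A minimal sketch using only the Python standard library, assuming a server at `http://localhost:5984`, a hypothetical database named `archive`, and a batch size of 1000 chosen for illustration rather than measured. The shape of the response body has varied between CouchDB versions, so it is read but not parsed here.

```python
import json
import urllib.request

COUCH_URL = "http://localhost:5984"  # assumed server address
DB_NAME = "archive"                  # hypothetical database name
BATCH_SIZE = 1000                    # assumed; tune by measurement


def batches(docs, size):
    """Yield successive lists of at most `size` documents."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def bulk_request(base_url, db, batch):
    """Build a POST to /{db}/_bulk_docs carrying one batch of documents."""
    body = json.dumps({"docs": batch}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{db}/_bulk_docs",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def load(docs):
    """Send every batch; requires a reachable CouchDB server."""
    for batch in batches(docs, BATCH_SIZE):
        with urllib.request.urlopen(bulk_request(COUCH_URL, DB_NAME, batch)) as resp:
            resp.read()  # per-document results; inspect for errors in real use
```

Batching amortizes the per-request HTTP and commit overhead over many documents; the best batch size depends on document size and server, so it is worth benchmarking a few values.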
