couchdb-user mailing list archives

From Chris Anderson <jch...@apache.org>
Subject Re: Insert performance
Date Mon, 04 May 2009 18:23:37 GMT
On Mon, May 4, 2009 at 10:20 AM, Tom Nichols <tmnichols@gmail.com> wrote:
> well, if I set "batch" to true, all of my load scripts die after a
> short amount of time with this error:
>
> /var/lib/gems/1.8/gems/couchrest-0.24/lib/couchrest/monkeypatches.rb:41:in
> `rbuf_fill': uninitialized constant Timeout::TimeoutError (NameError)
>        from /usr/lib/ruby/1.8/net/protocol.rb:116:in `readuntil'
>        from /usr/lib/ruby/1.8/net/protocol.rb:126:in `readline'
>        from /usr/lib/ruby/1.8/net/http.rb:2020:in `read_status_line'
>
> Regardless, it still seems like there is a bottleneck on the server
> end.  Did I mention I'm running the 'load' scripts locally?  So it's
> not network latency that is causing the slowness.  Any other ideas?
>

You're probably best off using explicit bulk_docs saves with an array
of documents. That way you know exactly how much you are passing to
CouchDB at a time. With smallish docs (less than a few kB) you can
usually send around 1000 at a time to get the best insert performance.
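
Something along these lines (a rough, untested sketch, assuming your
CouchRest version has Database#bulk_save for POSTing an array of docs
to _bulk_docs; the 1000-doc buffer size is just a starting point to tune):

#!/usr/bin/env ruby
# Sketch: explicit batching via CouchRest's bulk_save.

require 'rubygems'
require 'couchrest'

db = CouchRest.database! "http://127.0.0.1:5984/load_test"

buffer = []
(0...1_000_000).each do |i|
  buffer << { :key => i, 'val one' => "val #{i}" }
  if buffer.size >= 1000            # ~1000 smallish docs per _bulk_docs request
    db.bulk_save(buffer)
    buffer = []
  end
end
db.bulk_save(buffer) unless buffer.empty?   # flush the remainder

That way each HTTP round trip carries a known payload, rather than one
request per document.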

> Thanks.
> -Tom
>
>
> On Mon, May 4, 2009 at 12:19 PM, Zachary Zolton
> <zachary.zolton@gmail.com> wrote:
>> Yeah, the optional second argument (for using bulk save semantics)
>> defaults to false.
>>
>> Also, there's an option where you can set how many documents to batch
>> save at a time. I don't remember the default, but I've had good luck
>> saving with anywhere between 500 and 2000 docs.
>>
>> On Mon, May 4, 2009 at 11:13 AM, Tom Nichols <tmnichols@gmail.com> wrote:
>>> Thanks.  I'm using save_doc, so I just need to pass 'true' as the second argument?
>>>
>>> I posted the question here because I assumed the performance
>>> bottleneck was on the CouchDB end, not my ruby script.  Am I wrong? I
>>> assumed if I was running 20 "slow" ruby scripts they would peg the
>>> CPU.  The fact that I'm not seeing that makes me think there is some
>>> blocking/synchronization that is making the CouchDB server slow....?
>>>
>>> Thanks again.
>>> -Tom
>>>
>>> On Mon, May 4, 2009 at 11:58 AM, Zachary Zolton
>>> <zachary.zolton@gmail.com> wrote:
>>>> Short answer: use db.save_doc(hash, true) for bulk_docs behavior.
>>>>
>>>> Also, consider moving this thread to the CouchRest Google Group:
>>>> http://groups.google.com/group/couchrest/topics
>>>>
>>>> Cheers,
>>>> zdzolton
>>>>
>>>> On Mon, May 4, 2009 at 10:40 AM, Tom Nichols <tmnichols@gmail.com> wrote:
>>>>> Hi, I have some questions about insert performance.
>>>>>
>>>>> I have a single CouchDB 0.9.0 node running on small EC2 instance.  I
>>>>> attached a huge EBS volume to it and mounted it where CouchDB's data
>>>>> files are stored.  I fired up about 20 ruby scripts running inserts, and
>>>>> after a weekend I only have about 30GB / 12M rows of data...  Which
>>>>> seems small.  'top' tells me that my CPU is only about 30% utilized.
>>>>>
>>>>> Any idea what I might be doing wrong?  I pretty much just followed
>>>>> these instructions:
>>>>> http://wiki.apache.org/couchdb/Getting_started_with_Amazon_EC2
>>>>>
>>>>> My ruby script looks like this:
>>>>> #!/usr/bin/env ruby
>>>>> # Script to load random data into CouchDB
>>>>>
>>>>> require 'rubygems'
>>>>> require 'couchrest'
>>>>>
>>>>> db = CouchRest.database! "http://127.0.0.1:5984/#{ARGV[0]}"
>>>>> puts "Created database: #{ARGV[0]}"
>>>>>
>>>>> max = 9999999999999999
>>>>> while true
>>>>>   puts 'loading...'
>>>>>   for val in 0..max
>>>>>     db.save_doc({ :key => val, 'val one' => "val #{val}",
>>>>>                   'val2' => "#{ARGV[1]} #{val}" })
>>>>>   end
>>>>> end
>>>>>
>>>>>
>>>>> Thanks in advance...
>>>>>
>>>>
>>>
>>
>
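
P.S. If you'd rather keep individual save_doc calls, the buffered mode
Zachary described would look roughly like this (untested sketch;
bulk_save_cache_limit is the accessor I believe CouchRest exposes for
the flush threshold, so double-check it against your version):

require 'rubygems'
require 'couchrest'

db = CouchRest.database! "http://127.0.0.1:5984/load_test"
db.bulk_save_cache_limit = 2000   # assumed accessor: flush via _bulk_docs every 2000 docs

100_000.times do |i|
  # 'true' buffers the doc instead of issuing one HTTP request per save
  db.save_doc({ :key => i, 'val one' => "val #{i}" }, true)
end

db.bulk_save   # assumed: a no-arg call flushes any docs still in the buffer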



-- 
Chris Anderson
http://jchrisa.net
http://couch.io
