incubator-couchdb-user mailing list archives

From Josh Bryan <jbr...@cashnetusa.com>
Subject Re: Write Performance
Date Sat, 10 Jan 2009 00:40:07 GMT
I am not able to provide any real data, but attached is a script that
imports random documents of X's with approximately the distribution of
sizes in my real data.  It more or less mirrors the process that will be
used to import the real data, and it demonstrates the bottleneck.  I did
parallelize the writes to a single database.

-- Josh

#!/usr/bin/ruby
# Arguments (all optional): num_procs num_db num_recs num_bulk
require 'rubygems'
require 'couchrest'
require 'digest/md5'
require 'time'
require 'base64'

$NUM_PROCS   = (ARGV.shift || 10).to_i      # worker processes to fork
$NUM_DB      = (ARGV.shift || 2).to_i       # currently unused; $URLS below is hardcoded
$NUM_RECS    = (ARGV.shift || 100000).to_i  # total documents to write
$NUM_BULK    = (ARGV.shift || 50).to_i      # documents per bulk_save call
$HOST        = "111.111.111.111"
$DB_BASENAME = "test"

# Four target databases: two per CouchDB instance (ports 5985 and 5986).
$URLS = [
    "http://#{$HOST}:5985/#{$DB_BASENAME}_0",
    "http://#{$HOST}:5985/#{$DB_BASENAME}_1",
    "http://#{$HOST}:5986/#{$DB_BASENAME}_0",
    "http://#{$HOST}:5986/#{$DB_BASENAME}_1",
]

# Integer division: the total written may fall slightly short of $NUM_RECS.
records_per_process = $NUM_RECS / $NUM_PROCS

# Create several attachment payloads of various sizes. CouchDB expects
# inline attachment data to be base64-encoded.
$DATA = [
    "X" * 1000,
    "X" * 2000,
    "X" * 4500,
    "X" * 5000,
    "X" * 7000,
    "X" * 10000
].map { |d| "||#{d}||" }.map { |data| Base64.encode64(data).gsub(/\s/, '') }

$NUM_PROCS.times do |p_num|
  fork do
    # Spread the workers round-robin across the target databases.
    db_num = p_num % $URLS.size
    db = CouchRest.database!($URLS[db_num])

    docs = []
    records_per_process.times do
      doc = {
        '_attachments' => {
          "text.txt" => {
            'data' => $DATA[rand($DATA.size)],
          },
        },
        #:sum => Digest::MD5.hexdigest(data)
      }

      docs << doc
      if docs.size >= $NUM_BULK
        db.bulk_save docs
        docs = []
      end
    end
    # Flush any leftover documents that didn't fill a complete batch.
    db.bulk_save docs unless docs.empty?
  end
end

Process.waitall
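
For reference, throughput figures like the one quoted below could be
obtained by wrapping the fork/wait section above in a simple timer. A
minimal sketch, not part of the original script:

    start = Time.now
    # ... fork the worker processes as above ...
    Process.waitall
    elapsed = Time.now - start
    puts "%d docs in %.2fs (%.1f writes/sec)" % [$NUM_RECS, elapsed, $NUM_RECS / elapsed]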


Damien Katz wrote:
> Did you parallelize writes to a single database? Attachments are
> written in parallel, which should help you in this instance.
>
> -Damien
>
>
> On Jan 9, 2009, at 7:20 PM, Josh Bryan wrote:
>
>> Yes.
>>
>> Damien Katz wrote:
>>> Are you using bulk updates?
>>>
>>> -Damien
>>>
>>> On Jan 9, 2009, at 7:12 PM, Josh Bryan wrote:
>>>
>>>>
>>>> On a dual-core 3.0 GHz Pentium with Erlang 5.6 and CouchDB 0.8.0,
>>>> *using bulk* writes, I get a throughput of 95 writes/second.
>>
>
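
For context on the bulk-update question above: with CouchRest, the
difference between per-document and bulk writes is roughly the following.
This is an illustrative sketch (the URL and documents are made up; older
CouchRest releases used db.save rather than db.save_doc):

    db = CouchRest.database!("http://127.0.0.1:5984/example")

    # One HTTP request per document:
    1000.times { |i| db.save_doc('i' => i) }

    # One HTTP request for the whole batch, via CouchDB's _bulk_docs API:
    db.bulk_save((0...1000).map { |i| { 'i' => i } })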

