incubator-couchdb-user mailing list archives

From "Eli Stevens (Gmail)" <>
Subject Upload speed for large attachments
Date Wed, 08 Jun 2011 08:36:53 GMT
Running the following code on a MacBook Pro, using CouchDBX 1.0.2
(everything local), we're seeing the following output when trying to
attach a file with 10MB of random data:
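
The step that creates the test file isn't shown below; a minimal sketch
of one way to produce it, assuming 10MB of random bytes written to the
path the script reads:

import os

# Hypothetical setup step: write 10MB of random bytes to the test file.
with open('/tmp/smallfile', 'wb') as f:
    f.write(os.urandom(10 * 1024 * 1024))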

Code: (code included in full below)

Using curl: 0.168450117111
Using put_attachment: 0.309157133102
post time: 2.5557808876
Using multipart: 2.61283898354
Encoding base64: 0.0497629642487
Updating: 5.0550069809

Server log: (there's a HEAD/DELETE/PUT/GET cycle that's just cleanup)

The calls in question are:

Using curl: 0.168450117111
1> [info] [<0.27828.7>] - - 'PUT'

Using put_attachment: 0.309157133102
1> [info] [<0.27809.7>] - - 'PUT'

Using multipart: 2.61283898354 (post time: 2.5557808876)
1> [info] [<0.27809.7>] - - 'POST' /benchmark_entity/bigfile 201

Updating: 5.0550069809
1> [info] [<0.27809.7>] - - 'POST' /benchmark_entity/_bulk_docs 201

Profiling our code shows 1.5 sec of CPU usage in our own code (which
covers setup/cleanup not included in the times above) against 11.8 sec
of total run time, and the difference roughly matches the PUT/POST
times above.  So I'm fairly confident that the bulk of those times is
spent not in our client code but in CouchDB's handling of the requests.
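
For reference, a minimal sketch of one way to get that CPU-vs-wall-time
split, assuming the script below is saved as (a hypothetical
name):

import cProfile
import pstats
import time

t0 = time.time()'execfile("")', '/tmp/bench.prof')
print 'total run time: {}'.format(time.time() - t0)

# CPU time spent in our own code shows up in the stats; the gap to
# wall-clock time is time spent waiting on CouchDB.
pstats.Stats('/tmp/bench.prof').sort_stats('cumulative').print_stats(10)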

Why is the form/multipart handler so much slower than a bare PUT of
the attachment?  Why is the base64 approach slower still?  Is it due
to bandwidth issues, CouchDB CPU usage, something else?
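
For scale: base64 alone inflates the body by about a third before
CouchDB ever parses it.  A quick sanity check, assuming the same 10MB
of random data:

import base64
import os

raw = os.urandom(10 * 1024 * 1024)  # 10MB of random data, as in the test
encoded = base64.b64encode(raw)
# base64 emits 4 output bytes per 3 input bytes, i.e. ~33% overhead,
# and the server still has to JSON-parse and decode the result.
print 'raw: {} bytes, encoded: {} bytes ({:.0%} larger)'.format(
    len(raw), len(encoded), float(len(encoded)) / len(raw) - 1)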

Thanks for any help,

Full code:

import base64
import contextlib
import cStringIO
import subprocess
import time

import couchdb
import couchdb.json
import couchdb.multipart

def stopwatch(m=''):
    # Times the body of the with-block and prints the elapsed seconds.
    t0 = time.time()
    yield
    tdiff = time.time() - t0
    if m:
        print '{}: {}'.format(m, tdiff)
    else:
        print tdiff

def reset(d):
    # Delete any doc left over from a previous run, then recreate it.
        del d['bigfile']
    except couchdb.http.ResourceNotFound:
    d['bigfile'] = {'foo': 'bar'}
    return d['bigfile']

s = couchdb.Server()
d = s['benchmark_entity']

fn = '/tmp/bigfile.gz'
fn = '/tmp/smallfile'  # the later assignment wins; this is the file actually used

doc = reset(d)
with stopwatch('Using curl'):
    # Command tail truncated in the archive; URL reconstructed from the
    # attachment API: PUT /db/docid/attname?rev=...
    p = subprocess.Popen([
        'curl', '-X', 'PUT',
        '-d', '@{}'.format(fn),
        '-H', 'Content-Type: application/gzip',
        '{}/{}/bigfile.gz?rev={}'.format(d.resource.url, doc['_id'], doc['_rev'])])
    p.wait()

doc = reset(d)
with open(fn, 'r') as f:
    with stopwatch('Using put_attachment'):
        d.put_attachment(doc, f)

doc = reset(d)
with open(fn, 'r') as f:
    content_name = 'bigfile.gz'
    content = f.read()
    content_type = 'application/gzip'
    with stopwatch('Using multipart'):
        fileobj = cStringIO.StringIO()

        with couchdb.multipart.MultipartWriter(fileobj, headers=None,
                subtype='form-data') as mpw:
            mime_headers = {'Content-Disposition': '''form-data; name="_doc"'''}
            mpw.add('application/json', couchdb.json.encode(doc), mime_headers)

            mime_headers = {'Content-Disposition': '''form-data; name="_attachments"; filename="{}"'''.format(content_name)}
            mpw.add(content_type, content, mime_headers)

        header_str, blank_str, body = fileobj.getvalue().split('\r\n', 2)

        http_headers = {'Referer': d.resource.url,
                'Content-Type': header_str[len('Content-Type: '):]}
        params = {}
        t0 = time.time()
        # The method name was lost to line wrapping; the server log shows
        # a POST to the doc URL, so a doc-level POST is assumed here.
        status, msg, data =['_id'], body, http_headers, **params)
        print 'post time: {}'.format(time.time() - t0)

doc = reset(d)
with open(fn, 'r') as f:
    content_name = 'bigfile.gz'
    content = f.read()
    content_type = 'application/gzip'
    with stopwatch('Encoding base64'):
        doc['_attachments'] = {content_name: {'content_type': content_type,
                'data': base64.b64encode(content)}}
    with stopwatch('Updating'):
        # The body here was lost in the archive; the log shows a POST to
        # _bulk_docs, which is what Database.update() issues, so it's assumed.
        d.update([doc])