couchdb-user mailing list archives

From "Eli Stevens (Gmail)" <wickedg...@gmail.com>
Subject Upload speed for large attachments
Date Wed, 08 Jun 2011 08:36:53 GMT
Running the following code on a MacBook Pro, using CouchDBX 1.0.2
(everything local), we see the following output when trying to
attach a file containing 10MB of random data:

Code: https://gist.github.com/bc0c36f36be0c85e2a36 (code included in full below)
Output:

Using curl: 0.168450117111
Using put_attachment: 0.309157133102
post time: 2.5557808876
Using multipart: 2.61283898354
Encoding base64: 0.0497629642487
Updating: 5.0550069809

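(For reference, the 10MB of random data was produced with something
along these lines; the path here is made up, not part of the benchmark
script below:)

```python
import os
import tempfile

# Sketch: write 10MB of random bytes to a scratch file, comparable to
# the /tmp/bigfile.gz used in the benchmark below.
size = 10 * 1024 * 1024
path = os.path.join(tempfile.gettempdir(), 'benchmark_random.bin')
with open(path, 'wb') as f:
    f.write(os.urandom(size))

print(os.path.getsize(path))  # 10485760
```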
Server log: https://gist.github.com/a80a495fd35049ff871f (there's a
HEAD/DELETE/PUT/GET cycle that's just cleanup)

The calls in question are:

Using curl: 0.168450117111
1> [info] [<0.27828.7>] 127.0.0.1 - - 'PUT'
/benchmark_entity/bigfile/bigfile/bigfile.gz?rev=78-db58ded2899c5546e349feb5a8c0eee4
201

Using put_attachment: 0.309157133102
1> [info] [<0.27809.7>] 127.0.0.1 - - 'PUT'
/benchmark_entity/bigfile/smallfile?rev=81-c538b38a8463952f0136143cfa49e9fa
201

Using multipart: 2.61283898354 (post time: 2.5557808876)
1> [info] [<0.27809.7>] 127.0.0.1 - - 'POST' /benchmark_entity/bigfile 201

Updating: 5.0550069809
1> [info] [<0.27809.7>] 127.0.0.1 - - 'POST' /benchmark_entity/_bulk_docs 201

Profiling shows 1.5 sec of CPU time in our client code (which
includes setup/cleanup not counted in the times above) against 11.8
sec of total run time, which roughly matches up with the PUT/POST
times above.  So I feel pretty confident that the bulk of the times
above are not spent in our client code, but in CouchDB's handling
time.

Why is the form/multipart handler so much slower than a bare PUT of
the attachment?  Why is the base64 approach slower still?  Is it due
to bandwidth, CouchDB CPU usage...?
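For what it's worth, part of the base64 gap is presumably mechanical:
the encoding inflates the body by a third, and the attachment then
rides inside the JSON document, so the server has to parse and decode
the whole thing rather than stream raw bytes. The inflation is easy to
check (standalone sketch, separate from the benchmark):

```python
import base64
import os

raw = os.urandom(10 * 1024 * 1024)  # same size as the benchmark file
encoded = base64.b64encode(raw)

# base64 emits 4 output bytes for every 3 input bytes, so the upload
# body grows by roughly 33% before any JSON escaping or parsing cost.
print(float(len(encoded)) / len(raw))  # ~1.333
```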

Thanks for any help,
Eli

Full code from: https://gist.github.com/bc0c36f36be0c85e2a36

import base64
import contextlib
import cStringIO
import subprocess
import time

import couchdb
import couchdb.json
import couchdb.multipart

@contextlib.contextmanager
def stopwatch(m=''):
    t0 = time.time()
    yield
    tdiff = time.time() - t0
    if m:
        print '{}: {}'.format(m, tdiff)
    else:
        print tdiff

def reset(d):
    try:
        del d['bigfile']
    except couchdb.http.ResourceNotFound:
        pass
    d['bigfile'] = {'foo': 'bar'}
    return d['bigfile']

s = couchdb.Server()
d = s['benchmark_entity']

fn = '/tmp/bigfile.gz'
fn = '/tmp/smallfile'  # note: this shadows the line above; only smallfile is used below

doc = reset(d)
with stopwatch('Using curl'):
    p = subprocess.Popen([
        'curl',
        '-X', 'PUT',
        'http://localhost:5984/benchmark_entity/{}/bigfile/bigfile.gz?rev={}'.format(doc.id, doc.rev),
        '-d', '@{}'.format(fn),  # careful: curl -d strips newlines; --data-binary uploads the file verbatim
        '-H', 'Content-Type: application/gzip'
        ])
    p.wait()

doc = reset(d)
with open(fn, 'rb') as f:  # binary mode, since the payload is binary data
    with stopwatch('Using put_attachment'):
        d.put_attachment(doc, f)

doc = reset(d)
with open(fn, 'rb') as f:
    content_name = 'bigfile.gz'
    content = f.read()
    content_type = 'application/gzip'
    with stopwatch('Using multipart'):
        fileobj = cStringIO.StringIO()

        with couchdb.multipart.MultipartWriter(fileobj, headers=None, subtype='form-data') as mpw:
            mime_headers = {'Content-Disposition': '''form-data; name="_doc"'''}
            mpw.add('application/json', couchdb.json.encode(doc), mime_headers)

            mime_headers = {'Content-Disposition': '''form-data; name="_attachments"; filename="{}"'''.format(content_name)}
            mpw.add(content_type, content, mime_headers)

        header_str, blank_str, body = fileobj.getvalue().split('\r\n', 2)

        http_headers = {'Referer': d.resource.url, 'Content-Type': header_str[len('Content-Type: '):]}
        params = {}
        t0 = time.time()
        status, msg, data = d.resource.post(doc['_id'], body, http_headers, **params)
        print 'post time: {}'.format(time.time() - t0)

doc = reset(d)
with open(fn, 'rb') as f:
    content_name = 'bigfile.gz'
    content = f.read()
    content_type = 'application/gzip'
    with stopwatch('Encoding base64'):
        doc['_attachments'] = {content_name: {'content_type': content_type, 'data': base64.b64encode(content)}}
    with stopwatch('Updating'):
        d.update([doc])
