couchdb-dev mailing list archives

From Jason Gerard DeRose <jder...@novacut.com>
Subject Re: Implementing a Protocol Buffers API for CouchDB
Date Tue, 27 May 2014 06:44:41 GMT
Hi,

On Sat, May 17, 2014 at 8:31 AM, Sina Samavati <sina.samv@gmail.com> wrote:
>  Hello,
>
> I'm thinking about an alternative API for accessing CouchDB when it's not
> actually facing the web (as a data store for your application). Using couchdb in
> backend has following drawbacks:
>
> * HTTP overhead.
> * Sending JSON over HTTP adds more overhead to the system.
> * Authorization per request has its own overhead.

I'm certainly a fan of any/all things that improve CouchDB performance
and/or reduce its memory usage :)

But personally, I think your ideas might overlook some easier
optimization routes that could likewise benefit clients that use the
existing CouchDB JSON REST API (feel free to prove me wrong, of
course).

The HTTP overhead is likely lower than you might think it is.  Plus, a
straightforward way to minimize the HTTP overhead is simply to send
fewer request/response headers along with each request/response.
After reading/writing the request/response preamble, the HTTP
"overhead" is no more than the inherent overhead in the underlying
socket transport.  Sure, it all depends on the ratio of the
request/response preamble size to the request/response body size...
but my unscientific *hunch* is that in "typical" CouchDB usage, HTTP
really isn't the biggest enemy of performance here... that there are
probably more fruitful avenues for performance optimization.
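
To put a rough number on the preamble-to-body ratio mentioned above, here is a minimal Python sketch. The function names and both header sets are hypothetical, chosen only for illustration; they are not measurements from a real CouchDB client.

```python
def preamble_size(start_line, headers):
    """Bytes used by the start line, the headers, and the blank line that ends them."""
    lines = [start_line] + [f"{name}: {value}" for name, value in headers.items()]
    return len(("\r\n".join(lines) + "\r\n\r\n").encode("latin-1"))


def preamble_fraction(start_line, headers, body):
    """Fraction of the whole message (preamble + body) taken up by the preamble."""
    preamble = preamble_size(start_line, headers)
    return preamble / (preamble + len(body))


# Hypothetical header sets, for illustration only.
VERBOSE = {
    "Host": "localhost:5984",
    "User-Agent": "example-client/1.0",
    "Accept": "application/json",
    "Accept-Encoding": "gzip, deflate",
    "Authorization": "Basic dXNlcjpwYXNzd29yZA==",
    "Connection": "keep-alive",
}
MINIMAL = {"Host": "localhost:5984"}
```

Calling `preamble_fraction("GET /db/doc HTTP/1.1", VERBOSE, body)` and then the same with `MINIMAL`, for body sizes that match your own documents, gives a quick feel for how much trimming headers could matter in your case.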

However, I do think that especially when it comes to
localhost-to-localhost communication, eliminating the per-request
authorization overhead (er, technically "authentication" overhead,
crufty HTTP header names be damned) is a very interesting and likely
fruitful optimization route.  Personally, I think doing HTTP over
AF_UNIX and eliminating the per-request "Authorization" header
altogether might be very promising.
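
As a sketch of what the client side of HTTP over AF_UNIX could look like, here is a small Python subclass of the standard `http.client.HTTPConnection`. It assumes a server listening on a Unix domain socket at some path; that path, and CouchDB listening on such a socket at all, are assumptions for illustration, not an existing CouchDB feature.

```python
import http.client
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that speaks HTTP over an AF_UNIX stream socket."""

    def __init__(self, socket_path, timeout=10):
        # The host name is only used for the Host header; nothing is resolved.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        # Replace the usual TCP connect with a Unix domain socket connect.
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock
```

Usage would be the same as with a normal connection, e.g. `conn = UnixHTTPConnection("/path/to/couchdb.sock")` followed by `conn.request("GET", "/")`, with no "Authorization" header sent.  The idea is that access control could then rest on the socket file's permissions instead, though whether that is sufficient depends on the deployment.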

> What I am proposing is to write an additional binary API (I'm thinking about
> something based on Protocol Buffers) for accessing Couchdb on
> local host/network. I think regardless of choice of binary protocol what's
> important is keeping TCP connection alive and sending/receiving as little as
> possible amount of data through the connection.

High-performance JSON implementations tend to compare quite favourably
with Protocol Buffers.  I have no doubt that Protocol Buffers could
offer somewhat higher performance, but (AFAIK) these days the JSON
encoding/decoding is unlikely to be the hot spot when it comes to
request handling, at least not since the native JSON encoding/decoding
was added in CouchDB 1.2.
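
One way to check whether JSON is a hot spot for a given workload is simply to time it.  The Python sketch below measures only a client-side encode/decode round trip with the standard `json` module; it says nothing about CouchDB's own Erlang-side encoder, and the sample document is hypothetical.

```python
import json
import time


def json_roundtrip_seconds(doc, repeats=10000):
    """Average wall-clock seconds for one encode + decode of `doc`."""
    start = time.perf_counter()
    for _ in range(repeats):
        json.loads(json.dumps(doc))
    return (time.perf_counter() - start) / repeats


# Hypothetical document, roughly the shape of a small CouchDB doc.
SAMPLE_DOC = {
    "_id": "example-doc",
    "_rev": "1-abc",
    "type": "note",
    "tags": ["a", "b", "c"],
    "body": "x" * 512,
}
```

Comparing `json_roundtrip_seconds(SAMPLE_DOC)` against the total time of a real request for the same document gives a rough idea of how much of the request is spent in JSON at all.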

There is also a big drawback in that you're effectively proposing a
2nd, incompatible API (at least it seems that way to me).  It's
fantastically useful for CouchDB clients to be oblivious as to whether
they're talking to a local or a remote CouchDB instance, an
abstraction that your proposal seems somewhat at odds with (also, this
isn't a feature that's unique to CouchDB, even though I think it's
probably most elegant in CouchDB... so not something to throw away
lightly, as the competition won't ever be throwing away the same).

My understanding is that Protocol Buffers are very much designed for
statically typed objects coming across the wire, so I have a hard time
seeing how to use them effectively with a schema-less database like
CouchDB (please correct me if I'm wrong).

So my "devil's advocate" advice for (possibly) more fruitful
optimization routes is:

1) Look for ways to send both fewer request and response headers,
possibly with special optimizations for when a local client is
connecting to a local CouchDB

2) Look for hot-spots in the current request handling and storage
layer, aside from JSON encoding/decoding and the HTTP overhead (well,
how HTTP requests are parsed/handled might still be worth a look, I just
wouldn't jump directly to blaming the HTTP protocol itself)

3) Possibly consider AF_UNIX as an optimization route when a local
client is talking to a local CouchDB instance

But most of all... have fun with CouchDB! Would love to hear what you
come up with :)

> I would be glad to hear your feedback and suggestions.
>
> Regards,
> Sina Samavati
>
> --
> Sina Samavati
> Software engineer
>
> https://github.com/s1n4
> https://twitter.com/sinasamavati
