couchdb-dev mailing list archives

From "Alexander Shorin (JIRA)" <>
Subject [jira] [Commented] (COUCHDB-1743) Make the view server & protocol faster
Date Mon, 29 Apr 2013 14:54:16 GMT


Alexander Shorin commented on COUCHDB-1743:

As an author of a Python query server[1], I currently see the following problems:

0. Map functions are executed as a group, but that is mostly due to the CouchDB view index engine.

1. Legacy view API. The CouchDB 0.11 release introduced the `ddoc` command, which handles
all design functions except map and reduce.

Switching from `add_fun`/`add_lib`/`reduce`/`rereduce` to `ddoc` would mean that:

1.1 There is no need to keep a stack of compiled map functions and reset it for every view index run;
1.2 Reduce functions can be cached and can use the `require` function without any overhead (COUCHDB-1202);
1.3 Since the whole cached ddoc on the query server side is invalidated on every ddoc update,
it may be worth implementing partial updates to it via JSON Patch.

2. Better logging integration with CouchDB (standalone log file, more logging levels, etc.);
3. Perhaps richer configuration (COUCHDB-1143);
4. Feature exchange during the initial handshake;
5. Non-iterative maps.

5.1 Currently, after sending the `map_doc` command, CouchDB expects the whole result for it from
the view server. So if the view server receives 1MiB of JSON and generates 200MiB of map results,
it pushes all of them to CouchDB in a single shot. CouchDB parses them into records, builds the
B-tree and does other magic, but I feel that a lot of memory overhead could be avoided if the
view server sent map results in small chunks of key-value pairs.

5.2 The view server is not able to "map" multiple docs in parallel. With the new mrview engine
this is no longer true: the view server can "request" `map_doc` multiple times (via a readline()
call; hope you're ready not to be blocked by it), but it still has to return results in the
original order or they get mixed up. This could change if the view server's response contained
the document id, so CouchDB could tell which document the results belong to.

6. Missing batch processing. Pushing multiple documents to the query server at once may speed up
maps and validate_doc_update calls. However, CouchDB would need to be a bit smart here: batch
small docs, but not large ones, to prevent OOM problems.
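To make points 5.1, 5.2 and 6 concrete, here is a rough Python sketch contrasting the current `map_doc` reply with a chunked, id-tagged variant. Only the legacy reply shape is taken from the existing protocol (one list of `[key, value]` rows per map function registered via `add_fun`); the `map_chunk`/`map_done` message names, `streamed_map_batch`, `collect`, `make_batches` and the size limits are hypothetical, purely for illustration.

```python
import json
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List

# A map function emits [key, value] rows for one document.
MapFun = Callable[[dict], Iterable[list]]


def legacy_map_doc(funs: List[MapFun], doc: dict) -> str:
    # Current protocol: a single reply line holding every row for the doc,
    # one list of [key, value] rows per registered map function.
    return json.dumps([list(fun(doc)) for fun in funs])


def chunked(rows: Iterable[list], size: int) -> Iterator[List[list]]:
    # Pull at most `size` rows at a time without materializing all of them.
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def streamed_map_batch(funs: List[MapFun], docs: List[dict],
                       chunk_size: int = 100) -> Iterator[str]:
    # HYPOTHETICAL protocol: every message carries the doc id (and function
    # index), so rows go out in small chunks and can be matched to their
    # document even when chunks for several documents arrive interleaved.
    for doc in docs:
        for fun_idx, fun in enumerate(funs):
            for rows in chunked(fun(doc), chunk_size):
                yield json.dumps(["map_chunk", doc["_id"], fun_idx, rows])
        yield json.dumps(["map_done", doc["_id"]])


def collect(lines: Iterable[str], n_funs: int) -> Dict[str, List[List[list]]]:
    # CouchDB-side sketch: group incoming chunks by doc id.
    results: Dict[str, List[List[list]]] = {}
    for line in lines:
        msg = json.loads(line)
        per_doc = results.setdefault(msg[1], [[] for _ in range(n_funs)])
        if msg[0] == "map_chunk":
            _, _, fun_idx, rows = msg
            per_doc[fun_idx].extend(rows)
    return results


def make_batches(docs: Iterable[dict],
                 max_bytes: int = 64 * 1024) -> Iterator[List[dict]]:
    # HYPOTHETICAL batching policy for point 6: group small docs, but flush
    # before a batch would exceed max_bytes, so a large doc travels alone.
    batch: List[dict] = []
    size = 0
    for doc in docs:
        doc_size = len(json.dumps(doc))
        if batch and size + doc_size > max_bytes:
            yield batch
            batch, size = [], 0
        batch.append(doc)
        size += doc_size
    if batch:
        yield batch


# Example map functions (illustrative only).
def by_tag(doc):
    for tag in doc.get("tags", []):
        yield [tag, doc["_id"]]


def by_type(doc):
    if "type" in doc:
        yield [doc["type"], 1]
```

With `chunk_size=2`, a doc emitting three tag rows produces two `map_chunk` lines for `by_tag` instead of one big reply, and `collect` rebuilds the same per-function lists the legacy reply carries. The chunk size and byte limit here are arbitrary knobs, not tuned values.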

About communication between the query server and CouchDB:

JSON support is still a key feature of everything that deals with CouchDB. JSON over stdio may
be slow, but it is trivial to implement in practically any language today, so it can be left as is.
In addition, CouchDB could provide something better, faster and "native" to it. The first thing
that comes to mind is "query server as an Erlang node". Why not? Many languages already have
libraries for talking to Erlang:
- Python:
- Ruby:
- Java:
- Go:
- PHP:
- NodeJS:

And I hope other languages have something similar. For CouchDB this solution would cost roughly
nothing, while other languages would only suffer if they lack a fast binary term codec.
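To give an idea of what such a binary term codec involves, here is a minimal pure-Python encoder for Erlang's External Term Format, limited to the JSON-like values a query server exchanges. It assumes CouchDB's EJSON conventions (objects as `{[{Key, Value}]}`, strings as binaries, `true`/`false`/`null` as atoms) and an Erlang release that understands `SMALL_ATOM_UTF8_EXT`. It is a sketch only: no big integers, atom cache or compression, and an existing Erlang interop library would be the practical choice.

```python
import struct

# External Term Format tags (subset).
VERSION = 131
NEW_FLOAT_EXT = 70        # 8-byte IEEE 754 double, big-endian
SMALL_INTEGER_EXT = 97    # unsigned 8-bit integer
INTEGER_EXT = 98          # signed 32-bit integer, big-endian
SMALL_TUPLE_EXT = 104     # 1-byte arity, then elements
NIL_EXT = 106             # empty list / proper list tail
LIST_EXT = 108            # 4-byte length, elements, tail
BINARY_EXT = 109          # 4-byte length, raw bytes
SMALL_ATOM_UTF8_EXT = 119 # 1-byte length, UTF-8 name


def _atom(name: str) -> bytes:
    data = name.encode("utf-8")
    return bytes([SMALL_ATOM_UTF8_EXT, len(data)]) + data


def _encode(value) -> bytes:
    if value is None:
        return _atom("null")
    if value is True:  # test bools before int: bool is an int subclass
        return _atom("true")
    if value is False:
        return _atom("false")
    if isinstance(value, int):
        if 0 <= value <= 255:
            return bytes([SMALL_INTEGER_EXT, value])
        if -2**31 <= value < 2**31:
            return struct.pack(">Bi", INTEGER_EXT, value)
        raise ValueError("big integers are not handled in this sketch")
    if isinstance(value, float):
        return struct.pack(">Bd", NEW_FLOAT_EXT, value)
    if isinstance(value, str):
        data = value.encode("utf-8")
        return struct.pack(">BI", BINARY_EXT, len(data)) + data
    if isinstance(value, tuple):
        if len(value) > 255:
            raise ValueError("large tuples are not handled in this sketch")
        return bytes([SMALL_TUPLE_EXT, len(value)]) + b"".join(_encode(v) for v in value)
    if isinstance(value, list):
        if not value:
            return bytes([NIL_EXT])
        body = b"".join(_encode(v) for v in value)
        # Proper list: length, elements, then an empty-list tail.
        return struct.pack(">BI", LIST_EXT, len(value)) + body + bytes([NIL_EXT])
    if isinstance(value, dict):
        # EJSON object: a 1-tuple wrapping a list of {Key, Value} 2-tuples.
        return _encode((list(value.items()),))
    raise TypeError(f"unsupported type: {type(value).__name__}")


def term_to_binary(value) -> bytes:
    # Equivalent in spirit to Erlang's term_to_binary/1 for this subset.
    return bytes([VERSION]) + _encode(value)
```

For example, `term_to_binary({"a": 1})` yields the same bytes Erlang produces for `term_to_binary({[{<<"a">>, 1}]})`. Whether this beats a well-optimized JSON parser in a given language is exactly the open question, so it would need measuring.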

I feel most of these problems can be solved without rewriting the whole protocol from
scratch. There is also a good question: "what problems does a completely new protocol aim to solve?"
We have one so far: improving overall communication speed. Any others? Thoughts?
> Make the view server & protocol faster
> --------------------------------------
>                 Key: COUCHDB-1743
>                 URL:
>             Project: CouchDB
>          Issue Type: Improvement
>            Reporter: Dave Cottlehuber
>              Labels: couchdb, erlang, gsoc2013, html, javascript, nodejs, rest
> View server protocol enhancements/refactoring - unix sockets, pipelining, different wire format etc. Faster!!
