couchdb-dev mailing list archives

From Jan Lehnardt <...@apache.org>
Subject Re: GSoc 2015 | COUCHDB-1743 Make the view server & protocol faster
Date Wed, 25 Mar 2015 20:34:25 GMT

> On 25 Mar 2015, at 20:00, Alexander Shorin <kxepal@gmail.com> wrote:
> 
> On Wed, Mar 25, 2015 at 5:53 PM, Buddhika Jayawardhana
> <buddhika.12@cse.mrt.ac.lk> wrote:
>> Thank you so much for pointing out the weaknesses of my proposal. Yes, we
>> should measure the performance of the view server. Can we do a comparison
>> test between the old functions and the corresponding new ones, or should I
>> research a tool? It would be really great if the devs could suggest any
>> directions for research.
> 
> Yes, we need to compare the behaviour of the old implementation with the
> new one for exactly the same cases. The regular test suite isn't a good
> candidate for that role since it solves a different problem, and here
> we have two targets: Query Server (JS) and CouchDB (Erlang). During
> your work you may really improve Query Server performance but
> accidentally reduce it on the Erlang side. There is room for a little
> research here, but it shouldn't take a lot of time.
> 
>> About Node.js, I did not research any other run-time environment.
>> Since Jan has already done some experiments, I thought it was reasonable
>> to use Node.js.
> 
> I didn't do any research either, but a quick test of the current query
> server on SpiderMonkey vs NodeJS shows not much difference between them.
> 
>> Jan's point was that more people will get the opportunity to contribute
>> if we move to Node.js. Can we try to implement streaming communication
>> between Erlang and Couch.js? Then we can make the move to Node.js
>> optional. I would like to know the opinion of other members.
> 
> I like the idea of separating the things. What really needs
> improvement is the query server protocol, no doubt. Once it has been
> changed and proven faster, it will be time to research replacing
> SM with NodeJS/io.js. These are two separate goals for me,
> but their order is quite clear: to be able to compare something, we
> need to create equivalent environments and conditions for them, and the
> protocol is a part of that.

They are separate goals, but it would be exceptionally hard to build
a streaming protocol in couchjs. It very likely would require significant
work in C/C++ (unless I’m missing something obvious). That’s why I
think a new protocol and a new JS interpreter go hand in hand.

As for whether Node.js is a good target, I’d say don’t worry about
it too much. I can’t predict where the Node.js/io.js split is going
to end up, but once the new query server is in place, it won’t be very
hard to support both, or however many environments there are.

To repeat, one of the reasons why I want Node.js in there is that
we can drop the dependency on the C glue that is couchjs and attract
*a lot* more people who can work on CouchDB at that level. We haven’t
seen much work there over the years once things settled in, and I’m not
eager to go in there voluntarily. I can’t imagine newcomers being eager
to go anywhere near it.

* * *

As for the performance benchmarking, which I agree should be part of
this, I think we can keep it simple. Benchmarking individual
components helps less. What we are interested in is whether view
index builds are faster and whether we make better use of hardware.

To that end, I’d propose a benchmark suite that has roughly this:

- a bunch of databases with differently sized documents ranging
  from small (50 bytes maybe) to medium (1 kbyte) and large
  (10 kbytes).
- a bunch of different typical map/reduce functions.

Then we simply measure how long everything takes with either
view engine, and how much CPU/RAM/disk is used either way.
Wiring this up shouldn’t take too long. It’d be nice to have
this as a separate module that we can keep using for future
performance work.
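
A minimal sketch of how such a suite could be wired up, in Python.
Everything named here is hypothetical: `make_doc`, `time_build`,
`run_suite`, and the size labels are illustrative, and each `build`
callable stands in for whatever loads the database of a given size and
triggers a view index build on one view engine. CPU/RAM/disk sampling
is left out and would need to be added around the timed call.

```python
import json
import time
from typing import Callable, Dict, List

# Document sizes in bytes, taken from the list above; labels are illustrative.
DOC_SIZES = {"small": 50, "medium": 1_000, "large": 10_000}


def make_doc(doc_id: str, target_bytes: int) -> dict:
    """Build a document whose JSON encoding is about target_bytes long."""
    doc = {"_id": doc_id, "type": "bench", "body": ""}
    overhead = len(json.dumps(doc).encode("utf-8"))
    # Pad with ASCII filler; never shrink below the fixed overhead.
    doc["body"] = "x" * max(0, target_bytes - overhead)
    return doc


def time_build(trigger_build: Callable[[], None]) -> float:
    """Return wall-clock seconds taken by one (blocking) index build."""
    start = time.perf_counter()
    trigger_build()
    return time.perf_counter() - start


def run_suite(
    engines: Dict[str, Callable[[str, str], None]],
    map_funs: Dict[str, str],
) -> List[dict]:
    """Time every engine against every document size and map function.

    engines maps an engine name to a callable(size_label, fun_name) that is
    assumed to block until the view index is fully built.
    """
    results = []
    for engine_name, build in engines.items():
        for size_label in DOC_SIZES:
            for fun_name in map_funs:
                seconds = time_build(lambda: build(size_label, fun_name))
                results.append(
                    {
                        "engine": engine_name,
                        "size": size_label,
                        "fun": fun_name,
                        "seconds": seconds,
                    }
                )
    return results
```

With two `build` callables, one per view engine, the resulting rows can
be compared pairwise per size and map function.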

And then:

- a bunch of typical show/list/update/validate/filter functions

And we’ll run a simple test suite that fires a bunch of concurrent
requests against them and then measures response time, the number of
concurrent requests until problems occur, how many OS processes
are used, and again CPU/RAM/disk etc. One of the myriad web
benchmark tools should suffice.
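
If an off-the-shelf tool turns out not to fit, the measurement itself
is small. A rough sketch in Python, under stated assumptions:
`run_load` and `first_failing_level` are hypothetical names,
`send_request` stands in for one HTTP request against a
show/list/update function and is assumed to raise on failure, and
counting OS processes or sampling CPU/RAM/disk is not covered.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Optional


def timed_call(send_request: Callable[[], None]) -> float:
    """Return wall-clock seconds for one request; exceptions propagate."""
    start = time.perf_counter()
    send_request()
    return time.perf_counter() - start


def run_load(
    send_request: Callable[[], None], concurrency: int, total_requests: int
) -> dict:
    """Issue total_requests calls with at most `concurrency` in flight."""
    latencies = []
    errors = 0
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [
            pool.submit(timed_call, send_request) for _ in range(total_requests)
        ]
        for future in futures:
            try:
                latencies.append(future.result())
            except Exception:
                # Any raised exception counts as a failed request.
                errors += 1
    return {
        "concurrency": concurrency,
        "requests": total_requests,
        "errors": errors,
        "median_s": statistics.median(latencies) if latencies else None,
        "max_s": max(latencies) if latencies else None,
    }


def first_failing_level(
    send_request: Callable[[], None],
    levels: Iterable[int],
    requests_per_level: int,
    max_error_rate: float = 0.0,
) -> Optional[int]:
    """Return the first concurrency level whose error rate exceeds the limit."""
    for level in levels:
        result = run_load(send_request, level, requests_per_level)
        if result["errors"] / requests_per_level > max_error_rate:
            return level
    return None
```

Stepping `levels` upwards (say 1, 2, 4, 8, ...) gives a crude estimate
of the number of concurrent requests at which problems start.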

* * *

Buddhika, I read through your proposal draft and only had minor
edits for now. You said it is light on technical details. In
which area would you want to expand on technical details? I’m
happy to help to fill in the blanks.

Best
Jan

--
Professional Support for Apache CouchDB:
http://www.neighbourhood.ie/couchdb-support/

