couchdb-dev mailing list archives

From Benoit Chesneau <bchesn...@gmail.com>
Subject Re: All The Numbers -- View Engine Performance Benchmarks
Date Mon, 28 Jan 2013 06:19:44 GMT
On Mon, Jan 28, 2013 at 6:37 AM, Jason Smith <jhs@iriscouch.com> wrote:

> Hey, Jan. This is a totally random and hypothetical idea:
>
> Do you think there would be any speedup to use term_to_binary() and
> binary_to_term() instead of encoding through JSON? The view server would of
> course need to support that codec. I have already implemented encoding in
> erlang.js: https://github.com/iriscouch/erlang.js
>
> My suspicion is that there would be minor or zero speedup. The Erlang side
> would get faster (term_to_binary is faster) but the JS side would get
> slower (custom decoding rather than JSON.parse()). The JS VM is slightly
> faster so the total change would reflect that.
>
> But I thought I'd share the idea.
>

The problem is that we actually do:

1. get an Erlang term
2. serialise it to a JSON string (I thought this was solved some time ago, but some said it wasn't)
3. send the JSON to couchjs
4. couchjs receives the JSON and deserialises it into a JS object
5. process
6. serialise the result to a JSON string
7. send it back to Erlang
8. deserialise the JSON string
9. encode it into an Erlang term

This is pretty inefficient and is the main bottleneck here. I'm not sure
whether sending Erlang terms is a good idea or not, but it would be more
efficient than all these steps. It could also go the other way: saving the
JSON string as a blob without handling it.
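The round trip above can be sketched from the couchjs side. This is only an illustration, not the actual query-server code: the doc, the map function, and the `emit` helper here are made up, and the Erlang side (steps 1–3 and 8–9) is emulated with plain `JSON.stringify`/`JSON.parse`.

```javascript
// Steps 2-3 (Erlang side, emulated): the term is serialised to JSON
// and written to couchjs over stdio as one line.
const wireIn = JSON.stringify(["map_doc", { _id: "doc1", value: 40 }]);

// Step 4: couchjs deserialises the JSON line into a JS object.
const [command, doc] = JSON.parse(wireIn);

// Step 5: process (run a hypothetical map function against the doc).
const results = [];
const emit = (key, value) => results.push([key, value]);
if (command === "map_doc") emit(doc._id, doc.value + 2);

// Steps 6-7: serialise the result back to a JSON string and write it
// out, where Erlang (steps 8-9) parses it back into terms.
const wireOut = JSON.stringify([results]);
console.log(wireOut);
```

Every document pays for two full serialise/deserialise passes on each side of the pipe, which is the cost the step list above is pointing at.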

What about STDIO? There are two options here.

The first is to keep STDIO and improve the concurrency inside couch. I do
think the query server could be better optimised: we don't use enough of
Erlang's power to distribute messages here and maintain a better event
loop. Also, we should keep more state in the couchjs process, like the
ddoc, and only update it when the ddoc changes.
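Keeping the ddoc in the couchjs process could look roughly like this. All the names here (`ddocCache`, `handleDdoc`) are invented for illustration; the point is only that the full ddoc body need not be re-sent and re-parsed when the rev has not changed.

```javascript
// Sketch: cache design docs in the couchjs process keyed by id,
// invalidated by rev, so the body is only replaced on a real update.
const ddocCache = new Map();

function handleDdoc(id, rev, body) {
  const cached = ddocCache.get(id);
  if (cached && cached.rev === rev) {
    return cached.body;             // same rev: reuse the cached ddoc
  }
  ddocCache.set(id, { rev, body }); // new or updated ddoc: replace entry
  return body;
}

// First call stores the ddoc; a repeat with the same rev hits the cache.
const v1 = handleDdoc("_design/app", "1-abc", { validators: 1 });
const v2 = handleDdoc("_design/app", "1-abc", { validators: "stale?" });
```

With 50 ddocs and a validate call per save, this is exactly the kind of repeated reset-and-resend traffic that caching on the couchjs side would avoid.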

Or we could eventually move this event loop down to the C level and have
the couchjs process act as a network service. In that case the couchjs
process could listen on a TCP or UNIX socket and react to different
events. It could still work over stdio, but the advantage here is that you
move part of the event loop into the couchjs process and allow more
concurrency on that side: the couchjs process would accept a request and
eventually spawn a thread to handle it. Each couchjs worker would be
responsible for its own queue and could at some point refuse more
messages, or tell couchdb when it is ready to accept more (better, imo).
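The worker-queue idea can be sketched as a bounded queue with explicit backpressure. This is an invented minimal model, not the actual couchjs implementation: a worker accepts jobs up to a limit, refuses beyond it, and reports readiness after draining.

```javascript
// Sketch of a couchjs-side bounded queue: accept up to `limit` jobs,
// refuse the rest, and signal readiness back to couchdb after draining.
class Worker {
  constructor(limit) {
    this.limit = limit;
    this.queue = [];
  }
  accept(job) {
    if (this.queue.length >= this.limit) return false; // refuse: full
    this.queue.push(job);
    return true;
  }
  drainOne() {
    const job = this.queue.shift();
    if (job) job();   // run one queued job
    return this.ready(); // tell couchdb whether we can take more
  }
  ready() {
    return this.queue.length < this.limit;
  }
}

const w = new Worker(2);
const accepted = [w.accept(() => {}), w.accept(() => {}), w.accept(() => {})];
// The third accept is refused until drainOne() frees a slot.
```

Whether the worker refuses outright or proactively announces readiness is the design choice flagged above; the "tell couchdb when ready" variant avoids couchdb retrying blindly against a full worker.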

Moving to a network service would also allow someone to maintain the
processing apart from couchjs. For example, we could imagine an Android
application doing all the JS evaluation and maintaining this accept loop
instead of couchdb. This would considerably ease the distribution of
couchdb for that platform and allow the use of native threads (the same
goes for iOS).

This is why it's important to me to work on this protocol update first.
It is our biggest bottleneck. The implementation is another thing, quite a
detail at this point.

- benoît




>
> On Sun, Jan 27, 2013 at 12:50 PM, Jan Lehnardt <jan@apache.org> wrote:
>
> >
> > On Jan 27, 2013, at 13:22 , Alexander Shorin <kxepal@gmail.com> wrote:
> >
> > > On Sun, Jan 27, 2013 at 3:55 PM, Jason Smith <jhs@iriscouch.com>
> wrote:
> > >>
> > >> * Very little difference in different implementations (because stdio
> is
> > the
> > >> bottleneck)
> > >
> > > Why stdio is a bottleneck? I'm interesting underlay reasons.
> >
> > It is actually not the the stdio, but the serialisation form erlang-terms
> > to JSON to JS Objects to JSON to erlang terms.
> >
> > Cheers
> > Jan
> > --
> >
> >
> > >
> > > As for my experience, the protocol design doesn't allows view and
> > > query servers works faster as they can. For example, we have 50 ddocs
> > > with validate functions. For each document save there would be
> > > executed from 100 commands (50 resets + 50 ddoc validate_doc_update
> > > calls) till 150 commands (+ddocs caches), while it's possible to
> > > process them in bulk mode.
> > >
> > > --
> > > ,,,^..^,,,
> >
> >
>
>
> --
> Iris Couch
>
