couchdb-dev mailing list archives

From "Filipe Manana (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (COUCHDB-1118) Adding a NIF based JSON decoding/encoding module
Date Sun, 03 Apr 2011 20:50:05 GMT

    [ https://issues.apache.org/jira/browse/COUCHDB-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015211#comment-13015211
] 

Filipe Manana commented on COUCHDB-1118:
----------------------------------------

Adam,

The patch I proposed is this one:

https://github.com/fdmanana/couchdb/compare/json_nif

Of all the links I gave before, it's the only one that points to a diff against CouchDB,
completely ready to integrate into trunk.

Very small documents are not a problem either; sorry, I forgot to mention that before. Part
of the work Damien did was related to performance, not only to support for big numbers. Here's
a shell session that shows timings (in microseconds, as returned by timer:tc) for a document
of under 300 bytes (if all whitespace is removed).

Erlang R14B02 (erts-5.8.3) [source] [smp:2:2] [rq:2] [async-threads:4] [hipe] [kernel-poll:true]

Eshell V5.8.3  (abort with ^G)
1> Apache CouchDB 1.2.0aea00f2a-git (LogLevel=info) is starting.
Apache CouchDB has started. Time to relax.
[info] [<0.37.0>] Apache CouchDB has started on http://127.0.0.1:5984/

1> {ok, Json} = file:read_file("../seatoncouch/doc_100b.json").  
{ok,<<"{\n\"data3\":\"ColYo\",\n\"data5\":{\n        \"nested2\": {\n              
    \"integers\":[756509,116117,776378,275045"...>>}
2>  
2>  byte_size(Json).
361
3> element(1, timer:tc(ejson, decode, [Json])).
2536
4> element(1, timer:tc(ejson, decode, [Json])).
66
5> element(1, timer:tc(ejson, decode, [Json])).
87
6> element(1, timer:tc(ejson, decode, [Json])).
107
7> element(1, timer:tc(ejson, decode, [Json])).
77
8> element(1, timer:tc(ejson, decode, [Json])).
71
9> element(1, timer:tc(ejson, decode, [Json])).
67
10> element(1, timer:tc(ejson, decode, [Json])).
70
11> element(1, timer:tc(ejson, decode, [Json])).
45
12> 
12> element(1, timer:tc(mochijson2, decode, [Json])).
8364
13> element(1, timer:tc(mochijson2, decode, [Json])).
265
14> element(1, timer:tc(mochijson2, decode, [Json])).
324
15> element(1, timer:tc(mochijson2, decode, [Json])).
278
16> element(1, timer:tc(mochijson2, decode, [Json])).
292
17> element(1, timer:tc(mochijson2, decode, [Json])).
291
18> element(1, timer:tc(mochijson2, decode, [Json])).
239
19> element(1, timer:tc(mochijson2, decode, [Json])).
263
20> 
20> EJson = ejson:decode(Json).
{[{<<"data3">>,<<"ColYo">>},
  {<<"data5">>,
   {[{<<"nested2">>,
      {[{<<"integers">>,
         [756509,116117,776378,275045,703447,988947,450154]}]}}]}},
  {<<"data1">>,<<"9EVqHm5ARJPyBY0J">>},
  {<<"more_nested">>,
   {[{<<"nested1">>,
      {[{<<"integers">>,[685803,147958,941747,905651]}]}},
     {<<"nested2">>,{[{<<"integers">>,[756509,116117]}]}}]}}]}
21> 
21> element(1, timer:tc(ejson, encode, [EJson])).
73
22> element(1, timer:tc(ejson, encode, [EJson])).
70
23> element(1, timer:tc(ejson, encode, [EJson])).
65
24> element(1, timer:tc(ejson, encode, [EJson])).
104
25> element(1, timer:tc(ejson, encode, [EJson])).
64
26> element(1, timer:tc(ejson, encode, [EJson])).
75
27> element(1, timer:tc(ejson, encode, [EJson])).
70
28> element(1, timer:tc(ejson, encode, [EJson])).
66
29> 
29> MochiDec = mochijson2:encoder([{handler, fun({L}) when is_list(L) -> {struct, L};
(Bad) -> exit({json_encode, {bad_term, Bad}}) end}]).
#Fun<mochijson2.0.93741038>
30> 
30> element(1, timer:tc(MochiDec, [EJson])).
203
31> element(1, timer:tc(MochiDec, [EJson])).
205
32> element(1, timer:tc(MochiDec, [EJson])).
206
33> element(1, timer:tc(MochiDec, [EJson])).
209
34> element(1, timer:tc(MochiDec, [EJson])).
213
35> element(1, timer:tc(MochiDec, [EJson])).
214
36> element(1, timer:tc(MochiDec, [EJson])).
229
37> 

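So even for such small documents, the NIF solution is faster: ignoring the first call of each
decoder (which includes warm-up costs), it is roughly 3 to 4 times faster than mochijson2 for
both decoding and encoding.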
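
Single timer:tc calls like the ones above are noisy, and the first call of each decoder is
dominated by one-off warm-up costs (2536 and 8364 microseconds in the session). A minimal
sketch of a helper that does one untimed warm-up call and then averages N timed calls is
below. The module name json_bench and the function avg_us are hypothetical, made up for
illustration; they are not part of the patch.

```erlang
%% json_bench.erl: hypothetical helper module, not part of the patch.
-module(json_bench).
-export([avg_us/3, avg_us/4]).

%% Mean time in microseconds of N calls to Fun with Args,
%% after one untimed warm-up call.
avg_us(Fun, Args, N) when is_function(Fun), is_integer(N), N > 0 ->
    _ = apply(Fun, Args),
    mean([element(1, timer:tc(Fun, Args)) || _ <- lists:seq(1, N)]).

%% Same as above, for a Module:Function(Args...) call.
avg_us(M, F, Args, N) when is_atom(M), is_atom(F), is_integer(N), N > 0 ->
    _ = apply(M, F, Args),
    mean([element(1, timer:tc(M, F, Args)) || _ <- lists:seq(1, N)]).

%% Arithmetic mean of a non-empty list of numbers (always a float).
mean(Times) ->
    lists:sum(Times) / length(Times).
```

With the bindings from the session above, it could be used as
json_bench:avg_us(ejson, decode, [Json], 1000) or json_bench:avg_us(MochiDec, [EJson], 1000).
A mean still hides outliers, so for a more careful comparison the individual timings should
be looked at as well.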

> Adding a NIF based JSON decoding/encoding module
> ------------------------------------------------
>
>                 Key: COUCHDB-1118
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1118
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: Database Core
>            Reporter: Filipe Manana
>             Fix For: 1.2
>
>
> Currently, all the Erlang based JSON encoders and decoders are very slow, and decoding
and encoding JSON is something that we do basically everywhere.
> Adding a JSON NIF encoder/decoder was recently discussed on IRC. Damien also started
a thread on the development mailing list about adding NIFs to trunk.
> The patch/branch at [1] adds such a JSON encoder/decoder. It is based on Paul Davis'
eep0018 project [2]. Damien made some modifications [3] to it mostly to add support for big
numbers (Paul's eep0018 limits the precision to 32/64 bits) and a few optimizations. I made
a few corrections and minor enhancements on top of Damien's fork as well [4]. Finally Benoît
identified some missing capabilities compared to mochijson2 (on encoding, allow atoms as strings
and strings as object properties).
> Also, the version added in the patch at [1] uses mochijson2 when the C NIF is not loaded.
Autotools configuration was adapted to compile the NIF only when we're using an OTP release
>= R13B04 (R13B03 NIF API is too limited and suffered many changes compared to R13B04 and
R14) - therefore it should work on any OTP release > R13B at least.
> I successfully tested this on R13B03, R13B04 and R14B02 in an Ubuntu environment.
> I'm not sure if it builds at all on Windows - would appreciate if someone could verify
it.
> Also, I'm far from being good with the autotools, so I probably missed something important
or I'm doing something in a not very standard way.
> This NIF encoder/decoder is about one order of magnitude faster compared to mochijson2
and other Erlang-only solutions such as jsx. A read-and-write test with relaximation shows
this has a very positive impact, especially on reads (the EJSON encoding is more expensive
than JSON decoding) - http://graphs.mikeal.couchone.com/#/graph/698bf36b6c64dbd19aa2bef634052381
> @Paul, since this is based on your eep0018 effort, do you think any other missing files
should be added (README, etap tests, etc.)? Also, should we add a note somewhere that this is based
on your project?
> [1] - https://github.com/fdmanana/couchdb/compare/json_nif
> [2] - https://github.com/davisp/eep0018
> [3] - https://github.com/Damienkatz/eep0018/commits/master
> [4] - https://github.com/fdmanana/eep0018/commits/final_damien

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
