couchdb-dev mailing list archives

From "Dave Cottlehuber (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (COUCHDB-1334) Indexer speedup (for non-native view servers)
Date Wed, 12 Dec 2012 21:31:22 GMT

    [ https://issues.apache.org/jira/browse/COUCHDB-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13530355#comment-13530355 ]

Dave Cottlehuber commented on COUCHDB-1334:
-------------------------------------------

I agree with Filipe: this is some weird Windows breakage.

Adding the overlapped_io port option, per http://www.erlang.org/doc/man/erlang.html, is not sufficient.

https://github.com/erlang/otp/blob/OTP_R15B03/erts/emulator/sys/win32/sys.c#L393 and https://github.com/erlang/otp/blob/OTP_R15B03/erts/emulator/sys/win32/sys.c#L2246
scare me.

The following patch (enabling overlapped_io on the Erlang side) passes the tests on 1.2.0, and
on the 1.3.x branch with the COUCHDB-1334 patch reverted, i.e. so far this looks safe.

diff --git i/src/couchdb/couch_os_process.erl w/src/couchdb/couch_os_process.erl
index db62d49..d5ef857 100644
--- i/src/couchdb/couch_os_process.erl
+++ w/src/couchdb/couch_os_process.erl
@@ -20,7 +20,7 @@
 
 -include("couch_db.hrl").
 
--define(PORT_OPTIONS, [stream, {line, 4096}, binary, exit_status, hide]).
+-define(PORT_OPTIONS, [stream, {line, 4096}, binary, exit_status, hide, overlapped_io]).
 
 -record(os_proc,
     {command,

However, that's not enough for this to work with the full patch in place.
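
The Erlang documentation describes overlapped_io as affecting ports to external programs on Windows only. As a minimal sketch of a variant (not what the patch above does, which adds the option unconditionally), the option list could be built per platform. The module name and the port_options/0,1 helpers are hypothetical and are not part of couch_os_process:

```erlang
-module(port_options_sketch).
-export([port_options/0, port_options/1]).

%% Same base options as ?PORT_OPTIONS before the patch above.
-define(BASE_PORT_OPTIONS, [stream, {line, 4096}, binary, exit_status, hide]).

%% Options for the platform the VM is currently running on.
port_options() ->
    port_options(os:type()).

%% Hypothetical helper: takes the os:type() tuple as an argument so that
%% both branches can be checked on any platform.
port_options({win32, _}) ->
    ?BASE_PORT_OPTIONS ++ [overlapped_io];
port_options(_Other) ->
    ?BASE_PORT_OPTIONS.
```

The result would be passed as the second argument to open_port/2. As noted above, enabling the option alone does not make the full patch work on Windows.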

A workaround to avoid the specific issue on Windows didn't pan out in time for the
release:

diff --git a/src/couchdb/couch_os_process.erl b/src/couchdb/couch_os_process.erl
index 3a267be..5f45f5f 100644
--- a/src/couchdb/couch_os_process.erl
+++ b/src/couchdb/couch_os_process.erl
@@ -58,6 +58,14 @@ prompt(Pid, Data) ->
     end.
 
 prompt_many(Pid, DataList) ->
+    case os:type() of
+    {win32, _} ->
+        lists:map(fun(Data) -> prompt(Pid, Data) end, DataList);
+    _ ->
+        do_prompt_many(Pid, DataList)
+    end.
+
+do_prompt_many(Pid, DataList) ->
     OsProc = gen_server:call(Pid, get_os_proc, infinity),
     true = port_connect(OsProc#os_proc.port, self()),
     try
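
To make the difference between the two branches concrete, here is a minimal sketch of the two prompting styles against a plain Erlang port. It is not the COUCHDB-1334 patch: it assumes a Unix-like host where `cat` stands in for the view server and echoes each line back, and the module name and the prompt_each/2, prompt_pipelined/2, read_line/2 and demo/0 helpers are hypothetical.

```erlang
-module(pipeline_sketch).
-export([prompt_each/2, prompt_pipelined/2, demo/0]).

%% Port options as in couch_os_process before the first diff (no overlapped_io).
-define(PORT_OPTIONS, [stream, {line, 4096}, binary, exit_status, hide]).

%% Sequential style, comparable to the win32 branch: write one request,
%% wait for its reply, then move on to the next request.
prompt_each(Port, Lines) ->
    [begin
         true = port_command(Port, [Line, $\n]),
         read_line(Port, [])
     end || Line <- Lines].

%% Pipelined style (an assumption about the general technique): write
%% every request first, then read one reply per request, in order.
prompt_pipelined(Port, Lines) ->
    lists:foreach(fun(Line) -> true = port_command(Port, [Line, $\n]) end,
                  Lines),
    [read_line(Port, []) || _ <- Lines].

%% Reassemble one line; with {line, 4096} longer lines arrive as noeol chunks.
read_line(Port, Acc) ->
    receive
        {Port, {data, {noeol, Chunk}}} ->
            read_line(Port, [Chunk | Acc]);
        {Port, {data, {eol, Chunk}}} ->
            iolist_to_binary(lists:reverse([Chunk | Acc]));
        {Port, {exit_status, Status}} ->
            exit({port_exited, Status})
    after 5000 ->
        exit(port_timeout)
    end.

%% Runs both styles against `cat`; the calling process owns the port.
demo() ->
    Port = open_port({spawn, "cat"}, ?PORT_OPTIONS),
    Lines = [<<"one">>, <<"two">>, <<"three">>],
    try
        {prompt_each(Port, Lines), prompt_pipelined(Port, Lines)}
    after
        port_close(Port)
    end.
```

Both calls return the same list of replies; the difference is only in how long each side sits blocked on the pipe, which is where the sequential fallback on Windows gives up the speedup.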

                
> Indexer speedup (for non-native view servers)
> ---------------------------------------------
>
>                 Key: COUCHDB-1334
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1334
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: Database Core, JavaScript View Server, View Server Support
>            Reporter: Filipe Manana
>            Assignee: Dave Cottlehuber
>             Fix For: 1.3
>
>         Attachments: 0001-More-efficient-view-updater-writes.patch, 0002-More-efficient-communication-with-the-view-server.patch,
master-0002-More-efficient-communication-with-the-view-server.patch, master-2-0002-More-efficient-communication-with-the-view-server.patch,
master-3-0002-More-efficient-communication-with-the-view-server.patch, master-4-0002-More-efficient-communication-with-the-view-server.patch
>
>
> The following 2 patches significantly improve view index generation/update time and reduce
CPU consumption.
> The first patch makes the view updater's batching more efficient by ensuring each btree bulk insertion adds/removes a minimum of N (=100) key/value pairs. This also slows the growth of the index file caused by old data (basically old btree nodes). This behaviour is already implemented in master/trunk in the new indexer (by Paul Davis).
> The second patch maximizes the throughput with an external view server (such as couchjs). It makes the pipe (Erlang port) communication between the Erlang VM (couch_os_process) and the view server more efficient, since the 2 sides spend less time blocked on reading from the pipe.
> Here follow some benchmarks.
> test database at  http://fdmanana.iriscouch.com/test_db  (1 million documents)
> branch 1.2.x
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	2m45.097s
> user	0m0.006s
> sys	0m0.007s
> view file size: 333Mb
> CPU usage:
> $ sar 1 60
> 22:27:20  %usr  %nice   %sys   %idle
> 22:27:21   38      0     12     50
> (....)
> 22:28:21   39      0     13     49
> Average:     39      0     13     47   
> branch 1.2.x + batch patch (first patch)
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	2m12.736s
> user	0m0.006s
> sys	0m0.005s
> view file size 72Mb
> branch 1.2.x + batch patch + os_process patch
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	1m9.330s
> user	0m0.006s
> sys	0m0.004s
> view file size:  72Mb
> CPU usage:
> $ sar 1 60
> 22:22:55  %usr  %nice   %sys   %idle
> 22:23:53   22      0      6     72
> (....)
> 22:23:55   22      0      6     72
> Average:     22      0      7     70   
> master/trunk
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	1m57.296s
> user	0m0.006s
> sys	0m0.005s
> master/trunk + os_process patch
> $ echo 3 > /proc/sys/vm/drop_caches
> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
> {"rows":[
> {"key":null,"value":1000000}
> ]}
> real	0m53.768s
> user	0m0.006s
> sys	0m0.006s
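
For reference, the "real" timings quoted above work out to roughly a 2.4x speedup on 1.2.x with both patches and roughly 2.2x on master/trunk with the os_process patch. A small sketch of that arithmetic follows; the module and function names are hypothetical, and the numbers are copied from the benchmark output above:

```erlang
-module(bench_summary).
-export([seconds/2, speedup/2, report/0]).

%% Convert a "real" time from time(1), e.g. 2m45.097s, to seconds.
seconds(Minutes, Seconds) ->
    Minutes * 60 + Seconds.

%% How many times faster the patched run is than the baseline run.
speedup(BaselineSecs, PatchedSecs) ->
    BaselineSecs / PatchedSecs.

%% Print baseline, patched time and speedup for the two comparisons.
report() ->
    Pairs = [{"1.2.x -> 1.2.x + batch + os_process",
              seconds(2, 45.097), seconds(1, 9.330)},
             {"master/trunk -> master/trunk + os_process",
              seconds(1, 57.296), seconds(0, 53.768)}],
    [io:format("~s: ~.3fs -> ~.3fs (~.2fx)~n",
               [Label, Before, After, speedup(Before, After)])
     || {Label, Before, After} <- Pairs],
    ok.
```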

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
