couchdb-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Couchdb Wiki] Trivial Update of "Performance" by AndrewCooper
Date Sat, 09 Oct 2010 20:31:59 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Couchdb Wiki" for change notification.

The "Performance" page has been changed by AndrewCooper.
The comment on this change is: sp.
http://wiki.apache.org/couchdb/Performance?action=diff&rev1=7&rev2=8

--------------------------------------------------

  If you have a fast I/O system then you can also use concurrency - have multiple requests/responses
at the same time.  This mitigates the latency involved in assembling JSON, doing the networking
and decoding JSON.
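
As a rough illustration of the concurrency point above, the following sketch issues several requests at once from a thread pool, so the JSON encoding, networking and decoding latency of one request overlaps with the others. The names `fetch_json` and `fetch_all`, the worker count and the example URLs are assumptions made for illustration; they are not part of CouchDB or of the original page.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch_json(url, timeout=10):
    """Fetch one URL and decode its JSON body (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def fetch_all(urls, fetch=fetch_json, workers=8):
    """Run `fetch` over all URLs concurrently.

    Results are returned in the same order as `urls`. The `fetch`
    parameter is injectable so the function can be exercised without
    a running server.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, urls))


# Example usage (assumes a local CouchDB and a hypothetical database "mydb"):
# docs = fetch_all(["http://localhost:5984/mydb/doc1",
#                   "http://localhost:5984/mydb/doc2"])
```

Whether this helps in practice depends on the I/O system keeping up, as noted above; the worker count is something to tune, not a recommendation.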
  
  = View generation =
- Views with the Javascript view server (default) are extremely slow to generate when there
are a non-trivial number of documents to process.  The generation process won't even saturate
a single CPU let alone your I/O.  The cause is the latency involved in the CouchDB server
and separate couchjs view server, drammatically indicating how important it is to take latency
out of your implementation.
+ Views with the Javascript view server (default) are extremely slow to generate when there
are a non-trivial number of documents to process.  The generation process won't even saturate
a single CPU let alone your I/O.  The cause is the latency involved in the CouchDB server
and separate couchjs view server, dramatically indicating how important it is to take latency
out of your implementation.
  
  You can allow view access to be "stale", but it isn't practical to predict when a request
will return quickly from the existing index and when it will wait for the views to be updated,
which takes a long time.  (A 10 million document database took about 10 minutes to load into
CouchDB but about 4 hours to do view generation.)
  
  View information isn't replicated - it is rebuilt on each database so you can't do the view
generation on a separate server.  The only useful mechanism I have found is to generate the
view on a separate machine together with data updates, shut down your target server, copy
the couchdb raw database file across and then restart the target server.
  
  == Erlang implementations of common JavaScript functions ==
- 
- If you’re using a very simple view function that only performs a sum or count reduction,
you can call native Erlang implementations of them by simply writing "_sum" or "_count" in
place of your function declaration. This will speed up things dramatically, as it cuts down
on IO between CouchDB and serverside JavaScript. For example, as [http://mail-archives.apache.org/mod_mbox/couchdb-user/201003.mbox/%3c5E07E00E-3D69-4A8C-ADA3-1B20CF0BA4C8@julianstahnke.com%3e
mentioned on the mailing list], the time for outputting an (already indexed and cached) view
with about 78,000 items went down from 60 seconds to 4 seconds.
+ If you’re using a very simple view function that only performs a sum or count reduction,
you can call native Erlang implementations of them by simply writing "_sum" or "_count" in
place of your function declaration. This will speed up things dramatically, as it cuts down
on IO between CouchDB and serverside JavaScript. For example, as [[[http://mail-archives.apache.org/mod_mbox/couchdb-user/201003.mbox/<5E07E00E-3D69-4A8C-ADA3-1B20CF0BA4C8@julianstahnke.com>|http://mail-archives.apache.org/mod_mbox/couchdb-user/201003.mbox/%3c5E07E00E-3D69-4A8C-ADA3-1B20CF0BA4C8@julianstahnke.com%3e]]
mentioned on the mailing list], the time for outputting an (already indexed and cached) view
with about 78,000 items went down from 60 seconds to 4 seconds.
  
  Example:
  
  Before:
  
-  {{{#!javascript
+  . {{{#!javascript
  {
      "_id": "_design/foo",
      "views": {
@@ -48, +47 @@

  
  After:
  
-  {{{#!javascript
+  . {{{#!javascript
  {
      "_id": "_design/foo",
      "views": {
@@ -70, +69 @@
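
Because the diff elides the body of the example, here is a minimal sketch of what a design document using a built-in reduce can look like, built and serialized from Python. Only the `_design/foo` id and the `"_sum"`/`"_count"` strings come from the page; the view name `bar`, the map function and the helper `design_doc_with_builtin_reduce` are assumptions for illustration.

```python
import json

# The two built-in reductions named on the page.
BUILTIN_REDUCES = ("_sum", "_count")


def design_doc_with_builtin_reduce(name, view, map_src, reduce_name="_sum"):
    """Build a design document whose reduce is a built-in Erlang function.

    The reduce value is just the string "_sum" or "_count", written in
    place of a JavaScript function declaration.
    """
    if reduce_name not in BUILTIN_REDUCES:
        raise ValueError("expected one of %r" % (BUILTIN_REDUCES,))
    return {
        "_id": "_design/" + name,
        "views": {view: {"map": map_src, "reduce": reduce_name}},
    }


# Hypothetical view "bar" that emits 1 per document, summed natively.
doc = design_doc_with_builtin_reduce(
    "foo", "bar", "function (doc) { emit(doc._id, 1); }"
)
print(json.dumps(doc, indent=4))
```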

  
  = Resource Limits =
  One of the problems that administrators run into as their deployments become large is the
resource limits imposed by the system and by the application configuration. Raising these limits
can allow your deployment to grow beyond what the default configuration will support.
+ 
  == CouchDB Configuration Options ==
- In your configuration (local.ini or similar) familiarize yourself with the following options:{{{
+ In your configuration (local.ini or similar) familiarize yourself with the following options:
+ 
+ {{{
  [couchdb]
  max_dbs_open = 100
  
  [httpd]
- max_connections = 2048}}}
+ max_connections = 2048
- The first option places an upper bound on the number of databases that can be open at one
time. CouchDB reference counts database accesses internally and will close idle databases
when it must. Sometimes it is necessary to keep more than the default open at once, such as
in deployments where many databases will be continuously replicating.
- The second option limits how many client connections the HTTP server will service at a time.
Again, heavy replication scenarios are good candidates for increased {{{max_connections}}}
since the replicator opens several connections to the source database.
+ }}}
+ The first option places an upper bound on the number of databases that can be open at one
time. CouchDB reference counts database accesses internally and will close idle databases
when it must. Sometimes it is necessary to keep more than the default open at once, such as
in deployments where many databases will be continuously replicating. The second option limits
how many client connections the HTTP server will service at a time. Again, heavy replication
scenarios are good candidates for increased {{{max_connections}}} since the replicator opens
several connections to the source database.
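
To audit these two options across several servers, a short script can read them back from the configuration. This is only a sketch: it assumes the file parses as ordinary INI text with Python's standard `configparser`, and the function name `read_limits` and the inline sample are illustrative, not part of CouchDB.

```python
import configparser

# Sample text mirroring the options shown above.
SAMPLE_INI = """
[couchdb]
max_dbs_open = 100

[httpd]
max_connections = 2048
"""


def read_limits(ini_text):
    """Return (max_dbs_open, max_connections) parsed from INI text."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return (
        parser.getint("couchdb", "max_dbs_open"),
        parser.getint("httpd", "max_connections"),
    )


max_dbs_open, max_connections = read_limits(SAMPLE_INI)
```

For a real deployment you would pass the contents of your own `local.ini` (or equivalent) instead of `SAMPLE_INI`.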
+ 
  == System Resource Limits ==
  === Erlang ===
- Even if you've increased the maximum connections CouchDB will allow, the Erlang runtime
system will not allow more than 1024 connections by default. Adding the following directive
to {{{(prefix)/etc/default/couchdb}}} (or equivalent) will increase this limit (in this case
to 4096):{{{
+ Even if you've increased the maximum connections CouchDB will allow, the Erlang runtime
system will not allow more than 1024 connections by default. Adding the following directive
to {{{(prefix)/etc/default/couchdb}}} (or equivalent) will increase this limit (in this case
to 4096):
+ 
+ {{{
- export ERL_MAX_PORTS=4096}}}
+ export ERL_MAX_PORTS=4096
+ }}}
  === PAM and ulimit ===
- Finally, most *nix operating systems impose various resource limits on every process. If
your system is set up to use the Pluggable Authentication Modules (PAM) system, increasing
this limit is straightforward. For example, creating a file named {{{/etc/security/limits.d/100-couchdb.conf}}}
with the following contents will ensure that CouchDB can open enough file descriptors to service
your increased maximum open databases and Erlang ports:{{{
+ Finally, most *nix operating systems impose various resource limits on every process. If
your system is set up to use the Pluggable Authentication Modules (PAM) system, increasing
this limit is straightforward. For example, creating a file named {{{/etc/security/limits.d/100-couchdb.conf}}}
with the following contents will ensure that CouchDB can open enough file descriptors to service
your increased maximum open databases and Erlang ports:
+ 
+ {{{
  #<domain>    <type>    <item>    <value>
  couchdb      hard      nofile    4096
- couchdb      soft      nofile    4096}}}
+ couchdb      soft      nofile    4096
+ }}}
- If your system does not use PAM, a {{{ulimit}}} command is usually available for use in
a custom script to launch CouchDB with increased resource limits.
+ If your system does not use PAM, a {{{ulimit}}} command is usually available for use in
a custom script to launch CouchDB with increased resource limits. If necessary, feel free
to increase these limits as long as your hardware can handle the load.
- If necessary, feel free to increase these limits as long as your hardware can handle the
load.
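
One way to sanity-check the file descriptor limit is to compare it with the configured maximums. The sketch below uses Python's standard `resource` module (Unix only) and reports the limits of the process that runs it, so it only says something about CouchDB if it is run as the same user under the same limits. Treating the required count as `max_dbs_open + max_connections` is a simplifying assumption, and `nofile_headroom` is a hypothetical helper name.

```python
import resource


def nofile_headroom(max_dbs_open, max_connections):
    """Compare the soft open-file limit with a rough descriptor estimate.

    Returns (soft_limit, needed, ok). The estimate ignores descriptors
    used for view files, logs and so on, so treat it as a lower bound.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    needed = max_dbs_open + max_connections
    ok = soft == resource.RLIM_INFINITY or soft >= needed
    return soft, needed, ok


# Values taken from the configuration example earlier on the page.
soft, needed, ok = nofile_headroom(100, 2048)
print("soft nofile limit:", soft, "estimated need:", needed, "ok:", ok)
```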
  
  = Disk and File System Performance =
- Using faster disks, striped RAID arrays and modern file systems can all speed up your CouchDB
deployment. However, there is one option that can increase the responsiveness of your CouchDB
server when disk performance is a bottleneck. From the Erlang documentation for the file module:
{{{
+ Using faster disks, striped RAID arrays and modern file systems can all speed up your CouchDB
deployment. However, there is one option that can increase the responsiveness of your CouchDB
server when disk performance is a bottleneck. From the Erlang documentation for the file module:
- On operating systems with thread support, it is possible to let file operations be performed
in threads of their own, allowing other Erlang processes to continue executing in parallel
with the file operations. See the command line flag +A in erl(1).}}}
- Setting this argument to a number greater than zero can keep your CouchDB installation responsive
even during periods of heavy disk utilization. The easiest way to set this option is through
the {{{ERL_FLAGS}}} environment variable. For example, to give Erlang four threads with which
to perform I/O operations, add the following to {{{(prefix)/etc/default/couchdb}}} (or equivalent):
{{{
- export ERL_FLAGS="+A 4"}}}
  
+ {{{
+ On operating systems with thread support, it is possible to let file operations be performed
in threads of their own, allowing other Erlang processes to continue executing in parallel
with the file operations. See the command line flag +A in erl(1).
+ }}}
+ Setting this argument to a number greater than zero can keep your CouchDB installation responsive
even during periods of heavy disk utilization. The easiest way to set this option is through
the {{{ERL_FLAGS}}} environment variable. For example, to give Erlang four threads with which
to perform I/O operations, add the following to {{{(prefix)/etc/default/couchdb}}} (or equivalent):
+ 
+ {{{
+ export ERL_FLAGS="+A 4"
+ }}}
+ 
