Message-ID: <301780794.192061263332394506.JavaMail.jira@brutus.apache.org>
Date: Tue, 12 Jan 2010 21:39:54 +0000 (UTC)
From: "Paul Joseph Davis (JIRA)"
To: dev@couchdb.apache.org
Reply-To: dev@couchdb.apache.org
Subject: [jira] Updated: (COUCHDB-620) Generating views is extremely slow - makes CouchDB hard to use with non-trivial number of docs
In-Reply-To: <1376854040.148511263187674592.JavaMail.jira@brutus.apache.org>

     [ https://issues.apache.org/jira/browse/COUCHDB-620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Joseph Davis updated
COUCHDB-620:
--------------------------------------

    Attachment: pipelining.jpg

Chris responded on dev@ with:

> The couchjs view server protocol is strictly line based. Each line is
> parsed to JSON after it is received. Then the computation is done, and
> a line of JSON is returned.
>
> Since the couchjs server is single threaded it hasn't made much sense
> to make the protocol more complex.

I'd just point out that pipelining is not the same as threading. I'm pretty certain we could pipeline from the Erlang side without affecting the protocol (assuming that responses are yielded in exactly the same order as requests were provided). All pipelining does is allow for streaming data through the process.

A picture can probably explain lots better than any wording so I'm attaching a whiteboard diagram.

> Generating views is extremely slow - makes CouchDB hard to use with non-trivial number of docs
> ----------------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-620
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-620
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: Infrastructure
>    Affects Versions: 0.10
>         Environment: Ubuntu 9.10 64 bit, CouchDB 0.10
>            Reporter: Roger Binns
>            Assignee: Damien Katz
>         Attachments: pipelining.jpg
>
>
> Generating views is extremely slow. For example adding 10 million documents takes less than 10 minutes but generating some simple views on the same docs takes over 4 hours.
> Using top you can see that CouchDB (erlang) and couchjs between them cannot even saturate a single CPU let alone the I/O system. Under ideal conditions performance should be limited by cpu, disk or memory. This implies that the processes are doing simple things in lockstep accumulating latencies in each process as well as the communication between them which when multiplied by the number of documents can amount to a lot.
> Some suggestions:
> * Run as many couchjs instances as there are processor cores and scatter work amongst them
> * Have some sort of pipelining in the Erlang so that the moment the first byte of response is received from couchjs the data is sent for the next request (the JSON conversion, HTTP headers etc. should all have been assembled already) to reduce latencies. Do whatever is most similar in couchjs (e.g. use separate threads to read requests, process them and write responses).
> * Use the equivalent of HTTP pipelining when talking to couchjs so that it always has a doc ready to work on rather than having to transmit an entire response and then wait for Erlang to think and provide an entire new request
> A simple test of success is to have a database with a million or so documents with a trivial view and have view creation max out the CPU, memory or disk.
> Some things in CouchDB make this a particularly nasty problem. View data is not replicated, so replicated documents can lead the view data by a large margin on the recipient database. This can lead to inconsistencies. You also can't expect users to then wait minutes (or hours) for a request to complete because the view generation got that far behind. (My own plans now are to not use replication and instead create the database file on another CouchDB instance and then rsync the binary database file over instead!)
> Although stale=ok is available, you still have no idea if the response will be quick or take however long view generation does. (Sure I could add some sort of timeout and complicate the code but then what value do I pick? If I have a user waiting I want an answer ASAP or I have to give them some horrible error message. Taking a long wait and then giving a timeout is even worse!)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
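
The distinction drawn above between pipelining and threading can be sketched roughly as follows. This is a toy model under stated assumptions, not CouchDB or couchjs code: `view_server`, `run_lockstep` and `run_pipelined` are hypothetical names, in-memory queues stand in for the stdio pipes, and the "map function" is made up. The worker stays single threaded and answers one JSON line per request line, in request order; only the driver side changes.

```python
import json
import queue
import threading


def view_server(requests, responses):
    """Hypothetical stand-in for a line-based view server (single threaded)."""
    while True:
        line = requests.get()
        if line is None:              # sentinel: no more requests
            responses.put(None)
            return
        doc = json.loads(line)        # one line in, parsed as JSON
        # Made-up "map function": emit the doc id and the body length.
        responses.put(json.dumps([doc["_id"], len(doc["body"])]))


def run_lockstep(docs):
    """Send one request, wait for its response, then send the next."""
    requests, responses = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=view_server, args=(requests, responses))
    worker.start()
    results = []
    for doc in docs:
        requests.put(json.dumps(doc))
        results.append(json.loads(responses.get()))  # driver idles here
    requests.put(None)
    responses.get()                   # consume the worker's end marker
    worker.join()
    return results


def run_pipelined(docs):
    """Keep the request queue full while responses are read back in order."""
    requests, responses = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=view_server, args=(requests, responses))
    worker.start()

    def feed():
        # Requests are written without waiting for any response.
        for doc in docs:
            requests.put(json.dumps(doc))
        requests.put(None)

    feeder = threading.Thread(target=feed)
    feeder.start()
    results = []
    while True:
        line = responses.get()
        if line is None:              # worker has answered everything
            break
        results.append(json.loads(line))
    feeder.join()
    worker.join()
    return results


if __name__ == "__main__":
    docs = [{"_id": "doc%d" % i, "body": "x" * i} for i in range(5)]
    # Same wire format, same response order; only the driver differs.
    print(run_lockstep(docs) == run_pipelined(docs))
```

In both drivers the worker sees the same sequence of lines and replies in the same order, which is the assumption stated above about responses matching request order. Whether this removes the latency described in the issue would have to be measured on a real setup.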