From: Sean Copenhaver <sean.copenhaver@gmail.com>
To: user@couchdb.apache.org
Date: Thu, 17 Jan 2013 20:46:57 -0500
Subject: Re: general question about couch performance

I'm going to assume that you have compacted, so CouchDB isn't scrolling through a gigabyte-sized file trying to access docs scattered all over the place.

I would need to know what size your documents are and how much I/O utilization you are seeing on your server, but if you are using non-sequential IDs, that will cause write slowdowns: CouchDB has to keep rewriting b-tree nodes and such instead of appending new ones.

Now, for IDs, I thought this API [1] was for retrieving sequential UUIDs (it sounds like you can at least configure it that way), which should help nullify the concern above.
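For instance, once the server is configured with the sequential algorithm, you could pull a batch of IDs up front. A rough sketch (the localhost URL, the fetch_uuids name, and the use of the requests library are my own assumptions, not from the thread):

    # Sketch: fetch server-generated UUIDs from the /_uuids endpoint.
    # With "[uuids] algorithm = sequential" in the server config, these
    # come back in increasing order, so inserts append to the b-tree
    # instead of rewriting interior nodes.
    import requests

    def fetch_uuids(count=100, base_url="http://localhost:5984"):
        # GET /_uuids?count=N answers {"uuids": ["...", ...]}
        resp = requests.get(base_url + "/_uuids", params={"count": count})
        resp.raise_for_status()
        return resp.json()["uuids"]

    ids = fetch_uuids(10)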
Also, your data set sounds small enough that you might not care about trying to pack things into a smaller sequential ID.

Another thing that helps with write performance is the 'delayed_commits' setting [2]. It holds documents from individual writes and commits them to disk together, which helps write speed. This of course widens the window for data loss if something really bad were to happen.
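If you do not want to edit the ini file, the runtime configuration API should flip it too. Another rough sketch (the admin credentials and the localhost URL are placeholders of mine):

    # Sketch: enable delayed_commits at runtime via the _config API
    # (CouchDB 1.x). Config values travel as JSON-encoded strings.
    import json
    import requests

    resp = requests.put(
        "http://localhost:5984/_config/couchdb/delayed_commits",
        data=json.dumps("true"),
        auth=("admin", "secret"),  # placeholder admin credentials
    )
    resp.raise_for_status()
    print(resp.json())  # the API returns the setting's previous value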
The only other CouchDB-related thing I can think of off the top of my head is to make sure compression is on, since that helps with both reads and writes. CouchDB 1.2 [3] added it and it's on by default, but if you are coming from an older version I believe you have to run a compaction once you have upgraded the server.

Maybe also try tweaking OS file system settings. CouchDB does no caching that I'm aware of, relying on the file system for that.

I'll leave with the disclosure that I haven't done anything CouchDB-related in a while, and I never had a project where its performance made me investigate.

[1] http://wiki.apache.org/couchdb/HttpGetUuids
[2] http://wiki.apache.org/couchdb/Configurationfile_couch.ini
[3] http://couchdb.readthedocs.org/en/latest/changelog/#id4

--
Sean Copenhaver

"Water is fluid, soft and yielding. But water will wear away rock, which is rigid and cannot yield. As a rule, whatever is fluid, soft and yielding will overcome whatever is rigid and hard. This is another paradox: what is soft is strong." - Lao-Tzu


On Thursday, January 17, 2013 at 6:44 PM, Mark Hahn wrote:

> thx
>
> On Thu, Jan 17, 2013 at 3:29 PM, Daniel Gonzalez wrote:
>
> > The problem is not replication, the problem is the source of the data.
> > The replicators will just distribute the data that is being inserted to
> > other server instances.
> >
> > You cannot use that monotonic ID generator if you are inserting data
> > from different servers or applications. But if you are, let's say,
> > importing data into a single couchdb (replicated or not) from a
> > third-party database in one batch job, you have full control over the
> > IDs, so you can use that ID generator. That will improve the performance
> > of your database, especially in relation to space used and view
> > generation.
> >
> > On Fri, Jan 18, 2013 at 12:20 AM, Mark Hahn wrote:
> >
> > > > you can only do this if you are in control of the IDs
> > >
> > > This wouldn't work with multiple servers replicating, would it?
> > >
> > > On Thu, Jan 17, 2013 at 3:15 PM, Daniel Gonzalez wrote:
> > >
> > > > And here you have BaseConverter:
> > > >
> > > > """
> > > > Convert numbers from base 10 integers to base X strings and back again.
> > > >
> > > > Sample usage:
> > > >
> > > > >>> base20 = BaseConverter('0123456789abcdefghij')
> > > > >>> base20.from_decimal(1234)
> > > > '31e'
> > > > >>> base20.to_decimal('31e')
> > > > 1234
> > > > """
> > > >
> > > > class BaseConverter(object):
> > > >     decimal_digits = "0123456789"
> > > >
> > > >     def __init__(self, digits):
> > > >         self.digits = digits
> > > >
> > > >     def from_decimal(self, i):
> > > >         return self.convert(i, self.decimal_digits, self.digits)
> > > >
> > > >     def to_decimal(self, s):
> > > >         return int(self.convert(s, self.digits, self.decimal_digits))
> > > >
> > > >     def convert(number, fromdigits, todigits):
> > > >         # Based on http://code.activestate.com/recipes/111286/
> > > >         if str(number)[0] == '-':
> > > >             number = str(number)[1:]
> > > >             neg = 1
> > > >         else:
> > > >             neg = 0
> > > >
> > > >         # make an integer out of the number
> > > >         x = 0
> > > >         for digit in str(number):
> > > >             x = x * len(fromdigits) + fromdigits.index(digit)
> > > >
> > > >         # create the result in base 'len(todigits)'
> > > >         if x == 0:
> > > >             res = todigits[0]
> > > >         else:
> > > >             res = ""
> > > >             while x > 0:
> > > >                 digit = x % len(todigits)
> > > >                 res = todigits[digit] + res
> > > >                 x = int(x / len(todigits))
> > > >         if neg:
> > > >             res = '-' + res
> > > >         return res
> > > >     convert = staticmethod(convert)
> > > >
> > > > On Fri, Jan 18, 2013 at 12:13 AM, Daniel Gonzalez <gonvaled@gonvaled.com> wrote:
> > > >
> > > > > Also, in order to improve view performance, it is better if you use
> > > > > a short and monotonically increasing ID. This is what I am using
> > > > > for one of my databases with millions of documents:
> > > > >
> > > > > class MonotonicalID:
> > > > >
> > > > >     def __init__(self, cnt=0):
> > > > >         self.cnt = cnt
> > > > >         self.base62 = BaseConverter(
> > > > >             'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyz')
> > > > >         # This alphabet is better for couchdb, since it follows the
> > > > >         # Unicode Collation Algorithm
> > > > >         self.base64_couch = BaseConverter(
> > > > >             '-@0123456789aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ')
> > > > >
> > > > >     def get(self):
> > > > >         res = self.base64_couch.from_decimal(self.cnt)
> > > > >         self.cnt += 1
> > > > >         return res
> > > > >
> > > > > Doing this will:
> > > > > - save space in the database, since the ID starts small: take into
> > > > >   account that the ID is used in lots of internal data structures
> > > > >   in couchdb, so making it short will save lots of space in a big
> > > > >   database
> > > > > - make it ordered (in the couchdb sense), which will speed up
> > > > >   certain operations
> > > > >
> > > > > Drawback: you can only do this if you are in control of the IDs
> > > > > (you know that nobody else is going to be generating IDs).
> > > > >
> > > > > On Thu, Jan 17, 2013 at 8:00 PM, Mark Hahn wrote:
> > > > >
> > > > > > Thanks for the tips. Keep them coming.
> > > > > >
> > > > > > I'm going to try everything I can. If I find anything surprising
> > > > > > I'll let everyone know.
> > > > > > On Thu, Jan 17, 2013 at 4:54 AM, Daniel Gonzalez <gonvaled@gonvaled.com> wrote:
> > > > > >
> > > > > > > Are you doing single writes or batch writes? I managed to
> > > > > > > improve the write performance by collecting the documents and
> > > > > > > sending them in a single request. The same applies to read
> > > > > > > accesses.
> > > > > > >
> > > > > > > On Wed, Jan 16, 2013 at 9:17 PM, Mark Hahn wrote:
> > > > > > >
> > > > > > > > My couchdb is seeing a typical request rate of about 100/sec
> > > > > > > > when it is maxed out, typically at about 10 reads per write.
> > > > > > > > This is disappointing. I was hoping for 3 to 5 ms per op, not
> > > > > > > > 10 ms. What performance numbers are others seeing?
> > > > > > > >
> > > > > > > > I have 35 views with only 50 to 100 entries per view. My db
> > > > > > > > is less than a gigabyte with a few thousand active docs.
> > > > > > > >
> > > > > > > > I'm running on a medium ec2 instance with ephemeral disk. I
> > > > > > > > assume I am IO bound as the cpu is not maxing out.
> > > > > > > >
> > > > > > > > How much worse would this get if the db also had to handle
> > > > > > > > replication between multiple servers?
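To tie the two suggestions together: the batch writes Daniel describes go through CouchDB's _bulk_docs endpoint, and they combine naturally with the MonotonicalID generator quoted earlier in the thread. A rough sketch (the database name "perftest", the localhost URL, and the requests library are assumptions of mine; MonotonicalID and BaseConverter are the classes quoted above):

    # Sketch: one POST to _bulk_docs instead of one PUT per document,
    # using short sequential ids from the quoted MonotonicalID generator.
    import requests

    gen = MonotonicalID()

    # Collect the documents first, then write them in a single request.
    docs = [{"_id": gen.get(), "value": n} for n in range(1000)]
    resp = requests.post("http://localhost:5984/perftest/_bulk_docs",
                         json={"docs": docs})
    resp.raise_for_status()
    print(resp.json()[:3])  # one {"id": ..., "rev": ...} entry per doc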