From: Stefan du Fresne
Subject: Re: The state of filtered replication
Date: Wed, 25 May 2016 11:45:00 +0100
To: user@couchdb.apache.org

Hi Sinan,

It's good to hear we are not the only people to have this problem, and
that we're following in the footsteps of others :-)

More databases is definitely an option, but it's one we're trying to
avoid: partially to keep the budget under control, and partially
because we'd be constantly scaling up and down, as most of our
performance concerns are just when we're onboarding, so it's a lot of
extra complexity.

Unfortunately (2) isn't an option for us either: our PouchDB clients
are on slow phones with slow, flaky and expensive network connections
(think health workers in remote parts of Uganda), so reducing what we
send them to the bare minimum is very important. We also shouldn't
really send them other people's data, even to have Pouch then filter
it out, for privacy reasons.

Stefan

> On 25 May 2016, at 09:55, Sinan Gabel wrote:
>
> Hi Stefan,
>
> I recognise your description and problem: I also gave up on the
> server-side performance.
> With CouchDB 1.6.1 I only saw two immediate options:
>
> (1) More databases on the server side, to reduce the number of docs
> per database.
> (2) Simply do the filtering on the client side in PouchDB; this is
> actually quite fast and robust. Experiment with the *batch_size* and
> *timeout* options to find the best settings.
>
> For (2), possibly combine with
> https://github.com/nolanlawson/worker-pouch if there are a lot of
> documents.
>
> ... however, it would be best to have a much faster,
> "production-made" server-side filtering option in CouchDB 2.x.
>
> Br,
> Sinan
>
> On 25 May 2016 at 10:34, Stefan du Fresne wrote:
>
>> Hello all,
>>
>> I work on an app that involves a large amount of CouchDB filtered
>> replication (every user has a filtered subset of the DB locally via
>> PouchDB). Currently filtered replication is our number one
>> performance bottleneck for rolling out to more users, and I'm trying
>> to work out where we can go from here.
>>
>> Our current setup is one CouchDB database and N PouchDB
>> installations, which all two-way replicate, with the
>> CouchDB->PouchDB replication being filtered based on user
>> permissions / relevance [1].
>>
>> Our issue is that as we add users (a) total document creation
>> velocity increases, and (b) the proportion of documents that are
>> relevant to any particular user decreases. These two points cause
>> replication -- both initial onboarding and continuous -- to take
>> longer and longer.
>>
>> At this stage we are being forced to manually limit the number of
>> users we onboard at any particular time to half a dozen or so, or
>> risk CouchDB becoming unresponsive [2]. As we'd want to be onboarding
>> 50-100 at any particular time due to how we're rolling out, you can
>> imagine that this is pretty painful.
>>
>> I have already re-written the filter in Erlang, which halved its
>> execution time, which is awesome!
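[For context, the replication filter being discussed is, by default, a
JavaScript function in a design document, invoked once per changed
document; the Erlang rewrite mentioned above does the same job
natively. A minimal sketch of the shape such a filter takes follows --
the permission model (an `ids` query parameter and an `owner` field)
is invented for illustration, not Medic Mobile's actual filter:]

```javascript
// Sketch of a CouchDB replication filter (hypothetical permission
// model). CouchDB calls this for every document on the _changes feed,
// which is why filtering cost grows with total write velocity.
function replicationFilter(doc, req) {
  // Never replicate design documents to clients
  if (doc._id.indexOf('_design/') === 0) {
    return false;
  }
  // Assume the client passes ?ids=a,b,c on the replication request,
  // listing the users whose documents it is allowed to see
  var allowed = (req.query.ids || '').split(',');
  return allowed.indexOf(doc.owner) !== -1;
}
```

[A filter like this is selected per replication with
`filter=ddocname/filtername` plus `query_params`; since it runs on
every change regardless of outcome, most changes are evaluated and
discarded for most users, matching the scaling problem described.]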
>> I also attempted to simplify the filter to increase performance.
>> However, filter speed seems to depend more on the physical size of
>> the filter than on what code actually executes, which makes writing
>> a simple filter that can fall back to a complicated one not terribly
>> useful (see https://issues.apache.org/jira/browse/COUCHDB-3021).
>>
>> If the above linked ticket is fixed (if it can be), it would make
>> our filter 3-4x faster again. However, this still wouldn't address
>> the fundamental issue that filtered replication is very
>> CPU-intensive, and so, as noted above, doesn't seem to scale
>> terribly well.
>>
>> Ideally, then, I would like to remove filtered replication
>> completely, but there does not seem to be a good alternative right
>> now.
>>
>> Looking through the archives there was talk of adding view
>> replication, see
>> https://mail-archives.apache.org/mod_mbox/couchdb-user/201307.mbox/%3CCAJNb-9pK4CVRHNwr83_DXCn%2B2_CZXgwDzbK3m_G2pdfWjSsFMA%40mail.gmail.com%3E,
>> but it doesn't look like this ever got resolved.
>>
>> There is also often talk of a database per user being a good scaling
>> strategy, but we're basically doing that already (with PouchDB), and
>> for us documents aren't owned / viewed by just one person, so this
>> does not get us away from filtered replication (e.g. a supervisor
>> replicates her own documents as well as those of N sub-users). There
>> are potentially wild and crazy schemes involving many different
>> databases, where the equivalent of filtering is expressed in
>> replication relationships, but this would add a massive amount of
>> complexity to our app, and I'm not even convinced it would work, as
>> there are lots of edge cases to consider.
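[The topology described -- an unfiltered push from each client plus a
filtered pull per user -- looks roughly like the sketch below on the
PouchDB side. The design-doc name, query parameters, and batch size
are illustrative, not the poster's actual configuration; `batch_size`
and `timeout` are the replication knobs Sinan suggests tuning:]

```javascript
// Sketch of the client side of the setup described above (names and
// numbers are illustrative). `PouchDB` is passed in as a parameter so
// the wiring can be exercised without the library installed.
function startSync(PouchDB, remoteUrl, userIds) {
  var local = new PouchDB('app');
  // Unfiltered push: clients send up everything they create
  local.replicate.to(remoteUrl, { live: true, retry: true });
  // Filtered pull: a server-side filter decides what each user
  // receives; batch_size trades memory for round-trips
  return local.replicate.from(remoteUrl, {
    live: true,
    retry: true,
    batch_size: 100,
    filter: 'app/by_user',               // hypothetical design-doc filter
    query_params: { ids: userIds.join(',') }
  });
}
```

[Because the filter runs on the server, every connected client costs
server CPU proportional to total changes, not to the changes it
actually receives -- which is the bottleneck the thread is about.]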
>> Does anyone know of anything else I can try to increase replication
>> performance? Or to safeguard against many replicators unacceptably
>> degrading CouchDB performance? Does CouchDB 2.0 address any of these
>> concerns?
>>
>> Thanks in advance,
>> - Stefan du Fresne
>>
>> [1] Security is handled by not exposing Couch directly and going
>> through a wrapper service that validates Couch requests; relevance
>> is hierarchy-based (i.e. documents that you or your subordinates
>> authored are replicated to you).
>> [2] There are also administrators / configurers that access CouchDB
>> directly.