From: Nick Vatamaniuc
Date: Mon, 18 Mar 2019 14:42:25 -0400
Subject: Re: Shard Splitting API Proposal
To: dev@couchdb.apache.org

Hello everyone,

Thank you all (Joan, Jan, Mike, Ilya, Adam) who contributed to the API
discussion. There is now a PR open:
https://github.com/apache/couchdb/pull/1972
If you get a chance, I would appreciate any reviews, feedback or comments.

The PR message explains how the commits are organized and references the
RFC. Basically it starts with preparatory work, ensuring all the existing
components know how to deal with split shards. Then, some lower level bits
are implemented, like bulk copy, internal replicator updates, etc.,
followed by the individual job implementation and the job manager which
stitches everything together. Last comes the HTTP API implementation
along with a suite of unit and Elixir integration tests.

There is also a README_reshard.md file in src/mem3 that tries to provide a
more in-depth technical description of how everything fits together:
https://github.com/apache/couchdb/pull/1972/files#diff-5ac7b51ec4e03e068bf271f34ecf88df
(note that this URL might change after a rebase).

Also special thanks to Paul (job module implementation, get_ring function,
a lot of architectural and implementation advice), Eric (finding many bugs,
fixes for the bugs, and writing bulk copy and change feed tests), and Jay
(testing and a thorough code review).

Cheers,
-Nick

On Sun, Feb 17, 2019 at 2:32 AM Jan Lehnardt wrote:

> Heya Nick,
>
> Nicely done. I think even though the majority of the discussion had
> already happened here, the RFC nicely pulled together the various
> discussion threads into a coherent whole.
>
> I would imagine the discussion on GH would be similarly fruitful.
>
> I gave it my +1, and as I said at the outset: I'm very excited about this
> feature!
>
> Best
> Jan
> --
>
> > On 15. Feb 2019, at 23:45, Nick Vatamaniuc wrote:
> >
> > Decided to kick the tires on the new RFC proposal issue type and created
> > one for shard splitting:
> >
> > https://github.com/apache/couchdb/issues/1920
> >
> > Let's see how it goes. Since it's the first one, let me know if I missed
> > anything obvious.
> >
> > Also I'd like to thank everyone who contributed to the discussion. The
> > API is looking more solid and is much improved from where it started.
> >
> > Cheers,
> > -Nick
> >
> >
> >
> >> On Wed, Feb 13, 2019 at 12:03 PM Nick Vatamaniuc wrote:
> >>
> >>
> >>
> >>> On Wed, Feb 13, 2019 at 11:52 AM Jan Lehnardt wrote:
> >>>
> >>>
> >>>
> >>>> On 13. Feb 2019, at 17:12, Nick Vatamaniuc wrote:
> >>>>
> >>>> Hi Jan,
> >>>>
> >>>> Thanks for taking a look!
> >>>>
> >>>>> On Wed, Feb 13, 2019 at 6:28 AM Jan Lehnardt wrote:
> >>>>>
> >>>>> Nick, this is great, I have a few tiny nits left, apologies I only
> >>>>> now got to it.
> >>>>>
> >>>>>> On 12. Feb 2019, at 18:08, Nick Vatamaniuc wrote:
> >>>>>>
> >>>>>> Shard Splitting API Proposal
> >>>>>>
> >>>>>> I'd like to thank everyone who contributed to the API discussion.
> >>>>>> As a result we have a much better and more consistent API than
> >>>>>> what we started with.
> >>>>>>
> >>>>>> Before continuing I wanted to summarize to see what we ended up
> >>>>>> with. The main changes since the initial proposal were switching
> >>>>>> to using /_reshard as the main endpoint and having a detailed
> >>>>>> state transition history for jobs.
> >>>>>>
> >>>>>> * GET /_reshard
> >>>>>>
> >>>>>> Top level summary. Besides the new _reshard endpoint, there is a
> >>>>>> `reason`, and the stats are more detailed.
> >>>>>>
> >>>>>> Returns
> >>>>>>
> >>>>>> {
> >>>>>>     "completed": 3,
> >>>>>>     "failed": 4,
> >>>>>>     "running": 0,
> >>>>>>     "state": "stopped",
> >>>>>>     "state_reason": "Manual rebalancing",
> >>>>>>     "stopped": 0,
> >>>>>>     "total": 7
> >>>>>> }
> >>>>>>
> >>>>>> * PUT /_reshard/state
> >>>>>>
> >>>>>> Start or stop global rebalancing.
> >>>>>>
> >>>>>> Body
> >>>>>>
> >>>>>> {
> >>>>>>     "state": "stopped",
> >>>>>>     "reason": "Manual rebalancing"
> >>>>>> }
> >>>>>>
> >>>>>> Returns
> >>>>>>
> >>>>>> {
> >>>>>>     "ok": true
> >>>>>> }
> >>>>>>
> >>>>>> * GET /_reshard/state
> >>>>>>
> >>>>>> Return global resharding state and reason.
> >>>>>>
> >>>>>> {
> >>>>>>     "reason": "Manual rebalancing",
> >>>>>>     "state": "stopped"
> >>>>>> }
> >>>>>
> >>>>> More a note than a change request, but `state` is a very generic
> >>>>> term that often confuses folks when they are new to something. If
> >>>>> the set of possible states is `started` and `stopped`, how about
> >>>>> making this endpoint a boolean?
> >>>>>
> >>>>> /_reshard/enabled
> >>>>>
> >>>>> {
> >>>>>     "enabled": true|false,
> >>>>>     "reason": "Manual rebalancing"
> >>>>> }
> >>>>>
> >>>>>
> >>>> I thought of that as well. However _reshard/state seemed consistent
> >>>> with _reshard/jobs/$jobid/state. Setting "state":"stopped" on
> >>>> _reshard/state will cause all individually running jobs to become
> >>>> "stopped" as well. And "running" will make jobs that are not
> >>>> individually stopped also become "running". In other words, since it
> >>>> directly toggles the jobs' state (with a job being able to override
> >>>> the stopped state) I like that it had the same arguments
> >>>
> >>> Got it, makes perfect sense.
> >>>
> >>>> and": true|false
> >>>>
> >>>>
> >>>>>
> >>>>>> * GET /_reshard/jobs
> >>>>>>
> >>>>>> Get the state of all the resharding jobs on the cluster. Now we
> >>>>>> have a detailed state transition history which looks similar to
> >>>>>> what _scheduler/jobs has.
> >>>>>>
> >>>>>> {
> >>>>>>     "jobs": [
> >>>>>>         {
> >>>>>>             "history": [
> >>>>>>                 {
> >>>>>>                     "detail": null,
> >>>>>>                     "timestamp": "2019-02-06T22:28:06Z",
> >>>>>>                     "type": "new"
> >>>>>>                 },
> >>>>>>                 ...
> >>>>>>                 {
> >>>>>>                     "detail": null,
> >>>>>>                     "timestamp": "2019-02-06T22:28:10Z",
> >>>>>>                     "type": "completed"
> >>>>>>                 }
> >>>>>>             ],
> >>>>>>             "id": "001-0a308ef9f7bd24bd4887d6e619682a6d3bb3d0fd94625866c5216ec1167b4e23",
> >>>>>>             "job_state": "completed",
> >>>>>>             "node": "node1@127.0.0.1",
> >>>>>>             "source": "shards/00000000-ffffffff/db1.1549492084",
> >>>>>>             "split_state": "completed",
> >>>>>>             "start_time": "2019-02-06T22:28:06Z",
> >>>>>>             "state_info": {},
> >>>>>>             "targets": [
> >>>>>>                 "shards/00000000-7fffffff/db1.1549492084",
> >>>>>>                 "shards/80000000-ffffffff/db1.1549492084"
> >>>>>>             ],
> >>>>>
> >>>>> Since we went from /_split to /_reshard to prepare for merging
> >>>>> shards, we should reconsider source (singular) and targets
> >>>>> (plural). Either a merge job (in the future) uses sources (plural)
> >>>>> and target (singular) and the job schema is intentionally
> >>>>> different, or we unify things to, maybe singular: source/target
> >>>>> which would have the nice property of being analogous to our
> >>>>> replication job schema. The type definition then is source:String
> >>>>> and target:Array(2) for split jobs and source:Array(2)
> >>>>> target:String for (future) merge jobs.
> >>>>>
> >>>>>
> >>>> Joan suggested adding a "type" field to both the job creation POST
> >>>> body and also returning it when we inspect the job(s) state. So the
> >>>> "type":"split" would toggle the schema. It could be "merge" in the
> >>>> future, or even something like "rebalance" where it would merge some
> >>>> and split others perhaps :-) and since we have a type it would be
> >>>> easier to differentiate between the merge and split jobs. But if
> >>>> there is a consensus from others about switching targets to target
> >>>> that's easy as well.
> >>>
> >>> Ah, I'm less concerned here about not being able to tell whether it's
> >>> a split or a merge, and more about having an indiscriminate plural
> >>> form (sourceS/targetS) depending on the type. It's just an easy thing
> >>> to get wrong.
> >>>
> >>> In addition, we already have source/target in CouchDB replication,
> >>> which people already use successfully, so making a similar thing that
> >>> behaves slightly differently doesn't sit quite right with me.
> >>>
> >>> I understand that I'm arguing to remove an 's' for very nitpicky
> >>> reasons, but these are the kind of nitpick discussions we've done a
> >>> lot in the early days which resulted in a by and large decent API
> >>> that has served us well, and it's something I'd like to see taken
> >>> forward. Apologies if this all sounds very strict ;)
> >>>
> >>>
> >> Thanks for the longer explanation. I understand now and agree, let's
> >> make it target. No worries about sounding nitpicky, we should be
> >> nitpicky about APIs!
> >>
> >>
> >>>>
> >>>>>
> >>>>> And just another question, sorry if I missed this elsewhere, would
> >>>>> we ever consider adding a split/merge ratio different from 1:2, say
> >>>>> 1:4, or will folks have to run 1:2, 1:2, 1:2 to get to the same
> >>>>> result? I'm fine with either and if 1:2 fixed makes things simpler,
> >>>>> I'm all for it ;)
> >>>>>
> >>>>>
> >>>> Good point. Actually it's already implemented that way :-) Right
> >>>> below the API surface it has a split=2 parameter and it just creates
> >>>> the targets based on that. It could be 2, 3, 4, ... 10 etc. However I
> >>>> was thinking of keeping it hard coded at 2 at first to keep the
> >>>> behavior simpler, and opening that parameter to be user facing in a
> >>>> later release based on user feedback.
> >>>
> >>> Ace, again, fully on board with shipping 1:2 first and maybe offering
> >>> other options later.
> >>>
> >>> Best
> >>> Jan
> >>> --
> >>>
> >>>>
> >>>> Cheers,
> >>>>
> >>>> -Nick
> >>>
> >>>
> >
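
For readers following the thread, here is a minimal Python sketch of two ideas discussed above: deriving target shard names from a source shard and a split factor, and preparing the PUT /_reshard/state request. It is an illustration under stated assumptions, not the code from the PR. The helper names (split_range, target_names, reshard_state_request), the even division of the hex range, and the base URL are all hypothetical; the thread only says that targets are created from a split=2 parameter and shows the request body.

```python
from __future__ import annotations

import json
from urllib.request import Request


def split_range(begin: int, end: int, split: int = 2) -> list[tuple[int, int]]:
    """Divide the inclusive range [begin, end] into `split` contiguous parts.

    Hypothetical helper: even division is an assumption made for this sketch.
    """
    if split < 1:
        raise ValueError("split must be at least 1")
    size = end - begin + 1
    if size < split:
        raise ValueError("range is too small to split that many ways")
    step = size // split
    ranges = []
    for i in range(split):
        lo = begin + i * step
        # The last part absorbs any remainder so the parts cover the range.
        hi = end if i == split - 1 else lo + step - 1
        ranges.append((lo, hi))
    return ranges


def target_names(source: str, split: int = 2) -> list[str]:
    """Build target shard names from a source name such as
    "shards/00000000-ffffffff/db1.1549492084" (format taken from the thread)."""
    prefix, hex_range, suffix = source.split("/", 2)
    begin_hex, end_hex = hex_range.split("-")
    parts = split_range(int(begin_hex, 16), int(end_hex, 16), split)
    return [f"{prefix}/{lo:08x}-{hi:08x}/{suffix}" for lo, hi in parts]


def reshard_state_request(base_url: str, state: str, reason: str) -> Request:
    """Prepare, but do not send, PUT /_reshard/state with the proposed body."""
    body = json.dumps({"state": state, "reason": reason}).encode("utf-8")
    return Request(
        f"{base_url}/_reshard/state",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    print(target_names("shards/00000000-ffffffff/db1.1549492084"))
    # The base URL below is an assumed local node address.
    req = reshard_state_request(
        "http://127.0.0.1:5984", "stopped", "Manual rebalancing"
    )
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

With split=2 the sketch reproduces the two target names shown in the GET /_reshard/jobs example. The request is only constructed, so the sketch can be run without a cluster; sending it would additionally need urllib.request.urlopen and credentials for a real node.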