couchdb-dev mailing list archives

From Glynn Bird <glynn.b...@gmail.com>
Subject Re: Partition query endpoints in CouchDB 4.0
Date Mon, 11 May 2020 13:10:11 GMT
I've worked with folks using partitioned databases, so I thought I'd share
my experience of that here:

- partitioned databases can definitely give a performance boost (in CouchDB
< 4 scenarios) to use-cases where the main "read" use-case can be directed
to a single partition. In such cases, only a fraction of the shards are
exercised in answering the query - so there are scalability benefits there.
- not everyone who wanted to migrate from non-partitioned --> partitioned
ended up doing so - migrating involves mutating the document _id, so
replication can't help - and having to rethink indexing and access patterns
as well was too much for some. It seemed much better suited to "green field"
projects.
- in some cases partitioned databases made performance worse - by directing
a large proportion of traffic to one or a handful of partitions. This may
not be obvious at the design stage; you only find out when real-world
traffic arrives!
- it would have been nice to have a "per partition changes feed" - which
would allow a "one partition per user" model, with all the data in the same
database for reporting purposes.
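
To make the migration and hot-spot points above concrete, here is a minimal
Python sketch. It assumes the CouchDB 3.x convention that a partitioned
document _id has the form "<partition>:<doc key>". The helper names, the
"user" partition field and the sample documents are hypothetical and chosen
only for illustration; nothing here talks to a real server.

```python
from collections import Counter
from urllib.parse import quote


def to_partitioned_id(doc, partition_field):
    """Return a copy of `doc` whose _id is '<partition>:<old _id>'.

    The copy drops _rev because it has to be written as a brand-new
    document; the old _id cannot simply be replicated across.
    """
    partition = str(doc[partition_field])
    new_doc = {k: v for k, v in doc.items() if k != "_rev"}
    new_doc["_id"] = f"{partition}:{doc['_id']}"
    return new_doc


def partition_shares(docs):
    """Fraction of documents in each partition (text before the first ':')."""
    counts = Counter(d["_id"].split(":", 1)[0] for d in docs)
    total = sum(counts.values())
    return {p: n / total for p, n in counts.items()}


def partition_view_path(db, partition, ddoc, view):
    """Path of a partition-scoped view query (CouchDB 3.x URL layout)."""
    return (f"/{quote(db, safe='')}/_partition/{quote(partition, safe='')}"
            f"/_design/{quote(ddoc, safe='')}/_view/{quote(view, safe='')}")


# Hypothetical sample data: three documents for one user, one for another.
old_docs = [
    {"_id": "a1", "_rev": "1-abc", "user": "alice", "type": "order"},
    {"_id": "a2", "_rev": "1-def", "user": "alice", "type": "order"},
    {"_id": "a3", "_rev": "2-ghi", "user": "alice", "type": "invoice"},
    {"_id": "b1", "_rev": "1-jkl", "user": "bob", "type": "order"},
]

new_docs = [to_partitioned_id(d, "user") for d in old_docs]
shares = partition_shares(new_docs)
path = partition_view_path("orders", "alice", "reports", "by_type")
```

Running partition_shares over a sample of real documents before committing
to a partition key is one cheap way to spot a skewed choice early, although
it only measures document counts and not read or write traffic.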



On Mon, 11 May 2020 at 12:35, Garren Smith <garren@apache.org> wrote:

> Coming back to this. I still think we should support it fully in 4.x so
> that anyone using it in 3.x will not experience any api changes when moving
> to 4.x. Once we have had more people use it in 3.x we can make a call on
> deprecating it for 5.x or look at adding more features to it.
>
> On Tue, Apr 21, 2020 at 11:01 PM Robert Samuel Newson <rnewson@apache.org>
> wrote:
>
> > On Adam's point that the partitioned query api encourages good choices
> > ("discourages hot spots"), that's only true for folks that read the
> > documentation, which in my experience is a low percentage of folks. I've
> > encountered a heavy user of partitioned dbs that had precisely four
> > partitions in mind, for millions of docs (They chose "doc_type" as their
> > partition value).
> >
> > My view for 4.0 is;
> >
> > 1) ignore the partitioned flag when creating databases
> >
> I don't think we should ignore it.
>
> > 2) the "partitioned" property no longer reported in GET /dbname
> >
>
> I would prefer we report the partitioned flag.  It seems confusing to not
> report a setting a user intentionally set.
>
> > 3) the various _partition endpoints still work
> > 4) all views work either "global" or "partitioned" depending on the
> > endpoint used.
> >
> > for 5.0 I'm +0 on removing the _partition endpoints, but we can take that
> > vote at the time based on contemporary feedback.
> >
> > B.
> >
> > > On 21 Apr 2020, at 21:35, Robert Samuel Newson <rnewson@apache.org> wrote:
> > > >
> > > > Hi,
> > > >
> > > > Good points on both sides of this. One thing we can hopefully get
> > > > agreement on is the ?partitioned=true flag on creation and, deeper, the
> > > > lack of distinction between the two "types" of database going forward?
> > >
> > > B.
> > >
> > >> On 21 Apr 2020, at 18:51, Garren Smith <garren@apache.org> wrote:
> > >>
> > >> I'm on the fence when it comes to removing it. In terms of the original
> > >> plan of making querying faster by querying fewer shards, that obviously
> > >> isn't needed. But I think it does create a nice mental model/design pattern
> > >> when building an application in CouchDB.  Splitting your data into
> > >> partitions that contain similar documents makes sense. And once we are on
> > >> FDB it would be awesome to see if we could have a changes feed per partition.
> > >> That would be a really nice feature.
> > >>
> > >> Cheers
> > >> Garren
> > >>
> > >> On Tue, Apr 21, 2020 at 5:51 PM Adam Kocoloski <kocolosk@apache.org> wrote:
> > >>
> > >>> I think it’s difficult to make a call when 3.0 is still so new.
> > >>>
> > >>> The case for deprecation here is basically less code to maintain, right?
> > >>> It’s not like a user of partitioned databases is causing pain for an
> > >>> FDB-based CouchDB; if anything, there’s a second-order benefit because the
> > >>> partitioning discourages hot spots from forming in the (range-partitioned)
> > >>> FDB keyspace.
> > >>>
> > >>> Cheers, Adam
> > >>>
> > >>>> On Apr 20, 2020, at 11:51 PM, Kyle Snavely <kjsnavely@gmail.com> wrote:
> > >>>>
> > >>>> My two cents is the same. Let's allow 3.* users to migrate to 4.* without
> > >>>> needing to e.g. change the PQ part of their application, and remove the PQ
> > >>>> endpoints in 5.0.
> > >>>>
> > >>>> Best,
> > >>>> Kyle
> > >>>>
> > >>>> On Mon, Apr 20, 2020, 4:16 PM Ilya Khlopotov <iilyak@apache.org> wrote:
> > >>>>
> > >>>>> Given that it is unlikely that many people are using it, and that it
> > >>>>> is a no-op in the FDB world, I think we should deprecate and remove the
> > >>>>> _partition endpoint.
> > >>>>>
> > >>>>> On 2020/04/20 21:04:58, Robert Samuel Newson <rnewson@apache.org> wrote:
> > >>>>>> Hi All,
> > >>>>>>
> > >>>>>> I'd like to get views on whether we should preserve the _partition
> > >>>>>> endpoints in CouchDB 4.0 or remove them. In CouchDB 4.0 all _view and
> > >>>>>> _find queries will automatically benefit from the same performance boost
> > >>>>>> that the "partitioned database" feature brings, by virtue of FoundationDB.
> > >>>>>>
> > >>>>>> If we're preserving it, are we also deprecating it (so it's gone in 5.0)?
> > >>>>>>
> > >>>>>> If we're ditching it, what will the endpoint return instead (404 Not
> > >>>>>> Found, 410 Gone?)
> > >>>>>>
> > >>>>>> Thoughts welcome.
> > >>>>>>
> > >>>>>> B.
> > >>>>>
> > >>>
> > >>>
> > >
> >
> >
>
