couchdb-dev mailing list archives

From Paul Davis <paul.joseph.da...@gmail.com>
Subject Re: On Plugins and Extensibility
Date Wed, 27 May 2015 17:31:57 GMT
On Mon, May 25, 2015 at 7:25 AM, Ilya Khlopotov <iilyak@ca.ibm.com> wrote:
>
> Paul,
>
> > From what I can tell you're proposing that couch_epi is a central nexus of some rather
> > large swaths of code (i.e., replacing the entire API to couch_stats).
> I am not proposing to replace couch_stats API. I just use couch_stats app as an example.
>
> > we're left with a whole bunch of questions that we'd have to answer.
> > For instance, given that you have mycustom_stats and couch_stats,
> > what happens if they both try and define the same service?
> This is explicitly allowed (see case 1.5). Multiple providers of a service are needed
> for things like:

I guess I misunderstood provider. Re-reading, maybe it's service I meant?

Basically, you have a bit of code to define the service for stats and
pass some arguments. Though from what I read in your example, I'd
expect couch_stats and mycustom_stats to both call that same
function, as they both provide it. Which is odd enough on its own. And
then doubly odd if they happen to call it with different parameters.

> a) chain - https://github.com/gar1t/erlang-patterns/blob/master/patterns/chain.md
>    we would use this pattern in implementation of
>    - ddoc_validator
>    - dbname_validator
>    - docid_validator
>    - before_doc_update
>    - after_doc_read
> b) notify - notify all providers of a given service
>    we would use it for
>       - mrview_update_notify - notify when there is any change
>

In the simpler API I was trying to describe, both of these would just
be helper functions in couch_epi that would expect a list of functions
as the data, which would then be executed (with optional support for
dynamic module compilation to speed up dispatch).
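
To make the chain and notify helpers concrete, here is a minimal sketch,
assuming each extension point is stored as a plain list of fun/1
callbacks. The module name epi_sketch and the {ok, Value} / {stop, Result}
return convention are assumptions for illustration only. Nothing like
this exists in couch_epi yet, and the optional module compilation step
is left out.

```erlang
-module(epi_sketch).
-export([chain/2, notify/2]).

%% Hypothetical helper: thread a value through a list of fun/1 callbacks.
%% Each callback returns {ok, NewValue} to continue with the next one,
%% or {stop, Result} to short-circuit the rest of the chain.
chain([], Value) ->
    {ok, Value};
chain([Fun | Rest], Value) ->
    case Fun(Value) of
        {ok, NewValue} -> chain(Rest, NewValue);
        {stop, Result} -> {stop, Result}
    end.

%% Hypothetical helper: call every fun/1 with the same event.
%% Results are ignored and exceptions are swallowed so that one
%% failing callback cannot prevent the others from being notified.
notify(Funs, Event) ->
    lists:foreach(
        fun(Fun) ->
            try Fun(Event) catch _:_ -> ok end
        end,
        Funs),
    ok.
```

With something like this, a docid_validator style hook would call
epi_sketch:chain(Validators, DocId), and mrview_update_notify would call
epi_sketch:notify(Listeners, Change). Whether failures should be
swallowed, as they are in notify/2 here, is exactly the kind of choice
an Options parameter could expose.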

> > Or use the same service and different parameters?
> You pass parameters to each provider independently either on:
> - 1.2 - register `provider`
> - 1.3 - subscribe for `service`
> Service itself just defines configuration buckets (list of configuration keys it `expects`
> and `defines`).
>

The bit here about "expects" is exactly something I was trying to
avoid. This conversation feels a lot like the duck typing vs. static
typing debates. I was looking to remove a lot of extra code and just
rely on implicit contracts at the source level that would allow the
system to be a bit more open rather than having strict declarations
about what is allowed or expected or whatever.
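
As a rough illustration of the implicit-contract idea (a sketch under
assumptions, not code from any existing app): instead of a service
declaring up front which methods it expects, the caller simply checks at
the call site whether a module exports the function it wants. The
module name epi_duck and the function maybe_call/4 are hypothetical.

```erlang
-module(epi_duck).
-export([maybe_call/4]).

%% Hypothetical helper: invoke Mod:Fun with Args if Mod exports a
%% function of that name and arity, otherwise return Default.
%% There is no declared interface; the only contract is that the
%% function exists with the right arity.
maybe_call(Mod, Fun, Args, Default) ->
    %% function_exported/3 only sees loaded modules, so make sure the
    %% module is loaded first (the result is deliberately ignored).
    _ = code:ensure_loaded(Mod),
    case erlang:function_exported(Mod, Fun, length(Args)) of
        true -> erlang:apply(Mod, Fun, Args);
        false -> Default
    end.
```

The trade-off is the usual one for duck typing: less code and looser
coupling, at the cost of mismatches only showing up when the call is
actually made.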

> BR,
> ILYA
>
>
> From: Paul Davis <paul.joseph.davis@gmail.com>
> To: "dev@couchdb.apache.org" <dev@couchdb.apache.org>
> Date: 2015/05/22 12:36 PM
>
>
> Subject: Re: On Plugins and Extensibility
> ________________________________
>
>
>
> Ilya,
>
> Can you go into the specific use cases a bit more? From what I can
> tell you're proposing that couch_epi is a central nexus of some rather
> large swaths of code (i.e., replacing the entire API to couch_stats). To
> me that's a bit of an anti-goal. I'd much rather keep things simple
> and decoupled rather than try to engineer something that requires a
> lot of coordination between various pieces of the code base.
>
> For instance, in your given example, we have three major pieces that
> all have to agree on what is happening which may or may not be written
> when any of the others exist. I.e., we have to figure out what the
> service definition is, and then both the provider and subscriber have
> to agree to that definition. And if any of those three things change
> their idea of what the service is then we're left with a whole bunch
> of questions that we'd have to answer. For instance, given that you
> have mycustom_stats and couch_stats, what happens if they both try and
> define the same service? Or use the same service and different
> parameters? I think if we go this route then there's going to be a
> large amount of unintended complexity without a significant gain in
> functionality.
>
>
> On Fri, May 22, 2015 at 11:37 AM, Ilya Khlopotov <iilyak@ca.ibm.com> wrote:
> >
> > Hi Paul and Russell,
> >
> > Great writeup! Thank you for spending time writing it.
> >
> > My observation is that the proposed API misses a few quite important use cases. So I want
> > to iterate over the proposed API to clarify, improve and add missing functionality.
> > But first of all I would like to establish a glossary.
> >
> > # Glossary
> >
> > service - an abstract functionality defined by a unique name and API
> > provider - a self-contained implementation of a `service`'s API
> > subscriber - an application or a process which uses functionality provided by a `provider`
> >
> > # 1. Use cases
> >
> > 1.1. define `service` - register name and API
> > 1.2. register `provider` - app wants to register itself as a `provider` of a `service`.
> >   For example "I'm couch_stats and I provide `stats` service"
> > 1.3. subscribe for `service` - app wants to use one of the services (such as stats)
> > 1.4. dispatch the call to correct `provider`
> > 1.5. lookup all `providers` of a `service`
> > 1.6. lookup `service` data - which functions have to be defined by a provider, as well
> >      as its data keys
> > 1.7. lookup `provider` data
> >      for example the fact that given provider expects {priv_file, "couch_stats.cfg"}
> > 1.8. lookup `subscriber` data - for example list of defined metrics
> >
> > # Refined API by examples
> >
> > 1.1. define service
> >
> >     couch_epi:define_service(stats, [
> >        {methods, [
> >            {new, 2},
> >            {increment_counter, 1},
> >            {increment_counter, 2},
> >            {decrement_counter, 1},
> >            {decrement_counter, 2},
> >            {update_histogram, 2},
> >            {update_gauge, 2}]},
> >        {expects, [definition_files]},
> >        {defines, [definitions]}]).
> >
> > 1.2. register provider
> >
> >     {couch_epi, start_provider, [
> >               stats, %% Service
> >               couch_stats, %% ProviderName
> >               {module, couch_stats}, %% ProviderCBModule
> >               [{default_file, "couch_stats.cfg"}] %% ProviderExtraData
> >     ]}
> >
> > 1.3. subscribe for `service`
> >
> >     {couch_epi, start_subscriber, [
> >         appname, %% Subscriber
> >         stats, %% Service
> >         [{definition_files, ["http_stats.cfg", "couch_db_stats.cfg"]}]]},
> >
> > 1.4. dispatch the call to correct provider
> >
> >     couch_epi:invoke(
> >         stats, %% Service
> >         couch, %% SubscriberID (appname for now)
> >         increment_counter,
> >         [couchdb, document_inserts], %% Args
> >         [] %% Opts
> >     )
> >     %% fixtual example
> >     %% couch_stats:define_metrics(couch, ["http_stats.cfg", "couch_db_stats.cfg"])
> >     couch_epi:invoke(
> >         stats, %% Service
> >         couch, %% SubscriberID (appname for now)
> >         define_metrics,
> >         [], %% Args
> >         [{pass_subscriber_id, true}, {extra, definitions}] %% Opts - the most controversial part
> >     )
> >
> > 1.5. lookup all `providers` of a `service`
> >
> >     1> couch_epi:get_providers(stats).
> >     [couch_stats, mycustom_stats]
> >
> > 1.6. lookup `service` data
> >
> >     2> couch_epi:get_service(stats).
> >     [
> >         {methods, [
> >             {new, 2},
> >             {increment_counter, 1},
> >             {increment_counter, 2},
> >             {decrement_counter, 1},
> >             {decrement_counter, 2},
> >             {update_histogram, 2},
> >             {update_gauge, 2}]},
> >         {expects, [definition_files]},
> >         {defines, [definitions]}
> >     ]
> >
> >
> > 1.7. lookup `provider` data
> >
> >     3> couch_epi:get_provider_data(stats, couch_stats).
> >     [
> >         stats,
> >         couch_stats,
> >         {module, couch_stats},
> >         [{default_file, "couch_stats.cfg"}]
> >     ]
> >
> > 1.8. lookup `subscriber` data
> >
> >     5> couch_epi:get_subscriber_data(stats, couch).
> >     [
> >         {definitions, [
> >             {[couchdb, auth_cache_hits], [
> >                 {type, counter},
> >                 {desc, <<"number of authentication cache hits">>}
> >             ]},
> >             {[couchdb, auth_cache_misses], [
> >                 {type, counter},
> >                 {desc, <<"number of authentication cache misses">>}
> >             ]}
> >         ]},
> >         {definition_files, ["http_stats.cfg", "couch_db_stats.cfg"]}
> >     ]
> >
> >     6> couch_epi:get_subscriber_data(stats, couch, definitions).
> >     [
> >         {[couchdb, auth_cache_hits], [
> >             {type, counter},
> >             {desc, <<"number of authentication cache hits">>}
> >         ]},
> >         {[couchdb, auth_cache_misses], [
> >             {type, counter},
> >             {desc, <<"number of authentication cache misses">>}
> >         ]}
> >     ]
> >
> > # Implementation notes
> >
> > - I really like Russell's idea of using a module for dispatch. We would need to explore
> >   it more.
> >
> > - To avoid global contention we could consider storing subscriber data in the named
> >   process we start by calling couch_epi:start_subscriber from the supervisor spec.
> >   couch_epi would deal with this logic.
> >   The user shouldn't rely on this implementation detail.
> >
> > - We want to avoid reconfiguration events (especially if we use a compiled module
> >   for dispatch) if a subscriber or provider is restarted.
> >   So we would need a way to tell that there is a difference in the data provided
> >   to define_service/start_provider/start_subscriber.
> >
> > BR,
> > ILYA
> >
> >
> > From: Russell Branca <chewbranca@apache.org>
> > To: "dev@couchdb.apache.org" <dev@couchdb.apache.org>
> > Date: 2015/05/21 04:19 PM
> > Subject: Re: On Plugins and Extensibility
> >
> > ________________________________
> >
> >
> >
> > Hey Paul,
> >
> >
> > Thanks for the great writeup!
> >
> > Couple of questions:
> >
> > How do priv/*.cfg files work with dynamic config updates? Seems like we'll
> > lose the ability to write changes back to the config file, like we have
> > with default.ini. Although I've been wondering for a while if allowing
> > config:set/* to persist data is an anti-pattern compared to forcing
> > persistent updates to be made directly in config files with the hopes they
> > are done so in a VCS. Also, if we switch to priv/*.cfg, how will users
> > update those files? Seems like we'll have to poll the file system for every
> > one of those config files, or are you thinking that those files would only
> > be updated as part of releases?
> >
> > Have you thought at all about using the mochiglobal.erl approach to
> > regenerate a module with all the plugin dispatch compiled as functions?
> > Seems like using shared ETS tables could become a source of contention for
> > lots of concurrent requests. Proper benchmarks would provide better
> > understanding as to whether this is actually an issue.
> >
> >
> > -Russell
> >
> > On Thu, May 21, 2015 at 3:29 PM, Paul Davis <paul.joseph.davis@gmail.com>
> > wrote:
> >
> > > Hey everyone,
> > >
> > > So I've been meaning to write this email for some time but have been
> > > kept busy with lots of super fun things that are super fun. Anyway, I
> > > just wanted to get this out there to start getting feedback from
> > > everyone involved.
> > >
> > > Also, while this is called "Plugin Proposal" it shouldn't be confused
> > > with the original couch_plugins use case. This is a lot lower level
> > > (and may be used by something like couch_plugins if we go there again
> > > in the future). Generally speaking this is just for cleaning up
> > > internals and for people that want to run either minimal installs of
> > > CouchDB or include CouchDB in a larger Erlang application.
> > >
> > > Plugin Proposal
> > > ===============
> > >
> > > Background
> > > ----------
> > >
> > > As we've grown the code base to include more and more applications
> > > we're getting to the point where we've started adding various points
> > > of extension in various ways. The best existing example is the
> > > couch_stats application which loads stat/metric definitions from
> > > applications. Henning Diedrich has some unmerged work which looks to
> > > follow a similar path for HTTP URL handlers. And Ilya Khlopotov has
> > > some work for providing vendor specific hooks.
> > >
> > > While each of these have some overlaps in their intended use case,
> > > they also share the fact that they've all implemented their own idea
> > > of extensibility in slightly different ways. That's not necessarily
> > > bad, but I think that we could reduce a lot of complexity if we take a
> > > step back and write a utility application that could then be used to
> > > support each of these features so that we can have both the
> > > extensibility as well as simplify the implementation of each
> > > individual feature.
> > >
> > > I'll start with a bit of background and then describe a general
> > > approach as well as show some hopefully explicit example snippets of
> > > how such a system might be used. Granted I haven't written out an
> > > entire implementation of this so I may be off the mark in some places.
> > >
> > > Bikeshed First
> > > --------------
> > >
> > > I have no idea what we'd call this. We could repurpose the
> > > couch_plugins app conceivably or make something new. For the
> > > purposes of this document I'll call it couch_epi (for extensible
> > > plugin interface) and hopefully that's terrible enough someone will
> > > think of a better name for the actual application.
> > >
> > > Requirements
> > > ------------
> > >
> > > The three major requirements I've thought of are:
> > >
> > >   # Automatically discoverable
> > >   # Minimize apps that need to be started for tests
> > >   # Support release upgrades
> > >
> > > === Automatically Discoverable ===
> > >
> > > The biggest thing here is that I don't want to require a change to a
> > > default.ini or similar to enable or disable specific functionality
> > > when we can already signify that by having the application present or
> > > not. This is both for groups that may want to add new Erlang
> > > applications to a release as well as anyone that wants to run a
> > > minimal/embedded Couch. These are both obviously advanced uses but I
> > > think are important given the number of ways that CouchDB is being
> > > used.
> > >
> > > === Minimize the apps that need to be started for tests ===
> > >
> > > This one I think should be obvious to anyone that's been writing unit
> > > tests lately. There are some often silly places where we require
> > > applications be started just to run some tests. For example, places
> > > where we may want to call a function that's been instrumented and
> > > requires couch_stats to have knowledge about the stat.
> > >
> > > === Support release upgrades ===
> > >
> > > This one is obviously fairly advanced and limited in its audience but
> > > it's something I'd like to at least consider in the design. This comes
> > > into effect for things like couch_stats that use a text file for its
> > > extension method. The issue is that the release upgrade mechanics
> > > don't provide any sort of signal that is easily usable to indicate
> > > when this file has changed during an upgrade so we're left polling the
> > > file system which is less than optimal.
> > >
> > > === Other Things to Consider ===
> > >
> > > A couple of other things I'd like to keep in mind while discussing this:
> > > I'd also like to minimize the amount of boilerplate code and
> > > coupling needed to support this system. It should hopefully be a matter
> > > of a few lines of code to enable the extensibility on either side of
> > > the interface.
> > >
> > > General Design
> > > ==============
> > >
> > > The general theme that I see across all of our current extensions is
> > > that they're all basically just bags of random bits of data that
> > > are then used by each feature to define some behavior.
> > >
> > > For instance, couch_stats is just a list of tuples with some names,
> > > metric types, and descriptions. The dynamic chttpd handlers are just a
> > > list of URL endpoints to an MFA. And the vendor specific plugins are
> > > just a collection of functions that we'd like to invoke at specific
> > > points.
> > >
> > > Given that, the easiest approach I see is to implement a module that
> > > can be placed into the supervision tree that connects the data to a
> > > central repository (hosted by couch_epi) that can then be queried by
> > > each feature.
> > >
> > >
> > > Data Centric Examples
> > > ---------------------
> > >
> > > For a concrete example, let's consider couch_stats. Any application
> > > that wants to record metrics through the standard couch_stats app
> > > could add an entry in its supervision tree with something like:
> > >
> > >     {
> > >         appname_stats,
> > >         {couch_epi_data_source, start_link, [
> > >             appname,
> > >             {epi_key, {couch_stats, definitions}},
> > >             {priv_file, "couch_stats.cfg"}
> > >         ]},
> > >         permanent,
> > >         5000,
> > >         worker,
> > >         dynamic
> > >     }
> > >
> > > Then we'd just implement couch_epi_data_source once that would read
> > > data from the specified file from the application's priv directory and
> > > track it in an ets table.
> > >
> > > When couch_stats wants to learn about all the installed data for its
> > > stat definitions it would then just do something like:
> > >
> > >     couch_epi:get({couch_stats, definitions})
> > >
> > > Which would return a list of {appname, Data} tuples or something
> > > similar. To ensure that couch_stats can react to changes in these
> > > values, we would also provide an API like such:
> > >
> > >      couch_epi:listen({couch_stats, definitions})
> > >
> > > And any process that called that function would get a message whenever
> > > the data for that key changed which it could use for its own nefarious
> > > purposes.
> > >
> > > For upgrades, instead of specifying {priv_file, FileName} we could
> > > specify {mfa, {Mod, Fun, Args}} which would be invoked. Then we could
> > > add a code_change function to that module that would allow us to call
> > > something like couch_epi:reload() which would re-run the load for that
> > > process's data source.
> > >
> > > Function Centric Examples
> > > -------------------------
> > >
> > > Hopefully it's obvious that given the data centric approach we could do
> > > something quite similar for functions (given that an MFA is just a
> > > small bit of data that we can use to invoke any function).
> > >
> > > Though obviously we'd like to be able to have a bit more of a useful
> > > API for clients so that we don't require all function based extensions
> > > to have to reimplement that function invocation code.
> > >
> > > The first thing that would change would be to provide a different type
> > > of supervision tree entry to indicate this. Off the top of my head
> > > this would look something like such:
> > >
> > >     {
> > >         appname_funcs,
> > >         {couch_epi_functions, start_link, [
> > >             appname,
> > >             {module, appname_funcs_mod}
> > >         ]},
> > >         permanent,
> > >         5000,
> > >         worker,
> > >         dynamic
> > >     }
> > >
> > > Then any function exported by appname_funcs_mod (that wasn't a builtin
> > > function, though maybe even if so?) could be invoked by an API like
> > > such:
> > >
> > >     couch_epi:invoke(my_function_name, Arg1, Arg2, Arg3).
> > >     couch_epi:apply(my_function_name, Args).
> > >
> > > We could also add various helper utilities or an Options parameter
> > > that would handle things like ignoring all exceptions, letting
> > > exceptions bubble or other such things that any invocation point might
> > > desire.
> > >
> > >
> > > More Details
> > > ------------
> > >
> > > The final app would have something like such:
> > >
> > >     couch_epi.app.src
> > >     couch_epi.erl - API for features accessing extension data
> > >     couch_epi_data_source.erl - Module that is inserted into
> > >         application supervision trees to provide data sources for
> > >         extension points
> > >     couch_epi_functions.erl - Module that is inserted into application
> > >         supervision trees to provide function invocations
> > >     couch_epi_server.erl - Handles registration requests from
> > >         couch_epi_data_source and couch_epi_functions and stores that
> > >         information in an ets table. Also has the list of pids
> > >         registered to listen for updates and notifies them.
> > >     couch_epi_sup.erl - probably a single child for couch_epi_server.erl
> > >     couch_epi_util.erl - The usual collection of functions that don't
> > >         quite fit anywhere else (if needed).
> > >
> > > This should be a rather simple application in general. One side wants
> > > to publish some data, and the other wants to use it and possibly be
> > > notified when a particular bit of data is changed. And then we'll also
> > > provide some API sugar around invoking functions.
> > >
> > > Conclusion
> > > ==========
> > >
> > > Hopefully that all makes sense to at least some people in parts. I've
> > > been thinking about this on and off over a few weeks so my thoughts
> > > are a bit jumbled as I try and remember the salient points. I figured
> > > I'd just try and start getting them out there so that other people can
> > > comment on things and/or let me know that I've forgotten something
> > > obvious that cripples this entire approach.
> > >
> >
> >
>
>
>
