couchdb-dev mailing list archives

From Ilya Khlopotov <iil...@ca.ibm.com>
Subject Re: On Plugins and Extensibility
Date Fri, 22 May 2015 16:37:39 GMT

Hi Paul and Russell,

Great writeup! Thank you for spending time writing it.

My observation is that the proposed API misses a few quite important use
cases, so I want to iterate on it to clarify, improve and add the missing
functionality.
But first of all I would like to establish a glossary.

# Glossary

service - an abstract functionality defined by a unique name and an API
provider - a self-contained implementation of a `service`'s API
subscriber - an application or a process which uses functionality provided
by a `provider`

# 1. Use cases

1.1. define `service` - register name and API
1.2. register `provider` - app wants to register itself as a `provider` of
a `service`.
  For example "I'm couch_stats and I provide `stats` service"
1.3. subscribe for `service` - app wants to use one of the services (such
as stats)
1.4. dispatch the call to correct `provider`
1.5. lookup all `providers` of a `service`
1.6. lookup `service` data - which functions have to be defined by a provider
as well as its data keys
1.7. lookup `provider` data
     for example the fact that given provider expects {priv_file,
"couch_stats.cfg"}
1.8. lookup `subscriber` data - for example list of defined metrics

# Refined API by examples

1.1. define service

    couch_epi:define_service(stats, [
       {methods, [
           {new, 2},
           {increment_counter, 1},
           {increment_counter, 2},
           {decrement_counter, 1},
           {decrement_counter, 2},
           {update_histogram, 2},
           {update_gauge, 2}]},
       {expects, [definition_files]},
       {defines, [definitions]}]).

1.2. register provider

    {couch_epi, start_provider, [
              stats, %% Service
              couch_stats, %% ProviderName
              {module, couch_stats}, %% ProviderCBModule
              [{default_file, "couch_stats.cfg"}] %% ProviderExtraData
    ]}

1.3. subscribe for `service`

    {couch_epi, start_subscriber, [
        appname, %% Subscriber
        stats, %% Service
        [{definition_files, ["http_stats.cfg", "couch_db_stats.cfg"]}]]},

1.4. dispatch the call to correct provider

    couch_epi:invoke(
        stats, %% Service
        couch, %% SubscriberID (appname for now)
        increment_counter,
        [couchdb, document_inserts], %% Args
        [] %% Opts
    )
    %% fictional example
    %% couch_stats:define_metrics(couch, ["http_stats.cfg", "couch_db_stats.cfg"])
    couch_epi:invoke(
        stats, %% Service
        couch, %% SubscriberID (appname for now)
        define_metrics,
        [], %% Args
        %% Opts - the most controversial part
        [{pass_subscriber_id, true}, {extra, definitions}]
    )

1.5. lookup all `providers` of a `service`

    1> couch_epi:get_providers(stats).
    [couch_stats, mycustom_stats]

1.6. lookup `service` data

    2> couch_epi:get_service(stats).
    [
        {methods, [
            {new, 2},
            {increment_counter, 1},
            {increment_counter, 2},
            {decrement_counter, 1},
            {decrement_counter, 2},
            {update_histogram, 2},
            {update_gauge, 2}]},
        {expects, [definition_files]},
        {defines, [definitions]}
    ]


1.7. lookup `provider` data

    3> couch_epi:get_provider_data(stats, couch_stats).
    [
        stats,
        couch_stats,
        {module, couch_stats},
        [{default_file, "couch_stats.cfg"}]
    ]

1.8. lookup `subscriber` data

    5> couch_epi:get_subscriber_data(stats, couch).
    [
        {definitions, [
            {[couchdb, auth_cache_hits], [
                {type, counter},
                {desc, <<"number of authentication cache hits">>}
            ]},
            {[couchdb, auth_cache_misses], [
                {type, counter},
                {desc, <<"number of authentication cache misses">>}
            ]}
        ]},
        {definition_files, ["http_stats.cfg", "couch_db_stats.cfg"]}
    ]

    6> couch_epi:get_subscriber_data(stats, couch, definitions).
    [
        {[couchdb, auth_cache_hits], [
            {type, counter},
            {desc, <<"number of authentication cache hits">>}
        ]},
        {[couchdb, auth_cache_misses], [
            {type, counter},
            {desc, <<"number of authentication cache misses">>}
        ]}
    ]
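
To make the examples above more concrete, here is a minimal sketch of how
the registry behind define/register/lookup/invoke could behave. Everything
in it is an assumption made for illustration: the module name
`epi_registry_sketch`, the explicit `Reg` argument (a plain map instead of
a couch_epi process), and the rule that `invoke` picks the first registered
provider. It is not the proposed couch_epi implementation.

```erlang
-module(epi_registry_sketch).
-export([new/0, define_service/3, register_provider/4,
         get_service/2, get_providers/2, invoke/4]).

%% Hypothetical, purely functional stand-in for the couch_epi registry.
%% Reg is a map: Service => #{spec => Spec, providers => [{Name, Module}]}.

new() ->
    #{}.

%% 1.1. define service: remember its spec (methods, expects, defines).
define_service(Service, Spec, Reg) ->
    Reg#{Service => #{spec => Spec, providers => []}}.

%% 1.2. register provider: append {Name, CallbackModule} to the service.
register_provider(Service, Name, Module, Reg) ->
    Entry = #{providers := Providers} = maps:get(Service, Reg),
    Reg#{Service := Entry#{providers := Providers ++ [{Name, Module}]}}.

%% 1.6. lookup service data.
get_service(Service, Reg) ->
    maps:get(spec, maps:get(Service, Reg)).

%% 1.5. lookup all providers of a service.
get_providers(Service, Reg) ->
    [Name || {Name, _Module} <- maps:get(providers, maps:get(Service, Reg))].

%% 1.4. dispatch: check {Function, Arity} against the declared methods,
%% then call the first provider (an assumption of this sketch).
invoke(Service, Function, Args, Reg) ->
    #{spec := Spec, providers := Providers} = maps:get(Service, Reg),
    Methods = proplists:get_value(methods, Spec, []),
    case lists:member({Function, length(Args)}, Methods) of
        false ->
            {error, {undefined_method, Function, length(Args)}};
        true ->
            case Providers of
                [] ->
                    {error, no_provider};
                [{_Name, Module} | _] ->
                    {ok, erlang:apply(Module, Function, Args)}
            end
    end.
```

Since couch_stats is not available outside CouchDB, the sketch can be tried
with the stdlib `lists` module standing in as a provider of a made-up
service that declares `{reverse, 1}` as its only method.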

# Implementation notes

- I really like Russell's idea of using a module for dispatch. We would need
to explore it more.

- To avoid global contention we could consider storing subscriber data in
the named process
  that we start by calling couch_epi:start_subscriber from the supervisor spec.
  couch_epi would deal with this logic.
  The user shouldn't rely on this implementation detail.

- We want to avoid reconfiguration events (especially if we use compiled
module for dispatch) if subscriber or provider is restarted.
  So we would need a way to tell that there is a difference in the data
provided to define_service/start_provider/start_subscriber.
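
The last point can be sketched as follows: remember the data last seen for
each registration and report whether a new registration actually differs,
so that a plain restart does not trigger reconfiguration. The module name
`epi_change_sketch` and the key shape are hypothetical, and the state is a
plain map passed around explicitly rather than anything owned by couch_epi.

```erlang
-module(epi_change_sketch).
-export([new/0, update/3]).

%% Hypothetical state: Key => Data last registered under that key.
new() ->
    #{}.

%% Returns {unchanged, State} when Data is identical to what was last
%% registered under Key, otherwise {changed, NewState}. Only the
%% `changed` case would need to trigger a reconfiguration event.
update(Key, Data, State) ->
    case maps:find(Key, State) of
        {ok, Data} ->
            %% Data is already bound, so this clause matches only when
            %% the stored term is exactly equal to the new one.
            {unchanged, State};
        _ ->
            {changed, State#{Key => Data}}
    end.
```

Storing erlang:phash2(Data) instead of the full term would save memory at
the cost of a small chance of a hash collision hiding a real change.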

BR,
ILYA



From:	Russell Branca <chewbranca@apache.org>
To:	"dev@couchdb.apache.org" <dev@couchdb.apache.org>
Date:	2015/05/21 04:19 PM
Subject:	Re: On Plugins and Extensibility



Hey Paul,


Thanks for the great writeup!

Couple of questions:

How do priv/*.cfg files work with dynamic config updates? Seems like we'll
lose the ability to write changes back to the config file, like we have
with default.ini. Although I've been wondering for a while if allowing
config:set/* to persist data is an anti-pattern compared to forcing
persistent updates to be made directly in config files with the hopes they
are done so in a VCS. Also, if we switch to priv/*.cfg how will users
update those files? Seems like we'll have to poll the file system for every
one of those config files, or are you thinking that those files would only
be updated as part of releases?

Have you thought at all about using the mochiglobal.erl approach to
regenerate a module with all the plugin dispatch compiled as functions?
Seems like using shared ETS tables could become a source of contention for
lots of concurrent requests. Proper benchmarks would provide better
understanding as to whether this is actually an issue.
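
As a rough illustration of the compiled-module idea (in the spirit of
mochiglobal, but not its code), the sketch below generates a module at
runtime whose providers/1 function has one clause per service, so lookups
become plain function calls instead of ETS reads. The module name
`epi_codegen_sketch`, the generated function name, and the table shape are
all assumptions made for this example.

```erlang
-module(epi_codegen_sketch).
-export([compile_dispatch/2]).

%% Table is a list of {Service, [ProviderModule]} tuples. The generated
%% module ModName exports providers/1, with a catch-all clause returning [].
compile_dispatch(ModName, Table) when is_atom(ModName), is_list(Table) ->
    Clauses = [io_lib:format("providers(~p) -> ~p", [Service, Providers])
               || {Service, Providers} <- Table]
              ++ ["providers(_) -> []"],
    Source = [
        io_lib:format("-module(~p).", [ModName]),
        "-export([providers/1]).",
        [lists:join(";\n", Clauses), "."]
    ],
    Forms = [parse_form(lists:flatten(Form)) || Form <- Source],
    {ok, ModName, Binary} = compile:forms(Forms),
    %% Drop any old version first; note that purging kills processes
    %% that are still running the old code.
    code:purge(ModName),
    {module, ModName} =
        code:load_binary(ModName, atom_to_list(ModName) ++ ".erl", Binary),
    ok.

%% Turn the source text of one form into its abstract representation.
parse_form(String) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(String),
    {ok, Form} = erl_parse:parse_form(Tokens),
    Form.
```

Calling compile_dispatch(epi_dispatch_demo, [{stats, [couch_stats]}]) would
make epi_dispatch_demo:providers(stats) available. Recompiling on every
change is the expensive part, which is why benchmarks against the ETS
approach would still be needed.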


-Russell

On Thu, May 21, 2015 at 3:29 PM, Paul Davis <paul.joseph.davis@gmail.com>
wrote:

> Hey everyone,
>
> So I've been meaning to write this email for sometime but have been
> kept busy with lots of super fun things that are super fun. Anyway, I
> just wanted to get this out there to start getting feedback from
> everyone involved.
>
> Also, while this is called "Plugin Proposal" it shouldn't be confused
> with the original couch_plugins use case. This is a lot lower level
> (and may be used by something like couch_plugins if we go there again
> in the future). Generally speaking this is just for cleaning up
> internals and for people that want to run either minimal installs of
> CouchDB or include the CouchDB in a larger Erlang application.
>
> Plugin Proposal
> ===============
>
> Background
> ----------
>
> As we've grown the code base to include more and more applications
> we're getting to the point where we've started adding various points
> of extension in various ways. The best existing example is the
> couch_stats application which loads stat/metric definitions from
> applications. Henning Diedrich has some unmerged work which looks to
> follow a similar path for HTTP URL handlers. And Ilya Khlopotov has
> some work for providing vendor specific hooks.
>
> While each of these have some overlaps in their intended use case,
> they also share the fact that they've all implemented their own idea
> of extensibility in slightly different ways. That's not necessarily
> bad, but I think that we could reduce a lot of complexity if we take a
> step back and write a utility application that could then be used to
> support each of these features so that we can have both the
> extensibility as well as simplify the implementation of each
> individual feature.
>
> I'll start with a bit of background and then describe a general
> approach as well as show some hopefully explicit example snippets of
> how such a system might be used. Granted I haven't written out an
> entire implementation of this so I may be off the mark in some places.
>
> Bikeshed First
> --------------
>
> I have no idea what we'd call this. We could repurpose the
> couch_plugins app conceivably or make something new. For the
> purposes of this document I'll call it couch_epi (for extensible
> plugin interface) and hopefully that's terrible enough someone will
> think of a better name for the actual application.
>
> Requirements
> ------------
>
> The three major requirements I've thought of are:
>
>   # Automatically discoverable
>   # Minimize apps that need to be started for tests
>   # Support release upgrades
>
> === Automatically Discoverable ===
>
> The biggest thing here is that I don't want to require a change to a
> default.ini or similar to enable or disable specific functionality
> when we can already signify that by having the application present or
> not. This is both for groups that may want to add new Erlang
> applications to a release as well as anyone that wants to run a
> minimal/embedded Couch. These are both obviously advanced uses but I
> think are important given the number of ways that CouchDB is being
> used.
>
> === Minimize the apps that need to be started for tests ===
>
> This one I think should be obvious to anyone that's been writing unit
> tests lately. There are some often silly places where we require
> applications be started just to run some tests. For example, places
> where we may want to call a function that's been instrumented and
> requires couch_stats to have knowledge about the stat.
>
> === Support release upgrades ===
>
> This one is obviously fairly advanced and limited in its audience but
> it's something I'd like to at least consider in the design. This comes
> into effect for things like couch_stats that use a text file for its
> extension method. The issue is that the release upgrade mechanics
> don't provide any sort of signal that is easily usable to indicate
> when this file has changed during an upgrade so we're left polling the
> file system which is less than optimal.
>
> === Other Things to Consider ===
>
> A couple other things I'd like to keep in mind while discussing this
> is that I'd also like to minimize the amount of boilerplate code and
> coupling needed to support this system. It should be hopefully a matter
> of a few lines of code to enable the extensibility on either side of
> the interface.
>
> General Design
> ==============
>
> The general theme that I see between all of our current extensions is
> that they're all basically just bags of random bits of data that are
> then used by each feature to define some behavior.
>
> For instance, couch_stats is just a list of tuples with some names,
> metric types, and descriptions. The dynamic chttpd handlers are just a
> list of URL endpoints to an MFA. And the vendor specific plugins are
> just a collection of functions that we'd like to invoke at specific
> points.
>
> Given that, the easiest approach I see is to implement a module that
> can be placed into the supervision tree that connects the data to a
> central repository (hosted by couch_epi) that can then be queried by
> each feature.
>
>
> Data Centric Examples
> ---------------------
>
> For a concrete example, let's consider couch_stats. Any application
> that wants to record metrics through the standard couch_stats app
> could add an entry in its supervision tree with something like:
>
>     {
>         appname_stats,
>         {couch_epi_data_source, start_link, [
>             appname,
>             {epi_key, {couch_stats, definitions}},
>             {priv_file, "couch_stats.cfg"}
>         ]},
>         permanent,
>         5000,
>         worker,
>         dynamic
>     }
>
> Then we'd just implement couch_epi_data_source once that would read
> data from the specified file from the application's priv directory and
> track it in an ets table.
>
> When couch_stats wants to learn about all the installed data for its
> stat definitions it would then just do something like:
>
>     couch_epi:get({couch_stats, definitions})
>
> Which would return a list of {appname, Data} tuples or something
> similar. To ensure that couch_stats can react to changes in these
> values, we would also provide an API like such:
>
>      couch_epi:listen({couch_stats, definitions})
>
> And any process that called that function would get a message whenever
> the data for that key changed which it could use for its own nefarious
> purposes.
>
> For upgrades, instead of specifying {priv_file, FileName} we could
> specify {mfa, {Mod, Fun, Args}} which would be invoked. Then we could
> add a code_change function to that module that would allow us to call
> something like couch_epi:reload() which would re-run the load for that
> process's data source.
>
> Function Centric Examples
> -------------------------
>
> Hopefully it's obvious that given the data centric approach we could do
> something quite similar for functions (given that an MFA is just a
> small bit of data that we can use to invoke any function).
>
> Though obviously we'd like to be able to have a bit more of a useful
> API for clients so that we don't require all function based extensions
> to have to reimplement that function invocation code.
>
> The first thing that would change would be to provide a different type
> of supervision tree entry to indicate this. Off the top of my head
> this would look something like such:
>
>     {
>         appname_funcs,
>         {couch_epi_functions, start_link, [
>             appname,
>             {module, appname_funcs_mod}
>         ]},
>         permanent,
>         5000,
>         worker,
>         dynamic
>     }
>
> Then any function exported by appname_funcs_mod (that wasn't a builtin
> function, though maybe even if so?) could be invoked by an API like
> such:
>
>     couch_epi:invoke(my_function_name, Arg1, Arg2, Arg3).
>     couch_epi:apply(my_function_name, Args).
>
> We could also add various helper utilities or an Options parameter
> that would handle things like ignoring all exceptions, letting
> exceptions bubble or other such things that any invocation point might
> desire.
>
>
> More Details
> ------------
>
> The final app would have something like such:
>
>     couch_epi.app.src
>     couch_epi.erl - API for features accessing extension data
>     couch_epi_data_source.erl - Module that is inserted into
> application supervision trees to provide data sources for extension
> points
>     couch_epi_functions.erl - Module that is inserted into application
> supervision trees to provide function invocations
>     couch_epi_server.erl - Handles registration requests from
> couch_epi_data_source and couch_epi_functions and stores that
> information in an ets table. Also has the list of pids registered to
> listen for updates and notifies them.
>     couch_epi_sup.erl - probably a single child for couch_epi_server.erl
>     couch_epi_util.erl - The usual collection of functions that don't
> quite fit anywhere else (if needed).
>
> This should be a rather simple application in general. One side wants
> to publish some data, and the other wants to use it and possibly be
> notified when a particular bit of data is changed. And then we'll also
> provide some API sugar around invoking functions.
>
> Conclusion
> ==========
>
> Hopefully that all makes sense to at least some people in parts. I've
> been thinking about this on and off over a few weeks so my thoughts
> are a bit jumbled as I try and remember the salient points. I figured
> I'd just try and start getting them out there so that other people can
> comment on things and or let me know that I've forgotten something
> obvious that cripples this entire approach.
>

