streams-dev mailing list archives

From Steve Blackmon <>
Subject Re: [DISCUSS] Namespacing of configuration inputs to components
Date Tue, 06 Feb 2018 18:31:43 GMT
I realize it would be useful to pick up configuration details from all
ancestor classes too, so I added that as well.

So the set of sources from which a configuration value can be drawn is
(in preference order):

1) full canonical name of class,
e.g. org.apache.streams.twitter.config.TwitterEngagersProviderConfiguration

2) simple name of class, e.g. just TwitterEngagersProviderConfiguration

3) any ancestor Class, closest ancestors preferred,
e.g. org.apache.streams.twitter.config.TwitterTimelineProviderConfiguration,
then org.apache.streams.twitter.config.TwitterUserInformationConfiguration

(java.lang.Object is immune from this treatment)

4) the package of the Class, e.g. org.apache.streams.twitter.config

5) all parent packages of the Class, closest packages preferred, e.g.
org.apache.streams.twitter, then
org.apache.streams, then
org.apache, then
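
The preference order above can be sketched in a few lines of Java. This is
only an illustration, not the code from the PR: the class ConfigPathCandidates
and the method candidatePaths are hypothetical names, java.util.ArrayList
stands in for a Streams configuration class, and walking the packages all the
way up to the top-level segment is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; not part of Apache Streams.
final class ConfigPathCandidates {

    /** Returns candidate configuration paths for a class, most preferred first. */
    static List<String> candidatePaths(Class<?> clazz) {
        // LinkedHashSet keeps insertion order and drops duplicates.
        Set<String> paths = new LinkedHashSet<>();

        // 1) full canonical name, 2) simple name
        addIfPresent(paths, clazz.getCanonicalName());
        addIfPresent(paths, clazz.getSimpleName());

        // 3) ancestor classes, closest first; java.lang.Object is skipped
        for (Class<?> c = clazz.getSuperclass();
             c != null && c != Object.class;
             c = c.getSuperclass()) {
            addIfPresent(paths, c.getCanonicalName());
        }

        // 4) the class's own package, then 5) each parent package, closest first
        String name = clazz.getName();
        int lastDot = name.lastIndexOf('.');
        String pkg = lastDot < 0 ? "" : name.substring(0, lastDot);
        while (!pkg.isEmpty()) {
            paths.add(pkg);
            int dot = pkg.lastIndexOf('.');
            pkg = dot < 0 ? "" : pkg.substring(0, dot);
        }
        return new ArrayList<>(paths);
    }

    // Anonymous and local classes have no canonical name and an empty simple name.
    private static void addIfPresent(Set<String> paths, String candidate) {
        if (candidate != null && !candidate.isEmpty()) {
            paths.add(candidate);
        }
    }

    public static void main(String[] args) {
        // A JDK class is used here only because it is always on the classpath.
        for (String path : candidatePaths(java.util.ArrayList.class)) {
            System.out.println(path);
        }
    }
}
```

For java.util.ArrayList this lists the class itself, then AbstractList and
AbstractCollection, then java.util and java, which mirrors the ordering
described above.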

I look forward to getting some eyes, comments, and +1’s on this work, as it
will unlock some in-progress features and make configuration of code built
with streams far more flexible going forward!


On Feb 1, 2018 at 12:06 PM, Steve Blackmon <> wrote:

I think I’ve come up with a very nice solution to this problem.

- Add a new method to ComponentConfigurator - detectConfiguration()
- When the caller does not provide a Config or a path to
  detectConfiguration, get fancy:
  - Search for each of the fields declared by the component POJO class on
    each of the following:
    - the SimpleClassName
    - the CanonicalClassName
    - each ancestor package of the CanonicalClassName, longest to shortest
  - if a field is specified at more than one package/class level, the class
    or longest package ancestor takes precedence.
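
The precedence rule in the last bullet can be sketched as follows. This is a
rough illustration under stated assumptions, not the ComponentConfigurator
implementation: a plain java.util.Map stands in for the real configuration
object, FieldResolutionSketch and resolve are hypothetical names, the "oauth"
value is a placeholder, and placing the canonical name ahead of the simple
name is an assumption.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch; a flat Map stands in for the real configuration object.
final class FieldResolutionSketch {

    /** Returns the field's value from the first (most specific) prefix that defines it. */
    static Optional<String> resolve(Map<String, String> props,
                                    List<String> prefixes,
                                    String field) {
        for (String prefix : prefixes) {
            String value = props.get(prefix + "." + field);
            if (value != null) {
                return Optional.of(value);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        // Shared values at the package level (placeholder values).
        props.put("org.apache.streams.twitter.max_items", "1000");
        props.put("org.apache.streams.twitter.oauth", "shared");
        // A class-level override for one provider.
        props.put("org.apache.streams.twitter.providers.ProviderA.max_items", "10");

        // Class names first, then packages from longest to shortest.
        List<String> prefixes = Arrays.asList(
            "org.apache.streams.twitter.providers.ProviderA",
            "ProviderA",
            "org.apache.streams.twitter.providers",
            "org.apache.streams.twitter",
            "org.apache.streams",
            "org.apache");

        // The class-level value takes precedence over the package-level one.
        System.out.println(resolve(props, prefixes, "max_items"));
        // Only defined at the package level, so the lookup falls back to it.
        System.out.println(resolve(props, prefixes, "oauth"));
    }
}
```

Because each field is resolved independently, a component can override a
single value at the class level while still inheriting everything else from a
shared package-level block.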

I created STREAMS-580 and submitted a PR:

The unit tests appear to be working as described above - please check it
out and let me know if anything looks off.

P.S. Note that after merging this capability, we’d still need to migrate
each component that currently hard-codes its configuration path to adopt
this method instead.

On Jan 31, 2018 at 7:47 PM, Steve Blackmon <> wrote:

TL;DR I think we should start aligning the default JVM config path with the
package/class of the code being configured

Hello All,

I’ve been working on two providers (TwitterEngagersProvider and
InstagramEngagersProvider) which share significant code with other

This is sort of complex but here’s the nature of the implementation:
a) Instantiate and run provider A, which generates user objects or user ids
b) Instantiate and run provider B, using the output of provider A as the
input
c) All of this happens in provider C, which then transforms the output of
provider B, exploding each item into a list as its own output.

Due mostly to conventions that have been in the project for a while,
providers A, B, and C all expect to find their configuration at the same
place in the JVM properties - either ‘twitter’ or ‘instagram’.
Additionally, providers A, B, and C all expect to be configured with a
max_items parameter.

So you can probably see the challenge - configuring provider C with
max_items = 1000 winds up giving unhelpful direction to providers A and B
about how much work they should do.  We don’t need or want 1000 user ids,
we really want 1000 engager ids which can typically be derived from ~50
user ids.

While on the surface it’s nice to be able to reuse a single conf file to run
a variety of providers, I see now that this creates problems when running code
that integrates a variety of components that expect to be configured from
overlapping paths.  It’s possible to get around these problems by writing
glue code, but it seems like it would be smarter to have all classes by
default source their configuration using a simple and obvious namespace
strategy based on package/class.

So instead of every twitter provider picking up:
twitter.max_items = 1000

They would all be configured separately like:
org.apache.streams.twitter.providers.ProviderA.max_items = 10
org.apache.streams.twitter.providers.ProviderB.max_items = 100
org.apache.streams.twitter.providers.ProviderC.max_items = 1000

Since there are other configuration properties that are required but do not
differ between the three providers, a more typical configuration file would
really look like:
org.apache.streams.twitter.Twitter = { oauth = { … } }
org.apache.streams.twitter.providers.ProviderA =
org.apache.streams.twitter.providers.ProviderA.max_items = 10
org.apache.streams.twitter.providers.ProviderB =
org.apache.streams.twitter.providers.ProviderB.max_items = 100
org.apache.streams.twitter.providers.ProviderC =
org.apache.streams.twitter.providers.ProviderC.max_items = 1000

This would be a breaking refactoring, targeted for the 0.6.0 release.

I’m interested in others’ feedback on the idea, both in general and on
specifics: for example, whether the fully qualified class name or just the
simple name is preferable as a prefix, and whether the class name used
should be that of the configuration bean (e.g. TwitterConfiguration) or of
the class that operates on it (e.g. Twitter).

Thanks for reading,
