lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hoss Man (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-9959) SolrInfoMBean-s category and hierarchy cleanup
Date Tue, 28 Mar 2017 01:57:41 GMT

    [ https://issues.apache.org/jira/browse/SOLR-9959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15944393#comment-15944393
] 

Hoss Man commented on SOLR-9959:
--------------------------------



I don't understand this comment...

bq. ... This makes it impossible to ask the API for all metrics for a component, which is
what the /admin/mbeans used to provide. On the other hand, the /admin/metrics API provides
a much more structured view of related metrics even if they come from different components.

Why is this impossible?

I see that on the jira/solr-9959 branch, using {{/admin/mbeans?stats=true}} (and the Plugin/Stats
screen in the UI) is useless (the stats param is now completely ignored) but meanwhile if
I go to {{/admin/metrics}} I can "ask" for all of the stats for all components (or filter
the list a variety of ways) ... so why can't {{/admin/mbeans}} use the same code {{/admin/metrics}}
to loop/filter over the reported metrics for each "mbean" ?

Specifically...

* Since I (as a user) can do these:
** http://localhost:8983/solr/admin/metrics?group=solr.core.techproducts&prefix=CACHE
** http://localhost:8983/solr/admin/metrics?group=solr.core.techproducts&prefix=CACHE.searcher.queryResultCache
* Why can't I get the same info from these:
** http://localhost:8983/solr/techproducts/admin/mbeans?stats=true&cat=CACHE
** http://localhost:8983/solr/techproducts/admin/mbeans?stats=true&cat=CACHE&key=queryResultCache

?

I don't mind breaking back compat (particularly for internal java APIs) when needed to make
overall improvements but in this case it seems like we're breaking HTTP request/param level
APIs (and the Admin UI) unneccessarily if the underlying info is still accessible from {{/admin/metrics}}.

At worst it seems like maybe we could add a new {{SolrMetricManager.SuffixFilter}} to let
{{/admin/mbeans}} "search" for metrics assocaited with the {{cat}} and {{key}} combo it's
currently dealing with as it loops over items?

But AFAICT an even cleaner/simpler solution would be:
* add a {{default MetricsMap getMetricsMap() {return null;} }} to {{SolrInfoBean}} (to replace
{{getStatistics()}})
* any class implementing both {{SolrInfoBean}} and {{SolrMetricProducer}} _could_ implement
{{initializeMetrics(...)}} such that it keeps a private reference to a {{MetricsMap}} it registers
& return that from {{getMetricsMap()}}
** many of the {{SolrInfoBean}} classes already seem to be maintaining a {{private MetricsMap
metricsMap;}} that is assigned in {{initializeMetrics(...)}} but never used (in the class)
after that
* {{/admin/mbeans}} can call {{getMetricsMap()}} on each {{SolrInfoBean}} it loops over if
{{stats=true}}

(I realize even if we do this a lot of things available from {{/admin/metrics}} still wouldn't
be available from {{/admin/mbeans?stats=true}} -- but that's totally reasonable.  More significantly,
this would -- IIUC -- let us ensure everything *currently* available from {{/admin/mbeans?stats=true}}
is *still* available moving forward, minimizing breakage for existing consumers of {{/admin/mbeans}}.
 If/when they want more advanced stuff, they can switch to {{/admin/metrics}})

----

Other misc quesions/comments about the branch in no particular order...

* in general it would be helpful if this branch/jira included a text file listing all the
major release-note/ref-guide updates needed once this lands so people can fully grasp whats
changing from a users perspective: {{/admin/mbeans}} vs {{/admin/metrics}}, enabling JMX,
JMX config options no longer supported (see below), etc...

* {{/admin/metrics}} 
** Why doesn't {{/admin/metrics}} expose an param for the {{SolrMetricManager.RegexFilter}}
?
*** Maybe it does and it's just not obvious?
** (When) Are we going to expose {{/admin/metrics}} via the Admin UI?
*** Even if we "fix" the problems w/ {{/admin/mbeans?stats=true}} mentioned above (so that
the Plugins/Stats screen starts working) having some sort of UI screen exposting *all* the
metrics now supported seems like a good idea.
*** If we can't fix the {{/admin/mbeans?stats=true}} the way i suggested above, then adding
a roughly equivilent UI screens using {{/admin/metrics}} seems like it should be a blocker
for landing this branch.

* This branch modifies the {{ivy.xml}} and IDE config files for dataimporthandler to have
a compile dependency on {{io.dropwizard.metrics/metrics-core}} -- but that doesn't actaully
seem to be neccessary to compile/test DIH
** temp/abandondoned API refactorying that needs reverted?

* {{LFUCache}} and {{FastLRUCache}} still have comments that refer to the (removed) {{getStatistics()}}
method.

* I see this comment about {{fieldCache}} registration in SolrCore...{code}
SolrFieldCacheBean solrFieldCacheBean = new SolrFieldCacheBean();
// XXX this should be registered at the CONTAINER level because it's not core-specific!
solrFieldCacheBean.initializeMetrics(metricManager, coreMetricManager.getRegistryName(), null);
infoRegistry.put("fieldCache", solrFieldCacheBean);
{code}
** Since {{UninvertingReader}} is now a solr level class, and we're making non-compat changes
for 7.0 anyway, why don't we:
*** Register a "global" {{new SolrFieldCacheBean()}} at the container level as the comment
suggests we should.
*** Add a new verion of {{UninvertinterReader.getUninvertedStats(IndexReader)}} that filters
the {{CacheEntry[]}} based on the {{IndexReader.CacheKey}} of the reader passed in ... _and
any of it's leaf readers!_
**** Or ... I suppose technically we should recursively use {{reader.getContext().getChildren()}}
because there might be intermediate wrappers? maybe?
*** use the new {{UninvertinterReader.getUninvertedStats(IndexReader)}} in a {{new SolrFieldCacheBean(searcher.getIndexReader())}}
registered at the core/searcher level via {{CACHE.searcher.fieldCache}} (like all the other
searcher/reader related caches)
** Even if we want to punt the idea for a new {{UninvertinterReader.getUninvertedStats(IndexReader)}}
method to a new jira and wory about it later, I don't see any reason why _this_ jira/branch
shouldn't go ahead and move the {{fieldCache}} metrics to be a the container level (like the
comment suggests)
*** for backcompat, we can still (for now) put a {{new SolrFieldCacheBean()}} in the {{infoRegistry}}
of every core ... assuming we _also_ fix {{/admin/mbeans?stats=true}} as i suggested above,
then those {{fieldCache}} "stats" should still be available as they always have been {{/admin/mbean}}
users.

* MetricType
** this error (String) shouldn't be hardcoded, we should build it up from {{EnumSet.allOf(MetricType.class)}}...{code}
    } catch (IllegalArgumentException e) {
      throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, "Invalid metric type in:
" + types + " specified. Must be one of (all, meter, timer, histogram, counter, gauge)", e);
    }
{code}
** We could have a {{public static final String SUPPORTED_TYPES_MSG = ...}} in {{MetricType}}
to create this String-ified list only once on class load.

* JMX, in general, doesn't seem to be working as documented (in the ref guide) for the examples
on this branch
** ie: {{bin/solr -e techproducts -Dcom.sun.management.jmxremote}} + jconsole shows no (solr)
level metrics/stats.
** It appears this is because (on this branch) solr metrics are only exposed via JMX if there
is a {{SolrJmxReporter}} configured in {{solr.xml}} ?
*** if so this is a pretty big change, and definitely needs some docs/release noting it.
*** It also raises some big questions:
**** Should {{solr/server/solr/solr.xml}} be updated to include {{SolrJmxReporter}} so it
works with the examples ?
**** At a minimum should it be updated to conditionally set a reporter based on sysprops by
{{bin/solr}} based on {{ENABLE_REMOTE_JMX_OPTS}} ?
** Personally, I think we should consider implicitly registering a {{SolrJmxReporter}} based
on the same criteria that are currently checked on master when an empty {{<jmx/>}} is
found: {{null!=JmxUtil.findFirstMBeanServer()}}
*** that way if users configure a JMX MBeanServer via {{bin/solr}} or {{solr.in.sh}} options,
they'll automatically get Solr Metrics exposed via JMX
*** the only reason the _*very*_ legacy {{JmxMonitoredMap}} based code was written to require
{{<jmx/>}} before doing any of this was in case a user was deplying Solr to an appserver
along with many other apps, and you *only* wanted JMX stats from your appserver or other apps
-- but not Solr.
**** That ship has sailed and I don't think we need to worry about it.
**** if people start {{bin/solr}} with options to enable JMX, let's go ahead and expose Solr
Metrics via JMX
**** if, for some reason, some users really want jetty level JMX stats, but not solr level
JMX, then let's offer some type of option to _disable_ Solr level JMX metric reporting ...
perhaps a {{NoopSolrJmxReporter}} they can be configure in {{solr.xml}} ?


* SolrConfig & JmxMonitoredMap
** with these changes, it's very clear that {{<jmx .../>}} configuration in {{solrconfig.xml}}
is now deliberately ignored, but...
*** nearly all of the sample (and many of the test) {{solrconfig.xml}} files on this branch
still include/document {{<jmx .../>}} and it's varous options.  All of that still needs
cleaned up on this branch before merging to master.
*** at a minimum: if a {{<jmx/>}} XML node is found in a {{solrconfig.xml}} file, Solr
should log a WARN/ERROR w/message indicating that syntax is no longer supported & being
ignored (and give people a pointer to the new way to configure JMX)
** how much of the existing config options that {{<jmx .../>}} supports are/should be
available with {{SolrJmxReporter}} in {{solr.xml}} ?
** the old style config supported *per-core* {{agentId}}, {{serviceUrl}}, or {{rootName}}
attributes...
*** {{rootName}} doesn't seem to be supported in {{SolrJmxReporter}}
**** this is probably fine.
**** that option dates back to deploying multiple solr.war files to a single appserver, something
we're not supporting anymore
*** the {{agentId}} option was created to address the possibility of multiple JMX MBeanServers
existing in the JVM on startup (possible because multiple apps each launched their own) and
this way you could deterministicly indicate which one Solr should service it's stats from.
**** it looks like this is currently supported by {{SolrJmxReporter}}, but i'm not sure if
it needs to be? i guess it doesn't hurt?
*** how {{serviceUrl}} is handled is something I'm more on the concerned about...
**** the use case motivation of supporting this attribute in the {{<jmx ... />}} config
syntax was that the JVM might/might-not be configured to service a platform level MBeanServer
and serve jVM metrics, but each individual core could individually run it's own MBeanServer
(w/it's own port & security options) -- to better isolate what stats diff JMX clients
could see.
**** This {{serviceUrl}} feature never really evolved with SolrCloud, but perhaps now is the
time when it should?
**** As implemented in {{SolrJmxReporter}} this appears (AFAICT) to now only be configurable
at a "container wide" level (in {{solr.xml}})
**** It would be nice if there was a way to configure multiple JMX MBeanServer instances (on
each solr node) that only expose metrics from specific collections (such that metrics from
those collections would not be available via the platform MBeanServer even if it was enabled
via started by the command line options).
***** Not sure if/where/how it would make sense to expose some configuration like this, or
even if it's worth pursuing (let alone pursuing right now) but I wanted to point out the original
point of {{serviceUrl}} in case anyone has any ideas.
***** Could we perhaps enhance the {{SolrMetricReporter}} base class so that all types of
reporters could be configured with inclusion/exclusion rules based on collection name?

* Testing JMX w/solr.xml changes...
** I manually modified {{solr/server/solr/solr.xml}} to include a {{SolrJmxReporter}} to continue
to test JMX support
** With this change, using {{bin/solr -e techproducts -Dcom.sun.management.jmxremote}} + jconsole
did start to expose a lot of metrics via JMX -- but many of the solr realated MBeans exposed
don't seem very useful...
*** Notably: when I went to look at the {{CACHE}} related MBeans (ex: filterCache), each one
was exposing only a single "Attribute" named "Value" (with MBeanAttributeValueInfo.Type==Object)
which appears to contain the toString (or perhaps an array of name=value strings?) of the
various cache stats (ie: lookups, cumulative_lookups, hitratio, etc...)
*** This is vastly different from master where this sort of toString info is used as the MBean's
_Description_ but each one of those individual "metrics" are exposed as individual MBean _Attributes_
with the expected MBeanAttributeValueInfo.Type (Long, Long, Float, etc...)
*** Even in the single "Value" Attribute that is exposed, some really useful info currently
available as MBean Attributes seems to be missing, notably: "description" (from the toString()),
and "name" (the class implementing the cache)
*** Not being able to access the individual (strongly typed!) cache "stats" as MBean Attributes
seems like a major step backwards?
** perplexingly, w/ the modified {{solr.xml}} file, running {{bin/solr -e techproducts}} (NOTE:
no command line args to use JMX) still caused Solr to run an MBeanServer that I could connect
to with jconsole and use to view the same solr metrics.
*** presumably this is because of the this code in {{SolrJmxReporter}} which doesn't really
make sense to me....{code}
      ManagementFactory.getPlatformMBeanServer(); // Ensure at least one MBeanServer is available.
{code}
*** Setting asside my earlier suggestions that we should consider changing our "default" behavior
when {{bin/solr}} / {{solr.in.sh}} is configured to run the JVM with JMX enabled, this code
as is doesn't follow the existing precendent of {{JmxMonitoredMap}} to be a no-op unless:
**** an MBeanServer is started at the process/appserver level (ie: command line args)
**** an agentId or serviceUrl is explicitly configured
*** In general it seems like the the _intended_ where/how/why to configure Solr to publish
metrics via JMX is now fairly confusing...
**** if the intent here is that moving forward users _must_ configure a {{SolrJmxReporter}}
in {{solr.xml}}, and they _must_ configure a {{serviceUrl}} at that point to control access,
then we definitely need to think through how we're deprecating/replacing the various things
{{bin/solr}} currently does with {{ENABLE_REMOTE_JMX_OPTS=true}} (and if/how serviceUrl's
can support configuration of things like {{com.sun.management.jmxremote.local.only}}, etc...
 -- i have no idea if they can)
*** My 2cents: expanding on my suggestion above regarding solr's default JMX behavior...
**** I think all calls to {{ManagementFactory.getPlatformMBeanServer()}} should be removed
from {{SolrJmxReporter}}, and that class should be a No-Op unless: either agentId/serviceUrl
is explicitly configured *OR* an already existing MBeanServer is found (via {{JmxUtil.findFirstMBeanServer()}})
**** I think that if no metrics reporters (or perhaps specifically no {{SolrJmxReporter}}
instances?) are configured in {{solr.xml}}, CoreContainer (or some equivilent) should check
{{JmxUtil.findFirstMBeanServer()}} and implicily register an {{SolrJmxReporter}} instance
if the MBeanServer is non-null.
**** that way command line JVM options (should be) the end all be decider as to whether a
Platform MBeanServer is launched, and if so: then Solr metrics be exposed via JMX by default.
 If users want something more fancy they can override with explicit {{SolrJmxReporter}} /
{{SolrJmxNoopReporter}} configurations

* other uses of {{ManagementFactory.getPlatformMBeanServer()}} ...
** I see now that as part of the earlier metrics work, {{SolrDispatchFilter}} was also modified
to call {{ManagementFactory.getPlatformMBeanServer()}} -- evidently so that the Metrics API
can access JVM level MBeans?
*** This explains why (on both this branch and on master) running {{bin/solr}} is all it takes
to be able to connect to the process via jconsole -- even w/o {{solr.xml}} changes or command
line args requesting that the JVM enable JMX services.
*** I don't think it's a good idea for Solr to be forcing the JVM to spin up MBeanServer instances
& accepting JMX connections, w/o the user running/configuring solr explicitly requesting
that -- particularly in the case of this {{SolrDispatchFilter}} code which seems to run even
if the user doesn't care about metrics/JMX at all!
*** It seems like the {{BufferPoolMetricSet}} and {{OperatingSystemMetricSet}} classes used
in {{SolrDispatchFilter}} should probably be using the various {{*MXBean}} impls available
directly from {{ManagementFactory}} w/o needing an MBeanServer to be running
**** see examples of doing this in {{SystemInfoHandler}}
*** but if an {{MBeanServer}} is really the only way to bridge these type of OS/JVM level
info into the metrics code then at a minimum we should change {{SolrDispatchFilter}} to also
use {{JmxUtil.findFirstMBeanServer()}} and if the {{MBeanServer}} returned is null skip registering
these metrics (in their place: log an info message, and/or register a simple String constnat
metric, that certain metrics are only available if JMX is enabled)
** In general, I think it's a really bad idea for any (non-test) Solr level code to be calling
{{ManagementFactory.getPlatformMBeanServer()}} ... we should probably consider marking it
as a forbidden-api!

* I'm -1 on removing {{TestJmxIntegration}}
** if we need to change the JMX ObjectNames we look for to match the new metrics based code,
and/or change the test init to ensure {{SolrJmxReporter}} is used by the CoreContainer (depending
on what decisions are made about default behavior) -- then so be it.
** but it seems really important to have a simple test like this sanity checking some simple
Solr metrics via a JMX {{MBeanServer}}
*** Yes i see {{CollectionsAPIDistributedZkTest}} and {{SolrJmxReporterTest}}...
**** {{CollectionsAPIDistributedZkTest}} is trusting the JMX stats to check that instanceDirs
don't collide, not doing anything to ensure that JMX is reutrning expected values.   (Independent
of the current Jira, this test should almost certaily be re-written to check this some other
way)
**** {{SolrJmxReporterTest}} creates some random metrics -- that's not really a good _integration_
test of the metrics/JMX code to check that external clients can find specific Solr metrics
like Searcher's "numDocs" (or filterCache's "lookups"), or that the values of those metrics
are updated when expected (and to the expected value) based on actions that happen in solr
(ie: docs added to the index, queries executed with an fq, etc...)

* RequestHandlersTest
** testStatistics has been removed, but why isn't there a new equivilent replacement test/asserts
using some internal metrics API?

* CollectionsAPIDistributedZkTest
** Note this change...{noformat}
         } catch (Exception e) {
+          log.info(e.toString());
           // ignore, just continue - probably a "category" or "source" attribute
           // not found
         }
{noformat}
** if this comment is still valid: then logging the exception here seems like shot term debuging
code that should not be committed
** if this comment is no longer valid based on the other changes in the loop above this, then
the comment should be deleted
** if the log message is still useful, then it should log the {{Exception}} (w/stack trace),
not the {{toString()}}

* MetricUtils
** all of these (newly) public methods should have javadocs explaining their function / intended
purpose

* SolrCore
** using {{\_notset\_}} and {{\_auto\_}} as special values here for the collection & shardId
metrics seems like a really bad idea.  In both cases...
*** if we're not in cloud mode, then it seems like we shouldn't report a collection name or
a shardId as part of the metrics at all
*** if we're in cloud mode, then isn't it an error if either collection name or shardId are
null?
**** even if there is some situation i'm not thinking of where these might be legitimately
null (in cloud mode), can't/shouldn't we just return {{NULL}} for these metric instead of
trying to use magic string values that client code might missinterpret?
**** if {{NULL}} isn't an option, then throwing an exception seems better then using magic
strings that might be missinterpreted/missused by client code
**** IE: I'd rather some automated client code get a failure trying to access the metric named
"collection" then get a value that the client code might try to use down the line to query
for a collection and gets a weird 404 error later.




> SolrInfoMBean-s category and hierarchy cleanup
> ----------------------------------------------
>
>                 Key: SOLR-9959
>                 URL: https://issues.apache.org/jira/browse/SOLR-9959
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: metrics
>    Affects Versions: master (7.0)
>            Reporter: Andrzej Bialecki 
>            Assignee: Andrzej Bialecki 
>            Priority: Blocker
>             Fix For: master (7.0)
>
>         Attachments: SOLR-9959.patch
>
>
> SOLR-9947 changed categories of some of {{SolrInfoMBean-s}}, and it also added an alternative
view in JMX, similar to the one produced by {{SolrJmxReporter}}.
> Some changes were left out from that issue because they would break the back-compatibility
in 6.x, but they should be done before 7.0:
> * remove the old JMX view of {{SolrInfoMBean}}-s and improve the new one so that it's
more readable and useful.
> * in many cases {{SolrInfoMBean.getName()}} just returns a FQCN, but it could be more
informative - eg. for highlighter or query plugins this could be the symbolic name of a plugin
that users know and use in configuration.
> * top-level categories need more thought. On one hand it's best to minimize their number,
on the other hand they need to meaningfully represent the functionality of components that
use them. SOLR-9947 made some cosmetic changes, but more discussion is necessary (eg. QUERY
vs. SEARCHHANDLER)
> * we should consider removing some of the methods in {{SolrInfoMBean}} that usually don't
return any useful information, eg. {{getDocs}}, {{getSource()}} and {{getVersion()}}.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message