ignite-dev mailing list archives

From Nikita Ivanov <nivano...@gmail.com>
Subject usage analytics
Date Wed, 05 Jul 2017 20:37:47 GMT
Igniters,
I would like to kick off a discussion on the idea of collecting Ignite
usage statistics. The basic idea is to gather general, anonymous
information about how Ignite is used, so that we can better calibrate
community efforts in developing new features, improving existing ones and
delivering better documentation, and in every other way make our project
a better software solution.

Although such instrumentation is standard practice in commercially
developed software, for an ASF project this could be a sensitive issue.
Therefore I would like to initiate a full community discussion on how best
to implement such a practice for the benefit of the project while
protecting the privacy of Ignite users.

To ignite (pun intended) the discussion, I'll outline below some of my
basic thoughts on this subject. They are here only to give an idea of
what such instrumentation might look like, so that we can discuss the
merits of this idea in a tangible context.

Overview
-------------
Upon start and every hour thereafter, each Ignite node will collect, encrypt
and send usage statistics over HTTPS to an ASF-hosted collection server.
That server will accept these HTTPS packets, decrypt them and store them in a
time-series DB. A web interface will be provided to view the usage
information.
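To make the node-side schedule concrete, here is a minimal Java sketch. UsageReporter, its Sender interface and the collector supplier are hypothetical names (nothing like this exists in Ignite today); the collector stands in for "collect and encrypt", and the sender stands in for the HTTPS upload. It reports once immediately on start and then at a fixed period, which would be one hour in production.

```java
import java.time.Duration;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical node-side reporter; not part of Ignite's API.
class UsageReporter implements AutoCloseable {
    // Hypothetical transport; a real one would POST the bytes over HTTPS.
    interface Sender {
        void send(byte[] packet) throws Exception;
    }

    private final Supplier<byte[]> collector; // collects and encrypts one packet
    private final Sender sender;

    // Daemon thread so that reporting never keeps the JVM alive on shutdown.
    private final ScheduledExecutorService exec =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "usage-reporter");
            t.setDaemon(true);
            return t;
        });

    UsageReporter(Supplier<byte[]> collector, Sender sender) {
        this.collector = collector;
        this.sender = sender;
    }

    // First report immediately (initial delay 0), then once per period.
    void start(Duration period) {
        exec.scheduleAtFixedRate(this::reportOnce, 0, period.toMillis(), TimeUnit.MILLISECONDS);
    }

    // Best-effort: any failure is swallowed so it cannot affect the node
    // or cancel the remaining scheduled runs.
    void reportOnce() {
        try {
            sender.send(collector.get());
        } catch (Exception e) {
            // Intentionally ignored; a real implementation might log at debug level.
        }
    }

    @Override
    public void close() {
        exec.shutdownNow();
    }
}
```

A node would call reporter.start(Duration.ofHours(1)) on startup and close() on shutdown. Swallowing exceptions in reportOnce() matters here: with scheduleAtFixedRate, a single task that throws suppresses all subsequent runs, so an unreachable collection server should cost one data point rather than the whole schedule.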

Opt-In or Opt-out
-------------------------
Opt-out. The Ignite website will offer simple instructions on how to
disable this instrumentation by setting a system property.
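The opt-out check itself can be very small. The sketch below assumes a hypothetical property name, IGNITE_USAGE_STATS_DISABLED; the proposal only says "a system property", so the actual name is still open. Boolean.getBoolean returns true only when the property exists and equals "true" (case-insensitive), so reporting stays on by default, which is what opt-out means.

```java
// Hypothetical opt-out check; the property name is an assumption.
final class UsageStatsOptOut {
    // Hypothetical property name, not an existing Ignite setting.
    static final String DISABLE_PROP = "IGNITE_USAGE_STATS_DISABLED";

    private UsageStatsOptOut() {}

    // Enabled unless the user explicitly set the property to "true".
    static boolean reportingEnabled() {
        return !Boolean.getBoolean(DISABLE_PROP);
    }
}
```

With this sketch, a user would opt out by starting the node with -DIGNITE_USAGE_STATS_DISABLED=true on the JVM command line.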

Code, Infra, Access
---------------------------
Ignite instrumentation will be part of the Ignite code base. The collection
server will be a separate module in the Ignite code base, released
separately from Ignite and hosted by ASF Infra.

Usage statistics will be publicly accessible by anyone in the community.

Private, Personal Data
------------------------------
No private or personal data will ever be transferred. No emails, usernames,
company names, grid names, etc.

Data Retention
--------------------
All data will be retained for 1 year and deleted permanently thereafter.
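The retention rule could be expressed on the server side roughly as follows. StoredPacket and RetentionPolicy are hypothetical names, and "1 year" is approximated as 365 days (leap years ignored); many time-series databases offer built-in retention settings, so this only illustrates the rule rather than how it must be enforced.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stored record: one received packet and its arrival time.
final class StoredPacket {
    final Instant receivedAt;
    final String payload;

    StoredPacket(Instant receivedAt, String payload) {
        this.receivedAt = receivedAt;
        this.payload = payload;
    }
}

final class RetentionPolicy {
    // "1 year" approximated as 365 days; an assumption of this sketch.
    static final Duration RETENTION = Duration.ofDays(365);

    private RetentionPolicy() {}

    // A packet expires once it is older than the retention window.
    static boolean expired(StoredPacket p, Instant now) {
        return p.receivedAt.isBefore(now.minus(RETENTION));
    }

    // Returns only the packets that are still within the retention window.
    static List<StoredPacket> purge(List<StoredPacket> packets, Instant now) {
        return packets.stream()
            .filter(p -> !expired(p, now))
            .collect(Collectors.toList());
    }
}
```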

Usage Data
----------------
The following data will be collected in each packet sent to the collection
server:
- GRID_SIZE (to align our testing environment with the most common
cluster sizes)
- IP_ADDR (for general geo-tracking, as well as to decide which
documentation languages should be a priority)
- SES_ID (to track continuous uptime vs. restarts)
- USERNAME_TYPE (privileged vs. standard username, to distinguish
production from dev/testing usage; note: this is not an actual username)
- OS_NAME
- OS_VER
- OS_ARCH
- JAVA_VER
- JAVA_VENDOR
- COMP_SQL (whether or not this feature was used)
- COMP_COMPUTE (whether or not this feature was used)
- COMP_DATAGRID (whether or not this feature was used)
- COMP_STREAMING (whether or not this feature was used)
- COMP_IGFS (whether or not this feature was used)
- COMP_SERVICE (whether or not this feature was used)
- COMP_PERSISTENCE (whether or not this feature was used)
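For illustration, a packet with the fields listed above could be assembled as in this Java sketch. The field names come from the list; everything else is an assumption: the UsagePacket class, the Component enum, the key/value map representation, and the heuristic that treats "root" or "Administrator" as a privileged account. Only the username type is stored, never the name. The sketch also leaves IP_ADDR out, on the assumption that the collection server would record the source address of the HTTPS connection. The OS and Java fields map directly to the standard JVM system properties os.name, os.version, os.arch, java.version and java.vendor.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Hypothetical packet builder using the field names from the proposal.
final class UsagePacket {
    // Features reported as yes/no flags (the COMP_* fields).
    enum Component { SQL, COMPUTE, DATAGRID, STREAMING, IGFS, SERVICE, PERSISTENCE }

    // Generated once per JVM, so a restart shows up as a new session.
    static final String SESSION_ID = UUID.randomUUID().toString();

    private UsagePacket() {}

    // Classifies the OS user without recording the name itself.
    // Treating "root"/"Administrator" as privileged is an assumed heuristic.
    static String usernameType(String userName) {
        return "root".equals(userName) || "Administrator".equalsIgnoreCase(userName)
            ? "PRIVILEGED" : "STANDARD";
    }

    static Map<String, String> build(int gridSize, Set<Component> used) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("GRID_SIZE", Integer.toString(gridSize));
        // IP_ADDR omitted: assumed to be taken by the server from the connection.
        p.put("SES_ID", SESSION_ID);
        p.put("USERNAME_TYPE", usernameType(System.getProperty("user.name")));
        p.put("OS_NAME", System.getProperty("os.name"));
        p.put("OS_VER", System.getProperty("os.version"));
        p.put("OS_ARCH", System.getProperty("os.arch"));
        p.put("JAVA_VER", System.getProperty("java.version"));
        p.put("JAVA_VENDOR", System.getProperty("java.vendor"));
        for (Component c : Component.values()) {
            p.put("COMP_" + c.name(), Boolean.toString(used.contains(c)));
        }
        return p;
    }
}
```

Serialization of the map and its encryption before upload are deliberately left out; they would sit in the collection step that feeds the hourly report.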

Let's discuss this idea. Everyone's comments and suggestions are
*extremely* welcome.

Thanks,
Nikita Ivanov.
