ace-commits mailing list archives

Subject svn commit: r1335036 - /ace/site/trunk/content/dev-doc/analysis/auditlog-analysis.mdtext
Date Mon, 07 May 2012 14:34:21 GMT
Author: marrs
Date: Mon May  7 14:34:20 2012
New Revision: 1335036

Added first conversion of the audit log analysis.


Added: ace/site/trunk/content/dev-doc/analysis/auditlog-analysis.mdtext
--- ace/site/trunk/content/dev-doc/analysis/auditlog-analysis.mdtext (added)
+++ ace/site/trunk/content/dev-doc/analysis/auditlog-analysis.mdtext Mon May  7 14:34:20 2012
@@ -0,0 +1,112 @@
+Title: Audit Log Analysis
+An audit log is a full historic account of all events that are relevant for a certain object.
In this case, we keep audit logs of each target that is managed by the provisioning server.
+The first issue is where to maintain the audit log. On the one hand, one can maintain it
on the target, but since the management agent talks to the server, the server could keep the log too.
+Then there is the question of how to maintain the log. What events should be in it, and what
is an event?
+Finally, the audit log should be readable and query-able, so people can review it.
+The following use cases can be defined:
+* Store event. Stores a new event to the audit log.
+* Get events. Queries (a subset of) events.
+* Merge events. Merges a set of (new) events with the existing events.
+We basically have two contexts:
+* Target, limited resources, so we should use something really "lean and mean".
+* Server, scalable solution, expect people to query for (large numbers of) events.
+Possible solutions
+As with all repositories, there should be one location where it is edited. In this case,
the logical place to do that is on the target itself, since that is where the changes actually
occur. In theory, the server also knows, but that theory breaks down if things fail on the
target or other parties start manipulating the life cycle of bundles. The target itself can
detect such activities.
+The next question is what needs to be logged. And how do we get access to these events?
+When storing events, each event can get a unique sequence number. Sequence numbers start
with 1 and can be used to determine if you have the complete log.
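The completeness check that sequence numbers allow can be sketched as follows. This is a minimal illustration, not code from the ACE code base: the class name `SequenceCheck` and the method `missing` are hypothetical, and the sketch assumes sequence numbers are plain `long` values starting at 1.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical helper: finds the sequence numbers missing from a log. */
final class SequenceCheck {
    private SequenceCheck() {}

    /**
     * Returns the sequence numbers missing between 1 and the highest number seen.
     * An empty result means the log is complete up to that highest number.
     */
    static List<Long> missing(Collection<Long> seen) {
        TreeSet<Long> sorted = new TreeSet<>(seen);
        List<Long> gaps = new ArrayList<>();
        long expected = 1; // sequence numbers start with 1
        for (long seq : sorted) {
            // Everything between the expected number and this one is a gap.
            for (; expected < seq; expected++) {
                gaps.add(expected);
            }
            expected = seq + 1;
        }
        return gaps;
    }
}
```

For a log holding events 1, 2 and 5, `missing` reports 3 and 4. Note that such a check cannot tell whether events after the highest known number exist.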
+Assuming the target has limited storage, it might not be possible to keep the full log available
locally. There are a couple of reasons to replicate this log to a central server:
+* space, as said the full log might not fit;
+* safety, when the target is somehow (partly) erased or compromised, we don't want to lose
the log;
+* remote diagnostics, we want to get an overview of the audit log without actually connecting
to the target directly.
+When replicating, the following scenarios can occur:
+1. The target has lost its whole log and really wants to (re)start from sequence number 1.
+2. The server has lost its whole log and receives a partial log.
+Starting with the second scenario, the server always simply collects incoming audit logs,
so its memory can be restored from any number of targets or relay servers that report everything
they know (again). Hopefully that will lead to a complete log again. If not, there's not much
we can do.
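Because the server simply collects whatever is reported, receiving the same events more than once should be harmless. A minimal sketch of such an idempotent merge, assuming events are keyed by sequence number within one log of one target (the class `MergingLog` and the use of a plain `String` as event payload are hypothetical simplifications):

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical server-side store for one log of one target. */
final class MergingLog {
    // Sequence number -> event message, kept sorted by sequence number.
    private final TreeMap<Long, String> events = new TreeMap<>();

    /** Merges incoming events and returns how many of them were actually new. */
    int merge(Map<Long, String> incoming) {
        int added = 0;
        for (Map.Entry<Long, String> e : incoming.entrySet()) {
            // Events that are already known are left untouched.
            if (events.putIfAbsent(e.getKey(), e.getValue()) == null) {
                added++;
            }
        }
        return added;
    }

    int size() {
        return events.size();
    }
}
```

Merging the same batch twice adds nothing the second time, so targets and relay servers can safely report everything they know again.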
+The first scenario is potentially more problematic, since the target has no way of knowing
(for sure) at which sequence number it had arrived when everything was lost. In theory it
might ask (relay) servers, but even those might not have been up to date, so that does not
work. The only thing it can do here is: Start a new log at sequence number 1. That means we
can have more than one log in these cases, and that again means we need to be able to identify
which log (of each target) we're talking about. Therefore, when a new log is created, it should
contain some unique identifier for that log (an identifier that should not depend on stored
information, so for example we could use the current time in milliseconds, which should be
fairly unique, or just some random number).
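Both options for the log identifier can be sketched in a few lines. The names `LogIds` and `TargetLog` are hypothetical, and the choice of `SecureRandom` for the random variant is an assumption made for the example; the point is only that neither option depends on stored information.

```java
import java.security.SecureRandom;

/** Hypothetical factory for log identifiers that need no stored state. */
final class LogIds {
    private static final SecureRandom RANDOM = new SecureRandom();

    private LogIds() {}

    /** Option 1: the current time in milliseconds. */
    static long fromClock() {
        return System.currentTimeMillis();
    }

    /** Option 2: just some random number. */
    static long random() {
        return RANDOM.nextLong();
    }
}

/** Hypothetical target-side log: a fresh identifier, numbering from 1. */
final class TargetLog {
    final long logId;
    private long nextSequence = 1;

    TargetLog(long logId) {
        this.logId = logId;
    }

    /** Hands out sequence numbers 1, 2, 3, ... for this log. */
    long nextSequenceNumber() {
        return nextSequence++;
    }
}
```

A target that has lost everything would create `new TargetLog(LogIds.random())`; the server can then tell the old and the new log apart by the pair of target ID and log ID.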
+How to find the central server? Use the discovery service!? This is not that big of a deal.
+Events should at least contain:
+* a datestamp, indicating when the event occurred;
+* a checksum and/or signature;
+* a short, human readable message explaining the event;
+* details:
+    * in the form of a (possibly multi-line) document
+    * in the form of a set of properties
+The server will add:
+* the target ID of the target that logged the event.
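The fields listed above could map onto a pair of small classes like the ones below. This is a sketch under assumptions: the class names `AuditEvent` and `ReceivedEvent` are hypothetical, details are modelled only as properties, and CRC32 is used as the checksum purely for illustration (the text leaves the choice between checksum and signature open).

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

/** Hypothetical event as created on the target. */
final class AuditEvent {
    final long timestampMillis;            // when the event occurred
    final String message;                  // short, human readable message
    final Map<String, String> properties;  // details as a set of properties
    final long checksum;                   // CRC32 here, purely as an assumption

    AuditEvent(long timestampMillis, String message, Map<String, String> properties) {
        this.timestampMillis = timestampMillis;
        this.message = message;
        // A TreeMap gives a stable key order, so the checksum is reproducible.
        this.properties = new TreeMap<>(properties);
        this.checksum = checksumOf(timestampMillis, message, this.properties);
    }

    /** Computes a CRC32 over a simple textual form of the event. */
    static long checksumOf(long timestampMillis, String message,
                           Map<String, String> sortedProperties) {
        CRC32 crc = new CRC32();
        String canonical = timestampMillis + "\n" + message + "\n" + sortedProperties;
        crc.update(canonical.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }
}

/** Hypothetical server-side wrapper that adds the target ID. */
final class ReceivedEvent {
    final String targetId;
    final AuditEvent event;

    ReceivedEvent(String targetId, AuditEvent event) {
        this.targetId = targetId;
        this.event = event;
    }
}
```

Keeping the target ID out of `AuditEvent` mirrors the split in the text: the target does not need to store its own ID with every event, and the server adds it on receipt.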
+Storage will be resolved differently on the server and target. On the target, using any kind
of database would amount to having to include a considerable library, which makes these solutions
impractical there. We might want to consider something like that for the server though. The
options we have are:
+* Relational database
+* Object database
+* XML
+* DIY
+How do events get logged?
+* explicitly, our management agent calls an AuditLog service method;
+* implicitly, by logging (certain) events in the system.
+Implicit algorithms can be built on top of the AuditLog service. What we need to monitor
is the life cycle layer, which basically means adding a BundleListener and a FrameworkListener.
Those capture all state changes of the framework. Technically we can either directly add those
listeners, or use EventAdmin if that is available.
+What would be the best way for the target to send audit log updates to the server? I don't
think we want the server to poll here, so the target should send updates (periodically). So
how does it know what to send?
+* it could keep track of the last event it sent, sending newer ones after that;
+* it could ask for the list of events the server has;
+* it could send its highest log event number, and get back a list of missing events on the
server, and then respond with the missing events.
+* it could just send everything.
+Having two layers for the audit log makes sense:
+* The first, lowest, layer is the AuditLog service that gives access to the log. On the one
hand it allows people to log messages, on the other it should provide query access. Those
should be split into two different interfaces.
+* The second layer can build on top of that. There are three options: it can be removed completely,
which means the responsibility for logging becomes that of the application (probably the management
agent); it can be implemented using listeners; or it can be implemented using events.
+On the target we should implement a storage solution ourselves, to keep the actual code small.
The code should be able to log events quickly (as that will happen far more often than retrieving them).
+Communication between the target and server should be initiated by the target. The target
can basically send two commands to the server:
+1. My audit log contains sequence numbers 4-8, tell me your numbers. The server then responds
(for example) with 1-6. This indicates we need to send 7-8.
+2. Here you have events 7-8, can you send me 1-3? The server stores the events it was missing and
sends back the events that were requested (always check if what you get is what you requested).
+This is set up this way so the same commands can also be used by relay servers to replicate
logs between server and target.
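The arithmetic behind the two commands can be sketched as below. This is an illustration under assumptions, not the actual protocol code: `Range` and `Negotiation` are hypothetical names, each side is assumed to hold one contiguous, inclusive range of sequence numbers, and only the two cases from the example are covered (newer events to send, older events to request).

```java
/** Hypothetical inclusive range of sequence numbers; empty when low > high. */
final class Range {
    final long low;
    final long high;

    Range(long low, long high) {
        this.low = low;
        this.high = high;
    }

    boolean isEmpty() {
        return low > high;
    }
}

/** Hypothetical helper that works out what to exchange after command 1. */
final class Negotiation {
    private Negotiation() {}

    /** Events we hold that lie above the highest number the other side has. */
    static Range toSend(Range mine, Range theirs) {
        return new Range(Math.max(mine.low, theirs.high + 1), mine.high);
    }

    /** Events the other side holds that lie below our lowest number. */
    static Range toRequest(Range mine, Range theirs) {
        return new Range(theirs.low, Math.min(theirs.high, mine.low - 1));
    }
}
```

With the numbers from the example, a target holding 4-8 and a server answering 1-6 yield a range of 7-8 to send and 1-3 to request. A server that has lost its log can be represented by an empty range such as `new Range(1, 0)`, in which case the target sends everything it has and requests nothing.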
+* The audit log is maintained on the target.
+* On the target, we implement the storage mechanism ourselves to ensure we have a solution
with a very small footprint.
+* On the server, we use an XStream based solution to store the logs of all the targets.
+* Our communication protocol between target and (relay)server, however, should probably not
rely on XML.
+* Our communication protocol between server and (relay)server might rely on XML (determine
at design time what makes most sense).
