incubator-alois-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Alois Wiki] Update of "IMF2011" by MarcusHolthaus
Date Sun, 16 Jan 2011 21:30:01 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Alois Wiki" for change notification.

The "IMF2011" page has been changed by MarcusHolthaus.
The comment on this change is: Rework: Content corrections, adaptions, additions, and some
changes in language from another guy with german tongue..
http://wiki.apache.org/alois/IMF2011?action=diff&rev1=20&rev2=21

--------------------------------------------------

  = Apache ALOIS - A true open source platform for computer forensics =
  Urs Lerch
  
- ''Abstract: ''Although computer forensics is foremost all about recovering, collecting and
analyzing data, there is, at least as far as we know, no central platform for all of this.
Sure, there exists a dozen of software tools, all good in their defined area. But when it
comes to integration to a whole, often incompatibility of data and the lack of interfaces
are severe problems. In our opinion, a good part of this problem lies in the nature of proprietary
software. Although Open Source Software is not the "holy grail" and doesn't deliver a solution
to this problem per se, a community driven development can help to overcome a great part of
these issues. Apache ALOIS is an open source tool, originally designed as SIEM (Security Information
and Event Management). But since its main tasks are collecting and analyzing data as well
as reporting, it could very well help as integration plattform for all collected data within
a computer forensics process.
+ ''Abstract: ''Although computer forensics is above all about recovering, collecting and
analyzing data, there is, at least as far as we know, no central platform for integrating
all the varying data created during a forensic process. Certainly, there exist dozens
of valuable software tools, each specialized in one or more defined areas. But when it comes
to integrating and consolidating these data collections, incompatible data formats
and missing interfaces often pose severe problems. In our opinion, a good part of this problem
lies in the nature of proprietary software. Community-driven development can help to integrate
these data collections by providing interfaces to the various software tools. Apache ALOIS
is an open source tool, originally designed as a SIEM (Security Information and Event Management)
system with Data Leakage Detection (DLD) in mind. But since its main tasks are collecting and
analyzing data as well as reporting, it could very well be used as an integration platform for
all data collected within a computer forensics process.
  
  == Introduction ==
- The aim of computer forensics is it to acquire, analyze and evaluate digital tracks in the
context of an already conducted or yet only planned criminal act. This requires highly specialized
knowledge as well as highly specialized tools. It therefore makes sense that these tasks are
divided, each step performed by for this particular step specialized staff using the aprobiate
tools.
+ The aim of computer forensics is to acquire, analyze and evaluate digital traces in the
context of a criminal act that has already been committed or is merely planned. This requires
highly specialized knowledge, both in IT and in the field of the crime, as well as highly
specialized tools. It therefore makes sense to divide the tasks among specialists, each using
their own tools.
  
- For the first step, the collection of data, IT knowledge is needed in the first place. It
therefore makes sense, that adequately skilled, technologically-oriented people are responsible
for this task. Of course they need a somewhat basic criminal technical expertise beside. Under
these conditions, the tools used can - and must probably have to - be technologically challenging.
For the analysis of the data, in particular criminalistic knowledge, intuition and a relevant
experience is required. Basic technical knowledge must be provided, but shall not be central
in any way. The tools used should therefore be less technologically sophisticated, albeit
a database query using SQL for example may be required. For the evaluation process, in particular
legal expertise is needed. While it does need understanding of the digital media, technological
knowledge should be provided as little as possible. Therefore, the tools used have to be very
user friendly.
+ Collecting the data is usually the first step in forensics. In our digital age this requires
in-depth IT knowledge, so adequately skilled, technologically oriented people are responsible
for this task. Of course they also need some basic criminalistic expertise as background
knowledge. The tools employed are usually technologically challenging. The second step -
the analysis of the collected data - requires criminalistic knowledge in particular, intuition
and experience in the relevant fields of the crime. Basic technical knowledge is necessary,
but must not be central in any way. The tools used should therefore be less technologically
sophisticated, although a database query using SQL, for example, may be required. For the evaluation
process, mainly legal expertise is needed. While it does require an understanding of the digital
media, as little technological knowledge as possible should be required. Therefore, the tools
used have to be very user friendly.
  
- In this division of tasks, the overall view must not be lost. Here, a cross-software platform
might be of great help for computer forensics. This platform must ensure first of all, that
all the information is available for the entire process in the respective most appropriate
form. This means, that the task of the creation and access to this information corresponds
with the necessary know how in the respective process step. Such a platform can also take
on additional services, such as a workflow or communications. Another advantage of a centralized
database is the possibility of cross-case analysis. Furthermore, it could be assured that
all the information of a case is stored in one place and, therefore, can be easily controlled
and understood. Moreover, as the aim must be to use the most appropriate tool for each task,
it is important that this platform has an open architecture and open interfaces. It must therefore
be independent of a provider. In this respect, it makes sense to pursue a free implementation
of this platform, that is an open source software.
+ With this division of tasks, the overall view must not be lost. Here, a cross-software
platform might be of great help for computer forensics. This platform must ensure that all
the information is available throughout the entire process in the most appropriate form for
each step. This means that creating and accessing this information must match the know-how
available in the respective process step. Such a platform can also take on additional
services, such as workflow or communications. Another advantage of a centralized database
is the possibility of cross-case analysis. Furthermore, it can be ensured that all the information
of a case is stored in one place and can therefore be easily controlled and understood.
Moreover, as the aim must be to use the most appropriate tool for each task, it is important
that this platform has an open architecture and open interfaces. It must therefore be independent
of any one provider. In this respect, it makes sense to pursue a free implementation of this
platform, that is, as open source software.
  
  == Open Source Software ==
  [This brief introduction is an excerpt of the PhD of the author.]
  
- The idea of open source software - originated from a movement of computer hackers who have
developed software primarily in their leisure time for fun - is still wearing the halo of
being a project of unpaid volunteers. However, Free/Libre and Open Source Software (FLOSS)
is in an accelerated process of adaptation to the market. This development takes place along
a cycle of innovation, as is represented in economics by Schumpeter (1961) for example. Therefor,
various studies show that especially the big projects like the Linux operating system, the
office suite OpenOffice or the database MySQL is pursued by a majority of developers paid
for their contributions [eg Kroah-Hartman 2009].
+ The idea of open source software, which originated in a movement of computer hackers who
developed software primarily in their leisure time for fun, still wears the halo of
being a project of unpaid volunteers. However, Free/Libre and Open Source Software (FLOSS)
is in an accelerated process of adaptation to the market. This development follows
a cycle of innovation, as described in economics by Schumpeter (1961), for example. Accordingly,
various studies show that especially the big projects like the Linux operating system, the
office suite OpenOffice or the database MySQL are developed mostly by people who are paid
for their contributions [e.g. Kroah-Hartman 2009].
  
- In simple terms, Open Source Software (OSS) is defined on the one side by an open, community-oriented
development process. On the other hand, it is defined by an open license. The former means,
that OSS is less dependent on individual persons, highly decentralized, and only very limited
is planable. The latter usually means that OSS can be used free of charge. However, free of
charge is not a requirement of Open Source at all. Therefor, the word "free" has to be understood
in the sense of "free speech" and not "free beer" [Richard Stallman]. In this sense, over
time, several business models established,b e it with the software itself or with services
on top of it.
+ In simple terms, Open Source Software (OSS) is defined on the one hand by an open, community-oriented
development process and on the other hand by an open license. The former means
that OSS is less dependent on individual persons, is highly decentralized, and can be planned
only to a very limited extent. The latter usually means that OSS can be used free of charge. However,
being free of charge is not a requirement of open source at all. The word "free" has to be understood
in the sense of "free speech" and not "free beer" [Richard Stallman]. In this sense, several
business models have become established over time, be it with the software itself or with services
on top of it.
  
  While some projects are largely dominated by one company, it is increasingly
recognized that open source software can be developed better when there is a large degree of
independence. To achieve this, many projects have founded independent non-profit organizations
that play a mediating role. The first project to do so was the Apache web server with
the Apache Software Foundation. The non-profit organization acts as the legal entity on the
one hand and is responsible for the infrastructure on the other. The important
thing is that the organization can preserve its independence from the cooperating companies.
At the Apache Software Foundation this has been solved by allowing only individuals ("private"
people), not organizations, to become members, while in principle every person has the same rights,
regardless of their financial contribution. In addition, the committees are elected democratically
by the members, and again every person has one vote (Fielding 1999). That a project's legal
independence is essential for the participation of commercial organizations was shown by, among
others, the Eclipse project. The platform, originally developed by IBM internally for
its own use, was opened at an early stage to make it more interesting for partner
companies (O'Mahony et al. 2005). But it was the later detachment of the project from
IBM, through the transfer of rights to the independent Eclipse Foundation, that
convinced other companies to participate (Spaeth et al. 2008). Today, Eclipse is the de facto
standard in the field of Java development platforms and has a market share of well over 50%.
  
@@ -30, +30 @@

  By this he means the combination of the three components open architecture, open standards
and open source, through which full interoperability can be achieved. The goal of "Open Computing"
is the flexibility of a modular integration of functions as well as independence from manufacturers,
both in hardware and in software. While Apple, for example, goes the opposite way, the
experience of recent years and decades suggests that software
will be successful mainly because of its openness.
  
  == What does Apache ALOIS stand for? ==
- Apache ALOIS [http://incubator.apache.org/alois/] is a log collection and correlation software
with reporting  and alarming functionalities. ALOIS stands for "Advanced Log Data  Insight
System" and is meant to be a fully implemented open source SIEM  security information and
event management system. While almost all other SIEM software, be it closed or open source,
concentrate on the technological part of security monitoring, Apache ALOIS is aimed to monitor
the security of the content. It intends to be pro-active in the detection of potential loss,
theft, mistaken modification or unauthorized access. Apache ALOIS works on log messages and
thus contains all the basic functionality of a conventional SIEM, as centralized collecting,
normalizing, aggregation, analyzing and correlating of all log messages, as well as reporting
all security related events. Therefore it can be used as any other SIEM.
+ Apache ALOIS [http://incubator.apache.org/alois/] is software for message collection, message
splitting and message correlation, with reporting and alerting functionality. ALOIS stands
for "Advanced Log Data Insight System" and is meant to be a fully implemented open source
security information and event management system (SIEM). While almost all other SIEM software,
be it closed or open source, concentrates on the technological part of security monitoring,
Apache ALOIS aims to monitor the security of the content. It intends to be proactive
in the detection of potential loss and theft (data leakage), mistaken modification or unauthorized
access. Apache ALOIS works on log messages and thus contains all the basic functionality of
a conventional SIEM, such as centralized collection, normalization, aggregation, analysis and
correlation of all messages, as well as reporting of all security-related events. Therefore it
can be used in place of any other SIEM.
  
- Since fall 2010 Apache ALOIS is an effort undergoing incubation at The Apache Software Foundation
(ASF). Incubation is required of all newly accepted projects until a further review indicates
that the infrastructure, communications, and decision making process have stabilized in a
manner consistent with other successful ASF projects. The ASF [http://www.apache.org] is made
up of nearly 100 top level projects that cover a wide range of technologies. While you probably
know some of them by name, you surely use a lot of them not knowing it at all by just using
the internet. Most of all there is the name giving webserver, which hosts more than two third
of all websites [http://greatstatistics.com/]. The Apache projects are defined by collaborative
consensus based processes, an open, pragmatic software license and a desire to create high
quality software that leads the way in its field. This is known as the "Apache way".
+ Since fall 2010 Apache ALOIS has been undergoing incubation at The Apache Software Foundation
(ASF). Incubation allows a software project to reach a stability level equivalent to that of other
successful ASF projects with regard to infrastructure, communications, and decision making. The
ASF [http://www.apache.org] is made up of nearly 100 top level projects that cover a wide
range of technologies. While some of them are widely known by name, many more are in wide
use as part of many popular internet services. The best-known project is the HTTP Server,
which hosts more than two thirds of all internet websites [http://greatstatistics.com/]. Apache
projects are defined by collaborative, consensus-based processes, an open, pragmatic software
license and a desire to create high quality software that leads the way in its field. This
is known as the "Apache way".
  
- While incubation status is not necessarily a reflection of the  completeness or stability
of the code, it does indicate that the project  has yet to be fully endorsed by the ASF. In
fact, Apache ALOIS has shown its functioning over several years in production. Apache ALOIS
is aimed to be totally free and open for all contributions. The openness provided for other
programming languages is certainly proof of this. The plug-ability - yet to be further developed
- is meant to guarantee that individual needs can be realized without stressing the whole
system too much. Furthermore, the basic functionality of ALOIS may be extended in directions
not yet foreseen. In our opinion, the Linux kernel is a good example that this can work very
well.
+ While incubation status is not necessarily a reflection of the completeness or stability
of the code, it does indicate that the project has reached a stable phase and has the potential
to be fully endorsed by the ASF. In fact, Apache ALOIS has proven itself in production over
several years. Apache ALOIS aims to be totally free and open for all contributions.
Its openness to other programming languages is evidence of this. The pluggability
- an active field of work in progress - is meant to guarantee that individual needs can be
met without straining the whole system. Furthermore, the basic functionality of ALOIS
may be extended in directions not yet foreseen. In our opinion, the Linux kernel is a good
example that this can work very well.
  
  == SIEM and computer forensics ==
+ Since Apache ALOIS was originally designed as a Security Information and Event Management
(SIEM) system, it makes sense to give a very brief introduction to this field. The term SIEM
is a combination of SIM (security information management) and SEM (security event management),
which are disparate tool categories. While SIM is meant to provide long-term storage, analysis
and reporting of log data, SEM deals with real-time monitoring, correlation of events, notifications
and console views. A SIEM combines these two functionalities in one tool. The term Security
Information and Event Management (SIEM) describes the capabilities of gathering, analyzing and
presenting information from very different sources such as network and security devices, identity
and access management applications, operating system, database and application logs and even
external threat data. While the sources are at least partly very different from those of computer
forensics, the capabilities are almost the same. The data are usually forwarded from their
respective sources to the SIEM as messages (log messages, triggers, traps, file submissions,
database table submissions etc.).
- Since Apache ALOIS is originally designed as a Security Information and Event Management
(SIEM) system, it makes sense to give a very brief introduction in this field. The term SIEM
is a combination of SIM (security information management) and SEM (security event management),
which are disparate tool categories. While SIM is meant to provide long-term storage, analysis
and reporting of log data, SEM deals with real-time monitoring, correlation of events, notifications
and console views. Now, a SIEM combines these two functionalities in one tool.
- 
- The term Security Information Event Management (SIEM) describes the capabilities of gathering,
analyzing and presenting information from very different sources as network and security devices,
identity and access management applications, operating system, database and application logs
and even external threat data. While the sources are at least partly very different from those
of computer forensics, the capabilities are almost the same!
  
  == The Architecture of Apache ALOIS ==
  Apache ALOIS consists of five modules that interact to provide scalable SIEM
functionality:
  
-  * Insink is the message sink, which is the receiving entry point  for all the different
log messages into Apache ALOIS. It is partly  based on the syslog-ng software. Insink listens
for messages (UDP),  waits for messages (TCP), receives message collections (files, emails)
 and pre-filters them to prevent from message flow overload.
+  * Insink is the message sink, the receiving entry point for all the different
messages into Apache ALOIS. It is partly based on the syslog-ng software. Insink listens for
messages (UDP), waits for messages (TCP), receives message collections (files, emails) and
pre-filters them to prevent message flow overload.
-  * Pumpy is the incoming FIFO buffer, implemented as a relational  database tables. which
contain the incoming original messages (in raw  format). In a complex system setup, there
may be several insink  instances, e.g. for a group of hosts, for specific types of messages,
or  for high-avaliablity.
+  * Pumpy is the incoming FIFO buffer, implemented as relational database tables that
contain the incoming original messages (in raw format). In a complex system setup, there may
be several insink instances, e.g. for a group of hosts, for specific types of messages, or
for high availability.
-  * Prisma contains logic to split up the text of log messages  into separate fields, based
on regular expressions. Actually, "prisma"  is a set of "prismi", each one prisma for one
type of log message  (apache, cisco etc. Several prismi can be applied to the same message.
 This allows for stacked messages, i.e. forwarded log messages contained  in compressed files
contained in e-mail messages. The data retrieved  form the log messages is stored in a database
called Dobby. Due to  prisma being written in Ruby, prismi can be applied interactively (when
 having system access).
+  * Prisma contains logic to split up the text of messages into separate fields, based on
regular expressions. Actually, "prisma" is a set of "prismi", with one prisma for each type
of message (Apache, Cisco etc.). Several prismi can be applied to the same message. This
allows for stacked messages, such as forwarded messages contained in compressed files contained
in e-mail messages. The data retrieved from the messages is stored in a database called Dobby.
Because prisma is written in Ruby, prismi can be applied interactively (with system
access or through a message field on the website).
-  * Dobby is the central log database. It should be separated from  the Pumpy database for
availability and performance reasons. The  current implementation is based on MySQL.
+  * Dobby is the central database. It is usually separated from the Pumpy database for availability
and performance reasons. The current implementation is based on MySQL.
-  * The Analyzer contains the two sub-systems Lizard and Reptor.  Lizard is the analysis
engine and user interface of Apache ALOIS,  implemented in Ruby on Rails using AJAX. It allows
for interactive  browsing through the collected data, exclusion/inclusion/selection of  data,
data sorting, data filtering, creation of views, ad-hoc textual  and graphical reporting.
Reptor allows for automatic activation of views  and comparison of these views' results to
a predefined result (pattern  matching). In case of mismatch, Reptor sends the result to predefined
 e-mail addresses.
+  * The Analyzer contains the two sub-systems Lizard and Reptor. Lizard is the analysis engine
and user interface of Apache ALOIS, implemented in Ruby on Rails using AJAX. It allows for
interactive browsing through the collected data, exclusion/inclusion/selection of data,
data sorting, data filtering, creation of views, ad-hoc textual and graphical reporting. Reptor
allows for automatic activation of views and comparison of these views' results to a predefined
result (pattern matching). In case of mismatch, Reptor sends the result to predefined e-mail
addresses.
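
The splitting step that Prisma performs can be illustrated with a minimal Ruby sketch. It is not taken from the ALOIS code base: the `Prisma` struct, the `PRISMI` list, the `split_message` helper and both message formats are assumptions made for illustration. It only shows the idea of applying several regular-expression based "prismi" to the same raw message and collecting the named fields of each one that matches.

```ruby
# Hypothetical sketch of regex-based message splitting.
# None of these names come from the ALOIS sources.
Prisma = Struct.new(:name, :pattern)

PRISMI = [
  # Assumed syslog-like envelope: "<host> <program>: <payload>"
  Prisma.new(:syslog, /\A(?<host>\S+) (?<program>[\w-]+): (?<payload>.*)\z/),
  # Assumed payload format of a failed login
  Prisma.new(:login_failure,
             /Failed password for (?<user>\S+) from (?<ip>\d{1,3}(?:\.\d{1,3}){3})/)
].freeze

# Applies every prisma to the raw message and returns a hash that maps
# the name of each matching prisma to the fields it extracted.
def split_message(raw, prismi = PRISMI)
  prismi.each_with_object({}) do |prisma, fields|
    match = prisma.pattern.match(raw)
    next unless match

    fields[prisma.name] = match.named_captures
  end
end

record = split_message("gw01 sshd: Failed password for root from 10.0.0.7 port 22")
puts record[:syslog]["host"]        # host field from the envelope prisma
puts record[:login_failure]["ip"]   # ip field from the payload prisma
```

In this sketch both prismi match the same message, which mirrors the statement that several prismi can be applied to one message; a real setup would write the extracted fields to the Dobby database instead of printing them.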
  
- Since an image explains more than a thousand words, here is an overview of the data flow
through the different modules:
+ Figure 1 shows an overview of the data flow through the different modules:
  
  {{http://incubator.apache.org/alois/images/overview-3tier-flowchart.png}}
+ [Figure 1: ALOIS Message Flow and main components]
  
- Up to now, the creation of the input is not part of Apache ALOIS. On the one hand, the logs
are generated by the the different systems itself. On the other hand, by using the standard
interface SYSLOG, there was no need to care too much on this part so far. Other SIEM software
do include so called "agents", which create, collect and/or prepare information for the tool.
By using the technology of agents, Apache ALOIS could easily be extended for completly new
use cases.
+ Apache ALOIS is open to any type of input - whatever the system or tool at hand produces as
output. The standard interfaces are syslog, SMTP and file upload. In a SIEM context, "agents"
handle the various input formats, and Apache ALOIS could easily be extended for any kind of input.
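
As a rough illustration of what such an agent might do, the following Ruby sketch turns one result of a forensic tool into a BSD-syslog style line that could be sent to a syslog interface. The helper `to_syslog_line`, the host name, the tag and the key=value payload are hypothetical; the only fixed rule used is the syslog priority value, computed as facility * 8 + severity.

```ruby
# Hypothetical agent helper; field names and message layout are assumptions.
# Facility 16 is "local0" and severity 6 is "informational" in syslog terms.
def to_syslog_line(host:, tag:, message:, time:, facility: 16, severity: 6)
  priority = facility * 8 + severity
  timestamp = time.strftime("%b %e %H:%M:%S") # e.g. "Jan 16 21:30:01"
  format("<%d>%s %s %s: %s", priority, timestamp, host, tag, message)
end

line = to_syslog_line(
  host: "lab-ws3",
  tag: "diskimager",
  message: "image=case42.dd status=ok",
  time: Time.utc(2011, 1, 16, 21, 30, 1)
)
puts line
```

The resulting string could then be written to a UDP or TCP socket on which the message sink listens; the transport is left out here to keep the sketch self-contained.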
  
- == Extending Apache ALOIS to a platform for computer forensics ==
+ == Using Apache ALOIS as a platform for computer forensics ==
- As already mentioned above, although it is a SIEM, Apache ALOIS already fulfills a lot of
the functionality needed in computer forensics. The tasks of analysis, evaluating and reporting
is already included. Although the correlation functionality and a forensic console is a standard
within SIEM systems, Apache ALOIS sees its main strengths in these domains. The forensic console
has an easy to use web frontend, which will look familiar to most of the computer users:
+ As already mentioned above, although it is a SIEM, Apache ALOIS already provides much of
the functionality needed in computer forensics. The tasks of analysis, evaluation and reporting
are already covered. Correlation functionality and a forensic console are a common standard
within SIEM systems, and Apache ALOIS sees its main strengths in these domains. ALOIS has
source protection (it prevents the alteration of collected data) and further protects the data
using hash functions. Anonymisation features have been prepared to meet data protection requirements,
as have functions to reverse anonymisation to allow for legal prosecution. The forensic console
has an easy-to-use web frontend, which will look familiar to most regular web interface users
(see figure 2).
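
The text does not describe how ALOIS applies hash functions, so the following Ruby sketch is only one possible way to make alterations of collected messages detectable: a hash chain in which each digest covers the raw message and the digest of its predecessor. The helpers `chain` and `intact?`, the `GENESIS` start value and the choice of SHA-256 are assumptions for illustration.

```ruby
require "digest"

# Hypothetical start value for the chain (64 zero characters).
GENESIS = "0" * 64

# Returns one record per message; each digest depends on all earlier messages.
def chain(messages)
  previous = GENESIS
  messages.map do |raw|
    previous = Digest::SHA256.hexdigest(previous + raw)
    { raw: raw, digest: previous }
  end
end

# Recomputes the chain and reports whether every stored digest still matches.
def intact?(records)
  previous = GENESIS
  records.all? do |record|
    previous = Digest::SHA256.hexdigest(previous + record[:raw])
    previous == record[:digest]
  end
end

records = chain(["login user=alice", "read file=report.doc", "logout user=alice"])
puts intact?(records)
```

Because every digest includes its predecessor, changing or removing a single stored message invalidates all later digests, which is the property a forensic store needs in order to show that collected data was not altered afterwards.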
  
  {{http://incubator.apache.org/alois/images/forensicConsole.png}}
+ [Figure 2: ALOIS Console]
  
- Of course, Apache ALOIS has to be configured to become a computer forensics platform. But
the configuration has to be done only once. And since it is an open source tool, configuration
can be reused. What has to be done, to make a true computer forensics tool out of Apache ALOIS,
is the task of the extraction of data. In a SIEM this is called an agent. Of course, we wouldn't
dare to propose to rewrite all the great tools used in this area. The meaning of an agent
is the one of a connector. Thus all the tools have to get a connector. This should done by
the vendor of the tool. To make this as easy as possible, Apache ALOIS plans to build a "service
bus" with standardized interfaces. The architecture of such a service bus could look like
this:
+ Apache ALOIS can be configured to become any type of computer forensics platform. Configurations
can be shared, published and reused, and can be instantiated on a case-by-case basis, thus
separating the data of different forensic cases. Separate databases can be combined to allow
for cross-case analysis. Many of the standard forensic tools have data export capabilities,
and import filters (ALOIS agents) for these exports are easy to create, though probably many
in number. Agents may be created by the vendor of the tool or by the ALOIS team. Apache ALOIS
intends to build a "service bus" with standardized interfaces. The proposed architecture is
shown in figure 3.
  
  {{http://incubator.apache.org/alois/images/Apache ALOIS Service Bus_small.png}}
+ [Figure 3: ALOIS service bus]
  
- Therefore, it will not only be easy to connect a - proprietary or open source - application
to the system. It will also be possible to replace one or another standard moduls of Apache
ALOIS with the one that fits better the own special needs.
+ Therefore, it will not only be easy to connect an application - proprietary or open source -
to the system. It will also be possible to replace any of the standard modules of Apache
ALOIS with one that better fits one's own special needs.
  
  == Conclusion ==
- Computer forensics is a domain with highly specialised tools from numerous vendors. What
is lacking is an integration platform, where all the data come together and therefore can
be correlated. Apache ALOIS is a SIEM and has already build in correlation. Since it is open
source software, it could be extended to a computer forensics cross-software platform, that
is vendor independent. Moreover, the fact that the software project is part of the Apache
community, guarantees its independence, a commercial-friendly licence and a healthy development.
+ Computer forensics is a domain with highly specialised tools from numerous vendors. What
is lacking is an integration platform where all the data can be combined and correlated.
Apache ALOIS is a SIEM and already has built-in correlation. Since it is open source software,
it could be extended to a vendor-independent computer forensics cross-software platform. Moreover,
the fact that the software project is part of the Apache community guarantees its independence,
a commercial-friendly licence (i.e. distribution free of charge) and healthy development.
  
  == References ==
  [...]
