community-dev mailing list archives

From "Mark Thomas (JIRA)" <>
Subject [jira] [Updated] (COMDEV-260) GSOC 2018 SpamAssassin Bayes Token ID
Date Mon, 26 Feb 2018 10:12:00 GMT


Mark Thomas updated COMDEV-260:
    Component/s: GSoC/Mentoring ideas

> GSOC 2018 SpamAssassin Bayes Token ID
> -------------------------------------
>                 Key: COMDEV-260
>                 URL:
>             Project: Community Development
>          Issue Type: Project
>          Components: GSoC/Mentoring ideas
>            Reporter: Kevin A. McGrail
>            Priority: Major
> From an idea by Diane F Skoll (used with permission):
> We tokenize inbound messages and store the tokens on the server. In each message, we
add links for doing training. When you click on a training link, the system trains the message
based on the tokens stored on the server. In that way, you are training using exactly the
tokens that the Bayes code saw.
> For SA, the key point is a framework that stores the Bayesian tokens from the email before
the email is delivered, so that a later "this is spam" / "this is ham" mechanism can take
advantage of that information without needing the entire email.
> Adding a header that carries the message id used as the storage key makes a "train as spam"
/ "train as ham" framework easier to build.
> The issues you are pointing to have more to do with the implementation of the "this is
spam" / "this is ham" mechanism.
> By storing just the tokens, there is less space and privacy & legal concerns are
> sa-learn would then be extended to use the message id and learn as spam/ham instead of
feeding it the entire message.
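
The flow described above (tokenize at scan time, store the tokens under an id, stamp that id into a header, train later by id) can be sketched roughly as follows. This is only an illustration under stated assumptions, not SpamAssassin's actual code or API: the header name `X-Bayes-Token-ID`, the `TokenStore` class, the in-memory dictionaries and the toy tokenizer are all hypothetical, and a real implementation would live in Perl and persist to a database.

```python
import re
import uuid
from collections import Counter

# Hypothetical header name, chosen for illustration only.
TOKEN_ID_HEADER = "X-Bayes-Token-ID"


def tokenize(text):
    """Toy tokenizer: unique lowercase alphanumeric runs, sorted.

    A real Bayes tokenizer is far more elaborate; this is a stand-in.
    """
    return sorted(set(re.findall(r"[a-z0-9]+", text.lower())))


class TokenStore:
    """In-memory stand-in for server-side token storage (an assumption)."""

    def __init__(self):
        self.tokens_by_id = {}        # token id -> tokens seen at scan time
        self.spam_counts = Counter()  # token -> number of spam trainings
        self.ham_counts = Counter()   # token -> number of ham trainings

    def scan(self, message_text):
        """Tokenize before delivery, store the tokens, return the id and header."""
        token_id = uuid.uuid4().hex
        self.tokens_by_id[token_id] = tokenize(message_text)
        return token_id, f"{TOKEN_ID_HEADER}: {token_id}"

    def learn(self, token_id, is_spam):
        """Train from the stored tokens only; the message itself is not needed."""
        tokens = self.tokens_by_id[token_id]
        counts = self.spam_counts if is_spam else self.ham_counts
        counts.update(tokens)
        return len(tokens)


if __name__ == "__main__":
    store = TokenStore()
    token_id, header = store.scan("Cheap pills! Buy cheap pills now")
    print(header)                                # header to add to the delivered mail
    print(store.learn(token_id, is_spam=True))   # number of tokens trained
```

Because `learn` reads only what `scan` stored, training uses exactly the tokens that were seen at scan time, which is the point of the proposal.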
> Apache SpamAssassin is a mail filter to identify spam. It is an intelligent email filter
which uses a diverse range of tests to identify unsolicited bulk email, more commonly known
as Spam. These tests are applied to email headers and content to classify email using advanced
statistical methods. 
> In addition, SpamAssassin has a modular architecture that allows other technologies to
be quickly wielded against spam and is designed for easy integration into virtually any email
> It is primarily written in Perl with a few bits in C and shell scripts for system integration.
> The compendium at
is helpful to understand some of the concepts with SpamAssassin
> It will help if a student on this project understands SMTP, but what is most desired is a
willingness to learn and to set up your own mail server on a Linux distribution with SpamAssassin
for a personal test domain. Assistance will be provided to get the basic framework for a
learning sandbox in place.
> As email becomes more commoditized by major providers, knowledge of email systems and
their security is dwindling.  This opportunity can provide real-world experience with an
email security product that is employed by countless commercial systems in the world.

This message was sent by Atlassian JIRA
