community-dev mailing list archives

From "Kevin A. McGrail (JIRA)" <j...@apache.org>
Subject [jira] [Created] (COMDEV-260) SpamAssassin Bayes Token ID
Date Sun, 04 Feb 2018 10:40:00 GMT
Kevin A. McGrail created COMDEV-260:
---------------------------------------

             Summary: SpamAssassin Bayes Token ID
                 Key: COMDEV-260
                 URL: https://issues.apache.org/jira/browse/COMDEV-260
             Project: Community Development
          Issue Type: Project
            Reporter: Kevin A. McGrail


From an idea by DFS, used with permission:

We tokenize inbound messages and store the tokens on the server. In each message, we add links
for training. When you click a training link, the system trains on the message using
the tokens stored on the server. That way, you are training on exactly the tokens
that the Bayes code saw.

For SA, the key point is a framework that stores the Bayesian tokens from an email before the
email is delivered, so that a later "this is spam" / "this is ham" mechanism can use that
information without needing the entire email.

Adding a header that carries the message ID under which the tokens are stored makes it easier
to build a "train as spam" / "train as ham" framework.
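
As a rough illustration of the storage step, here is a minimal Python sketch. It is not SpamAssassin
code: the header name X-Bayes-Token-ID, the TokenStore class, and the toy tokenizer are all
hypothetical stand-ins, and it only handles non-multipart messages.

```python
import email
import re
import uuid

# Hypothetical header name, chosen for illustration only.
TOKEN_ID_HEADER = "X-Bayes-Token-ID"


def tokenize(text):
    """Toy tokenizer: unique lowercase alphanumeric words, sorted.

    A stand-in for the real Bayes tokenizer, which is far more elaborate.
    """
    return sorted(set(re.findall(r"[a-z0-9]+", text.lower())))


class TokenStore:
    """In-memory stand-in for server-side token storage."""

    def __init__(self):
        self._tokens = {}

    def save(self, tokens):
        # Generate an opaque ID under which the tokens are kept.
        token_id = uuid.uuid4().hex
        self._tokens[token_id] = list(tokens)
        return token_id

    def load(self, token_id):
        return self._tokens[token_id]


def tag_message(raw, store):
    """Tokenize a message before delivery, store the tokens, add the ID header."""
    msg = email.message_from_string(raw)
    # Assumption: only simple, non-multipart bodies are tokenized here.
    body = "" if msg.is_multipart() else msg.get_payload()
    tokens = tokenize((msg["Subject"] or "") + "\n" + body)
    token_id = store.save(tokens)
    msg[TOKEN_ID_HEADER] = token_id
    return msg, token_id
```

Only the token list and the ID are kept on the server; the delivered message carries just the ID.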

The issues you are pointing to have more to do with the implementation of the "this is spam" /
"this is ham" mechanism.

Storing just the tokens takes less space and mitigates privacy and legal concerns.

sa-learn would then be extended to accept the message ID and learn the stored tokens as spam
or ham, instead of being fed the entire message.
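
The learn-by-ID step might look roughly like the Python sketch below. This is an assumption about
the shape of such an extension, not an existing sa-learn option: BayesCounts and learn_by_id are
hypothetical names, and the token store is modeled as a plain mapping from ID to token list.

```python
from collections import Counter


class BayesCounts:
    """Toy per-token spam/ham counters; a stand-in for the real Bayes database."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()
        self.nspam = 0
        self.nham = 0


def learn_by_id(token_id, is_spam, token_store, counts):
    """Train on the tokens stored under token_id, without the original message.

    token_store is any mapping of ID -> list of tokens.
    Returns False if no tokens are stored under that ID (unknown or expired).
    """
    tokens = token_store.get(token_id)
    if tokens is None:
        return False
    # Count each token once per message.
    target = counts.spam if is_spam else counts.ham
    target.update(set(tokens))
    if is_spam:
        counts.nspam += 1
    else:
        counts.nham += 1
    return True
```

A training link or a mail client button would only need to pass the ID from the header and the
spam/ham verdict; the message itself never has to be resubmitted.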



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@community.apache.org
For additional commands, e-mail: dev-help@community.apache.org

