community-dev mailing list archives

From "Kevin A. McGrail (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (COMDEV-260) GSOC 2018 SpamAssassin Bayes Token ID
Date Mon, 05 Feb 2018 04:39:00 GMT

     [ https://issues.apache.org/jira/browse/COMDEV-260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kevin A. McGrail updated COMDEV-260:
------------------------------------
    Summary: GSOC 2018 SpamAssassin Bayes Token ID  (was: SpamAssassin Bayes Token ID)

> GSOC 2018 SpamAssassin Bayes Token ID
> -------------------------------------
>
>                 Key: COMDEV-260
>                 URL: https://issues.apache.org/jira/browse/COMDEV-260
>             Project: Community Development
>          Issue Type: Project
>            Reporter: Kevin A. McGrail
>            Priority: Major
>
> From DFS idea used with permission:
> We tokenize inbound messages and store the tokens on the server. In each message, we
> add links for training. When you click a training link, the system trains on that message
> using the tokens stored on the server. That way, you are training on exactly the tokens
> that the Bayes code saw.
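Below is a minimal Python sketch of the flow described above. It rests on several assumptions. The regex tokenizer is far simpler than SpamAssassin's real one. The store is in-memory. The training URL (https://mail.example.com/train) is hypothetical. Tokens are saved before delivery under a random ID, and the per-message training links carry that ID. A click then updates spam/ham token counts from the stored tokens, not from a re-tokenized copy of the message.

```python
import re
import uuid
from collections import Counter
from urllib.parse import urlencode

# Hypothetical training endpoint, for illustration only.
TRAIN_URL = "https://mail.example.com/train"


def tokenize(text: str) -> set[str]:
    """Simplified tokenizer (an assumption): lowercase runs of 3+ letters/digits."""
    return set(re.findall(r"[a-z0-9]{3,}", text.lower()))


class TokenStore:
    """In-memory stand-in for server-side token storage."""

    def __init__(self) -> None:
        self._tokens: dict[str, set[str]] = {}
        self.spam_counts: Counter = Counter()
        self.ham_counts: Counter = Counter()

    def store(self, body: str) -> str:
        """Tokenize an inbound message before delivery; keep only its tokens."""
        token_id = uuid.uuid4().hex
        self._tokens[token_id] = tokenize(body)
        return token_id

    def training_links(self, token_id: str) -> dict[str, str]:
        """Build the 'spam' and 'ham' links to add to the delivered message."""
        return {
            verdict: f"{TRAIN_URL}?{urlencode({'id': token_id, 'as': verdict})}"
            for verdict in ("spam", "ham")
        }

    def train(self, token_id: str, verdict: str) -> None:
        """Handle a link click: train on exactly the tokens seen at delivery."""
        counts = self.spam_counts if verdict == "spam" else self.ham_counts
        counts.update(self._tokens[token_id])


store = TokenStore()
tid = store.store("Cheap meds, buy now!")
links = store.training_links(tid)
store.train(tid, "spam")
```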
> For SA, the key point is a framework that stores the Bayes tokens from an email before the
> email is delivered, so that later a "this is spam"/"this is ham" mechanism can take
> advantage of that information without needing the entire email.
> Adding a header with the message ID under which the tokens are stored makes it easier to
> build a "train as spam"/"train as ham" framework.
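One possible way to carry the storage ID with the delivered message is sketched here with Python's standard email package. The header name X-Bayes-Token-ID is hypothetical; SpamAssassin does not define it. A plain dict stands in for real server-side token storage.

```python
import uuid
from email import message_from_string
from email.message import Message
from typing import Optional

# Hypothetical header name; not an existing SpamAssassin header.
TOKEN_ID_HEADER = "X-Bayes-Token-ID"


def tag_message(raw: str, tokens: frozenset, token_store: dict) -> Message:
    """Store the message's tokens and record their storage ID in a header."""
    msg = message_from_string(raw)
    token_id = uuid.uuid4().hex
    token_store[token_id] = tokens
    # Drop any copy of the header that arrived with the message, so a sender
    # cannot point training at another message's stored tokens.
    del msg[TOKEN_ID_HEADER]
    msg[TOKEN_ID_HEADER] = token_id
    return msg


def stored_tokens(msg: Message, token_store: dict) -> Optional[frozenset]:
    """Look up the stored tokens for a delivered message, if any."""
    token_id = msg.get(TOKEN_ID_HEADER)
    return token_store.get(token_id) if token_id else None


store: dict = {}
raw = "Subject: hello\nX-Bayes-Token-ID: forged\n\nCheap meds, buy now!\n"
delivered = tag_message(raw, frozenset({"cheap", "meds", "buy", "now"}), store)
# Later, a "train as spam" action only needs the header, not the body.
reparsed = message_from_string(delivered.as_string())
```

Deleting any inbound copy of the header before adding a fresh one is a deliberate choice. Without it, a sender could plant an ID that points training at some other message's tokens.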
> The issues you are pointing to deal more with the implementation of the "this is
> spam"/"this is ham" mechanism.
> Storing only the tokens takes less space and mitigates privacy and legal concerns.
> sa-learn would then be extended to take the message ID and learn the stored tokens as spam
> or ham, instead of being fed the entire message.
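Here is a hypothetical sketch of what "learn by ID" could look like. It is written in Python rather than SpamAssassin's Perl. The names learn_by_id and BayesDB and the dict-based token store are illustrative only; they are not sa-learn options or SpamAssassin internals. Only token sets are kept, never message bodies. The sketch also assumes two behaviours: learning the same ID twice under the same label is a no-op, and re-learning it under the opposite label first undoes the earlier verdict.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class BayesDB:
    """Toy Bayes database: per-token counts plus message totals."""
    spam: Counter = field(default_factory=Counter)
    ham: Counter = field(default_factory=Counter)
    nspam: int = 0
    nham: int = 0
    seen: dict = field(default_factory=dict)  # token ID -> "spam" or "ham"


def _apply(db: BayesDB, tokens: frozenset, label: str, sign: int) -> None:
    """Add (sign=+1) or remove (sign=-1) one message's tokens under a label."""
    counts = db.spam if label == "spam" else db.ham
    for token in tokens:
        counts[token] += sign
        if counts[token] <= 0:
            del counts[token]
    if label == "spam":
        db.nspam += sign
    else:
        db.nham += sign


def learn_by_id(db: BayesDB, token_store: dict, token_id: str, as_spam: bool) -> bool:
    """Learn a message from its stored tokens; return False if nothing changed."""
    label = "spam" if as_spam else "ham"
    tokens = token_store[token_id]  # KeyError if the tokens were never stored
    previous = db.seen.get(token_id)
    if previous == label:
        return False  # already learned with this label
    if previous is not None:
        _apply(db, tokens, previous, -1)  # undo the earlier, opposite verdict
    _apply(db, tokens, label, +1)
    db.seen[token_id] = label
    return True


token_store = {"abc123": frozenset({"cheap", "meds"})}
db = BayesDB()
learn_by_id(db, token_store, "abc123", as_spam=False)  # user first marks it ham
learn_by_id(db, token_store, "abc123", as_spam=True)   # then corrects it to spam
```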



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@community.apache.org
For additional commands, e-mail: dev-help@community.apache.org

