spamassassin-commits mailing list archives

From spamassassin-...@incubator.apache.org
Subject [SpamAssassin Wiki] Updated: CeasNotesJustin
Date Sat, 31 Jul 2004 22:13:01 GMT
   Date: 2004-07-31T15:13:00
   Editor: JustinMason <jm@jmason.org>
   Wiki: SpamAssassin Wiki
   Page: CeasNotesJustin
   URL: http://wiki.apache.org/spamassassin/CeasNotesJustin

   no comment

Change Log:

------------------------------------------------------------------------------
@@ -296,4 +296,37 @@
   * q from John Levine: TurnTide uses exactly this technique, narrowing the TCP window on the spammer's connections.
   * q: why not just use delayed ACKs?   a: because that's not quite as effective as the other techniques
 
+AOL hashing:
+
+  * I-Match: large corpus; lexicon generation
+  * intersection of document and lexicon gives signature
+  * traditional I-Match lexicon generation: reject very frequent terms and hapaxes
+  * use "Mutual Information" as a measurement of fitness to avoid overlapping rules
+  * use multiple lexicons to keep randomization from having an effect
+  * generate multiple lexicons, by removing random entries from an original lexicon
+  * also: distributional word clustering (Information Bottleneck) for lexicon selection (terms with similar class distribution P(spam|term))
+  * q: "'cluster' selection" -- is that reports from live users?  yep
+  * q: "FP rate?"   a: very very low
+
+Distributed, collaborative spam filtering:
+
+  * TCD, yay
+  * definition: "spam is email that the recipient is not interested in receiving".  we disagree, of course ;)
+  * P2P approach
+
+Reputation network analysis for mail filtering:
+
+  * 75% of semweb data is FOAF files
+  * using web of trust
+  * a bit like http://web-o-trust.org/ , but not yet workable with email addresses since there's no spoofing protection
+
+On attacking statistical spam filters:
+
+  * spammers wanted to evade bayes
+  * tokenization/obfuscation tricks turn out to be good spam signs
+  * in my opinion, they should not have used SpamArchive spam, since it lacks headers; headers greatly improve spam recognition
+  * pretty similar to http://www.cs.dal.ca/research/techreports/2004/CS-2004-06.pdf ;)
+
+
+
 

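Editor's sketch (not part of the wiki change above): the "AOL hashing" notes describe I-Match signatures as a hash of the intersection of a document with a lexicon, made more robust by using several lexicons derived from one original by removing random entries. A minimal illustration in Python follows, under loud assumptions: every function name, threshold and the choice of SHA-1 are hypothetical, and the Mutual Information and Information Bottleneck selection steps are not implemented. It is not the speakers' actual implementation.

```python
import hashlib
import random
import re


def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def build_lexicon(corpus, min_df=2, max_df_ratio=0.5):
    """Keep terms that are neither hapaxes nor very frequent in the corpus.

    min_df and max_df_ratio are assumed thresholds, chosen for illustration.
    """
    df = {}
    for doc in corpus:
        for term in tokenize(doc):
            df[term] = df.get(term, 0) + 1
    limit = max_df_ratio * len(corpus)
    return {t for t, n in df.items() if min_df <= n <= limit}


def perturb_lexicons(lexicon, count=3, drop_ratio=0.3, seed=0):
    """Return the original lexicon plus `count` copies with random entries removed."""
    rng = random.Random(seed)
    terms = sorted(lexicon)
    keep = max(1, int(len(terms) * (1 - drop_ratio)))
    return [set(lexicon)] + [set(rng.sample(terms, keep)) for _ in range(count)]


def signature(text, lexicon):
    """Hash the sorted intersection of document terms and lexicon terms."""
    common = sorted(tokenize(text) & lexicon)
    if not common:
        return None  # nothing in common with the lexicon: no usable signature
    return hashlib.sha1(" ".join(common).encode("utf-8")).hexdigest()


def is_near_duplicate(a, b, lexicons):
    """Two messages match if their signatures agree under at least one lexicon."""
    return any(
        sa is not None and sa == signature(b, lx)
        for lx, sa in ((lx, signature(a, lx)) for lx in lexicons)
    )
```

With the hypothetical lexicons [{"buy", "pills", "now", "free"}, {"buy", "pills", "now"}], the messages "buy pills now" and "buy pills now free" get different signatures under the first lexicon but the same one under the second, so is_near_duplicate still reports a match. That is the point of keeping several lexicons: an inserted word only disturbs the lexicons that contain it.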