spamassassin-commits mailing list archives

From jqu...@apache.org
Subject svn commit: r1651114 - in /spamassassin/trunk: INSTALL README USAGE lib/Mail/SpamAssassin/Plugin/TxRep.pm rules/v341.pre sql/README.txrep sql/txrep_mysql.sql sql/txrep_pg.sql
Date Mon, 12 Jan 2015 15:17:47 GMT
Author: jquinn
Date: Mon Jan 12 15:17:46 2015
New Revision: 1651114

URL: http://svn.apache.org/r1651114
Log:
Another pass at bookkeeping for TxRep to succeed AWL

Added:
    spamassassin/trunk/sql/README.txrep
    spamassassin/trunk/sql/txrep_mysql.sql
    spamassassin/trunk/sql/txrep_pg.sql
Modified:
    spamassassin/trunk/INSTALL
    spamassassin/trunk/README
    spamassassin/trunk/USAGE
    spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/TxRep.pm
    spamassassin/trunk/rules/v341.pre

Modified: spamassassin/trunk/INSTALL
URL: http://svn.apache.org/viewvc/spamassassin/trunk/INSTALL?rev=1651114&r1=1651113&r2=1651114&view=diff
==============================================================================
--- spamassassin/trunk/INSTALL (original)
+++ spamassassin/trunk/INSTALL Mon Jan 12 15:17:46 2015
@@ -340,7 +340,7 @@ version is too low for them to be used.
 
   - DB_File (from CPAN, included in many distributions)
 
-    Used to store data on-disk, for the Bayes-style logic and
+    Used to store data on-disk, for the Bayes-style logic, TxRep, and
     auto-whitelist.  *Much* more efficient than the other standard Perl
     database packages.  Strongly recommended.
 

Modified: spamassassin/trunk/README
URL: http://svn.apache.org/viewvc/spamassassin/trunk/README?rev=1651114&r1=1651113&r2=1651114&view=diff
==============================================================================
--- spamassassin/trunk/README (original)
+++ spamassassin/trunk/README Mon Jan 12 15:17:46 2015
@@ -319,35 +319,19 @@ init.pre, you need to uncomment the load
 prerequisites for proper operation of the plugin are present.
 
 
-Automatic Whitelist System
+Automatic Reputation System
 --------------------------
 
-SpamAssassin includes automatic whitelisting; The current iteration is
-considerably more complex than the original version.  The way it works is
-by tracking for each sender address the average score of messages so far
-seen from there.  Then, it combines this long-term average score for the
-sender with the score for the particular message being evaluated, after
-all other rules have been applied.
-
-This functionality is on by default, and is enabled or disabled with the
-"use_auto_whitelist" option.
-
-A system-wide auto-whitelist can be used, by setting the
-auto_whitelist_path and auto_whitelist_file_mode configuration commands
-appropriately, e.g.
-
-    auto_whitelist_path        /var/spool/spamassassin/auto-whitelist
-    auto_whitelist_file_mode   0666
-
-The spamassassin -W and -R command line flags provide an API to add and
-remove entries 'manually', if you so desire.  They operate based on an
-input mail message, to allow them to be set up as aliases which users can
-simply forward their mails to.  See the spamassassin manual page for more
-details.
-
-The default address-list implementation,
-Mail::SpamAssassin::DBBasedAddrList, uses Berkeley DB files to store
-the addresses.
+SpamAssassin includes an automatic reputation system. It tracks, for each
+sender address, a rolling average score of the messages seen so far from
+that address.  It then combines this long-term average with the score of
+the particular message being evaluated, after all other rules have been
+applied.
+
+This functionality can be enabled or disabled with the
+"use_txrep" option.
+
+For more information, read sql/README.txrep
 
 (end of README)
 

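As a rough illustration of how the long-term average is combined with the score of the current message, the Perl sketch below applies the same expression that appears in the check_reputation hunk of TxRep.pm in this commit. The helper name reputation_delta and the sample numbers are made up for illustration; this is a sketch, not the plugin's code, and it does not show how the plugin weights the result afterwards.

```perl
use strict;
use warnings;

# reputation_delta is a hypothetical helper name. $total and $count stand for
# the stored total score and message count of one sender record.
sub reputation_delta {
    my ($total, $count, $msgscore) = @_;
    # Fold the current message into the average, then express the result as
    # an adjustment relative to the current message's own score.
    return ($total + $msgscore) / (1 + $count) - $msgscore;
}

# Made-up record: 3 earlier messages totalling 30 points (average 10),
# and a new message that scored 2 before the reputation adjustment.
my $delta = reputation_delta(30, 3, 2);
printf "delta = %.1f\n", $delta;    # prints "delta = 6.0"
```

With these numbers the sender's poor history pushes a low-scoring message upward; a sender with a good history would pull a high-scoring message down in the same way.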
Modified: spamassassin/trunk/USAGE
URL: http://svn.apache.org/viewvc/spamassassin/trunk/USAGE?rev=1651114&r1=1651113&r2=1651114&view=diff
==============================================================================
--- spamassassin/trunk/USAGE (original)
+++ spamassassin/trunk/USAGE Mon Jan 12 15:17:46 2015
@@ -223,20 +223,21 @@ Other Installation Notes
 
 
   - Scores and other user preferences can now be loaded from, and Bayes
-    and auto-whitelist data can be stored in, an SQL database; see the
-    'sql' subdirectory for more details.
+    and automatic reputation data can be stored in, an SQL database; see
+    the 'sql' subdirectory for more details.
 
     If you are setting up a large 'spamd' system-wide installation, with
-    Bayes and/or auto-whitelists, we strongly recommend using SQL as
+    Bayes and/or automatic reputation, we strongly recommend using SQL as
     storage.  It has proven more reliable than the default DB_File storage
     backend at several large sites.
 
 
   - If you are running SpamAssassin under a disk quota, or are setting up
     'spamd' with users with disk quotas, be warned that the DB_File
-    database module used by SpamAssassin for Bayes and AWL storage seems
-    to be unreliable in the face of quotas (bug 3796). In this situation,
-    we recommend using SQL storage for those databases, instead of DB_File.
+    database module used by SpamAssassin for Bayes, TxRep, and AWL storage
+    seems to be unreliable in the face of quotas (bug 3796). In this
+    situation, we recommend using SQL storage for those databases, instead
+    of DB_File.
 
 
   - Lots more ways to integrate SpamAssassin can be read at

Modified: spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/TxRep.pm
URL: http://svn.apache.org/viewvc/spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/TxRep.pm?rev=1651114&r1=1651113&r2=1651114&view=diff
==============================================================================
--- spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/TxRep.pm (original)
+++ spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/TxRep.pm Mon Jan 12 15:17:46 2015
@@ -15,6 +15,7 @@
 # limitations under the License.
 # </@LICENSE>
 
+
 =head1 NAME
 
 Mail::SpamAssassin::Plugin::TxRep - Normalize scores with sender reputation records
@@ -109,13 +110,16 @@ progressively reducing the impact of pas
 description of the factor below.
 
 6. B<Blacklisting and Whitelisting> - when a whitelisting or blacklisting was requested
-through SpamAssassin's API, AWL adjusts the historical total score by a fixed value,
-regardless of the number of messages recorded at given sender. It results in practical
-impossibility of blacklisting or whitelisting any sender with higher number of recorded
-scores. Even at senders with few messages, the impact of the whitelisting or blacklisting
-is minimal, and new messages can be still tagged incorrectly. TxRep handles black/whitelisting
-differently, so that it has the desired effect. It is explained in details in the section
-L</BLACKLISTING / WHITELISTING>.
+through SpamAssassin's API, AWL adjusts the historical total score of the plain email
+address without IP (and deletes the records bound to an IP). However, because new
+records with an IP are added as mail is received, the blacklisted entry stops having
+an effect during scanning. TxRep always uses the record of the plain email address
+without IP together with the one bound to an IP address, DKIM signature, or SPF pass
+(unless the weight factor for the EMAIL reputation is set to zero). AWL uses a score
+of 100 (resp. -100) for blacklisting (resp. whitelisting). TxRep scales that value
+by the ratio of the total reputation weight to the weight of the EMAIL reputation. It
+is explained in detail in the section L</BLACKLISTING / WHITELISTING>. TxRep can also
+blacklist or whitelist IP addresses, domain names, and dotless HELO names.
 
 7. B<Sender Identification> - AWL identifies a sender on the basis of the email address
 used, and the originating IP address (better told its part defined by the mask setting).
@@ -225,6 +229,7 @@ sub new {                       # constr
 
   # only the default conf loaded here, do nothing here requiring
   # the runtime settings
+  dbg("TxRep: new object created");
   return $self;
 }
 
@@ -649,8 +654,6 @@ need this option at all - for compatibil
 A plugin DKIM should also be enabled, as otherwise there is no benefit from
 turning on this option.
 
-=back
-
 =cut  # ...................................................................
   push (@cmds, {
     setting     => 'auto_whitelist_distinguish_signed',
@@ -672,6 +675,8 @@ ever associated with the email address,
 ones) would be treated as coming from the authorized source. However, such
 domains are hopefuly rare, and ask for this kind of treatment anyway.
 
+=back
+
 =cut  # ...................................................................
   push (@cmds, {
     setting     => 'txrep_spf',
@@ -1083,10 +1088,36 @@ sub _fn_envelope {
   unless ($self->{main}->{conf}->{use_txrep}){                                 return 0;}
   unless ($args->{address}) {$self->_message($args->{cli_p},"failed ".$msg);   return 0;}
 
-  my $status;
+  my $factor =	$self->{conf}->{txrep_weight_email} +
+		$self->{conf}->{txrep_weight_email_ip} +
+		$self->{conf}->{txrep_weight_domain} +
+		$self->{conf}->{txrep_weight_ip} +
+		$self->{conf}->{txrep_weight_helo};
+  my $sign = $args->{signedby};
+  my $id     = $args->{address};
+  if ($args->{address} =~ /,/) {
+    $sign = $args->{address};
+    $sign =~ s/^.*,//g;
+    $id   =~ s/,.*$//g;
+  }
+
+  # simplified regex used for IP detection (possible FP at a domain is not critical)
+  if ($id !~ /\./ && $self->{conf}->{txrep_weight_helo}) 
+	{$factor /= $self->{conf}->{txrep_weight_helo}; $sign = 'helo';}
+  elsif ($id =~ /^[a-f\d\.:]+$/ && $self->{conf}->{txrep_weight_ip})
+	{$factor /= $self->{conf}->{txrep_weight_ip};}
+  elsif ($id =~ /@/ && $self->{conf}->{txrep_weight_email})
+	{$factor /= $self->{conf}->{txrep_weight_email};}
+  elsif ($id !~ /@/ && $self->{conf}->{txrep_weight_domain})
+	{$factor /= $self->{conf}->{txrep_weight_domain};}
+  else	{$factor  = 1;}
+
+  $self->open_storages();
+  my $score  = (!defined $value)? undef : $factor * $value;
+  my $status = $self->modify_reputation($id, $score, $sign);
+  dbg("TxRep: $msg %s (score %s) %s", $id, $score || 'undef', $sign || '');
   eval {
-    $status = $self->modify_reputation($args->{address}, $value*$self->count(), $args->{signedby});
-    $self->_message($args->{cli_p}, ($status?"":"error ") . $msg . ": " . $args->{address});
+    $self->_message($args->{cli_p}, ($status?"":"error ") . $msg . ": " . $id);
     if (!defined $self->{txKeepStoreTied}) {$self->finish();}
     1;
   } or return $self->_fail_exit( $@ );
@@ -1100,24 +1131,34 @@ sub _fn_envelope {
 
 When asked by SpamAssassin to blacklist or whitelist a user, the TxRep
 plugin adds a score of 100 (for blacklisting) or -100 (for whitelisting)
-to the given sender for every email recorded in the reputation database. It
-means, if there are 1000 emails from a given sender, his total reputation
-score will increase/decrease by 100,000 points, and the average reputation
-score is pushed close to 100 (blacklisted) or -100 (whitelisted) points (+/-
-the original average). C<reputation> is the average recorded score, which
-is equal to the C<total> / C<count>.
-
-   reputation = total / count
-   total = reputation * count
-
-The following two formulas are equivalent:
-
-   blacklisted_total = old_total + 100 * count
-   blacklisted_reputation = old_reputation + 100
-
-Blacklisting and whitelisting have the influence only on the reputation of
-the standalone email address. It does not affect the reputation scores of
-the domain name, HELO name, DKIM signature or the originating IP address.
+to the given sender's email address. For a plain address without any IP
+address, the value is multiplied by the ratio of the total reputation
+weight to the EMAIL reputation weight, to account for the reduced impact
+of the standalone EMAIL reputation when calculating the overall reputation.
+
+   total_weight = weight_email + weight_email_ip + weight_domain + weight_ip + weight_helo
+   blacklisted_reputation = 100 * total_weight / weight_email
+
+When a standalone email address is blacklisted/whitelisted, all records
+of the email address bound to an IP address, DKIM signature, or an SPF pass
+will be removed from the database, and only the standalone record is kept.
+
+Besides blacklisting/whitelisting of standalone email addresses, the same
+method may also be used for blacklisting/whitelisting of IP addresses,
+domain names, and HELO names (only dotless NetBIOS HELO names can be used).
+
+When whitelisting/blacklisting an email address or domain name, you can
+bind them to a specified DKIM signature or SPF record by appending the 
+DKIM signing domain or the tag 'spf' after the ID in the following way:
+
+ spamassassin --add-addr-to-blacklist=spamming.biz,spf
+ spamassassin --add-addr-to-whitelist=friend@good.org,good.org
+
+When a message contains both a DKIM signature and an SPF pass, the DKIM
+signature takes priority, so the record bound to the 'spf' tag won't
+be checked. Only email addresses and domains can be bound to DKIM or SPF.
+Records of IP addresses and HELO names are always without DKIM/SPF.
+
 In case of dual storage, the black/whitelisting is performed only in the
 default storage.
 
@@ -1279,7 +1320,7 @@ sub check_senders_reputation {
     $domain   = $signedby;
   } elsif ($pms->{spf_pass} && $self->{conf}->{txrep_spf}) {
     $ip       = undef;
-    $signedby = 'SPF';
+    $signedby = 'spf';
   }
 
   my $totalweight      = 0;
@@ -1369,6 +1410,7 @@ sub check_reputation {
         $self->{totalweight} += $weight;
         if ($key eq 'MSG_ID' && $self->count() > 0) {
             $delta = $self->total() / $self->count();
+	    $pms->set_tag('TXREP'.$tag_id,              sprintf("%2.1f",$delta));
         } elsif (defined $self->total()) {
             $delta = ($self->total() + $msgscore) / (1 + $self->count()) - $msgscore;
 
@@ -1726,9 +1768,9 @@ sub learn_message {
 ###########################################################################
   my ($self, $params) = @_;
   return 0 unless (defined $params->{isspam});
+
   dbg("TxRep: learning a message");
   my $pms = ($self->{last_pms})? $self->{last_pms} : Mail::SpamAssassin::PerMsgStatus->new($self->{main}, $params->{msg});
-
   if (!defined $pms->{relays_internal} && !defined $pms->{relays_external}) {
     $pms->extract_message_metadata();
   }
@@ -1836,30 +1878,19 @@ initializing the SQL storage, the same i
 Although the old AWL table can be reused for TxRep, by default TxRep expects
 the SQL table to be named "txrep".
 
-To install a new SQL table for TxRep, save the 'CREATE' SQL command shown
-below into a file named txrep_mysql.sql, and use the following command. You
-can also simply run the SQL command from within the respective database
-area in PhpMyAdmin:
-
- mysql -h <hostname> -u <adminusername> -p <databasename> < txrep_mysql.sql
- Enter password: <adminpassword>
-
- CREATE TABLE txrep (
-   username varchar(100) NOT NULL default '',
-   email varchar(255) NOT NULL default '',
-   ip varchar(40) NOT NULL default '',
-   count int(11) NOT NULL default '0',
-   totscore float NOT NULL default '0',
-   signedby varchar(255) NOT NULL default '',
-   PRIMARY KEY (username,email,signedby,ip)
- ) ENGINE=MyISAM;
-
-(If you get a syntax error at an older version of MySQL, use TYPE=MyISAM
-instead of ENGINE=MyISAM at the end of the command)
-
-For PostgreSQL, use the following:
+To install a new SQL table for TxRep, run the appropriate SQL file for your
+system under the /sql directory.
 
- psql -U <username> -f txrep_pg.sql <databasename>
+If you get a syntax error on an older version of MySQL, use TYPE= instead
+of ENGINE= at the end of the command. You can also use other storage
+engines (depending on what is available on your system). For example, the
+MEMORY engine stores the entire table in server memory, achieving
+performance similar to Redis. You would need to replicate the RAM table
+to disk through a cron job, to avoid loss of data at reboot.
+The InnoDB engine is used by default, offering high scalability (database
+size and concurrency of accesses). In conjunction with a high value of
+innodb_buffer_pool_size or with the memcached plugin (MySQL v5.6+) it can
+also offer performance comparable to Redis.
 
 =cut
 

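The scaling applied by the new _fn_envelope code can be tried out in isolation. The Perl sketch below mirrors the branch order of the hunk above (dotless HELO name, then IP address, then email address, then domain name). The function name scaled_score and the weight values are hypothetical and chosen only so that the arithmetic is easy to follow; they are not TxRep's defaults.

```perl
use strict;
use warnings;

# Hypothetical weights for illustration only (total = 20).
my %weight = (
    email    => 5,
    email_ip => 8,
    domain   => 2,
    ip       => 4,
    helo     => 1,
);

# scaled_score is a hypothetical helper: it multiplies the requested value
# (100 for blacklisting, -100 for whitelisting) by total_weight / weight of
# the reputation type that the identifier appears to belong to.
sub scaled_score {
    my ($id, $value, $w) = @_;
    my $total = $w->{email} + $w->{email_ip} + $w->{domain} + $w->{ip} + $w->{helo};
    my $factor;
    # Same simplified tests as in the diff; the IP test may match some domains.
    if    ($id !~ /\./ && $w->{helo})          { $factor = $total / $w->{helo}; }
    elsif ($id =~ /^[a-f\d\.:]+$/ && $w->{ip}) { $factor = $total / $w->{ip}; }
    elsif ($id =~ /@/ && $w->{email})          { $factor = $total / $w->{email}; }
    elsif ($id !~ /@/ && $w->{domain})         { $factor = $total / $w->{domain}; }
    else                                       { $factor = 1; }
    return $factor * $value;
}

printf "%s => %.0f\n", $_, scaled_score($_, 100, \%weight)
    for ('friend@good.org', '192.0.2.1', 'spamming.biz', 'MYPC');
# prints:
#   friend@good.org => 400
#   192.0.2.1 => 500
#   spamming.biz => 1000
#   MYPC => 2000
```

The lower the weight of a reputation type relative to the total, the larger the stored score has to be for the blacklisting or whitelisting to dominate the combined reputation.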
Modified: spamassassin/trunk/rules/v341.pre
URL: http://svn.apache.org/viewvc/spamassassin/trunk/rules/v341.pre?rev=1651114&r1=1651113&r2=1651114&view=diff
==============================================================================
--- spamassassin/trunk/rules/v341.pre (original)
+++ spamassassin/trunk/rules/v341.pre Mon Jan 12 15:17:46 2015
@@ -17,8 +17,7 @@
 ###########################################################################
 
 # TxRep - Reputation database that replaces AWL
-#
-# loadplugin Mail::SpamAssassin::Plugin::TxRep
+loadplugin Mail::SpamAssassin::Plugin::TxRep
 
 # URILocalBL - Provides ISP and Country code based filtering as well as
 # quick IP based blocks without a full RBL implementation - Bug 7060

Added: spamassassin/trunk/sql/README.txrep
URL: http://svn.apache.org/viewvc/spamassassin/trunk/sql/README.txrep?rev=1651114&view=auto
==============================================================================
--- spamassassin/trunk/sql/README.txrep (added)
+++ spamassassin/trunk/sql/README.txrep Mon Jan 12 15:17:46 2015
@@ -0,0 +1,87 @@
+
+Using SpamAssassin Automatic Reputation With An SQL Database
+------------------------------------------------------------
+
+The TxRep plugin improves on the earlier Auto-Whitelist plugin.
+The most common use for a system like this would be for tracking
+the expected spam score (or reputation) of frequent senders. A
+domain that sends frequent spam will lose reputation (or gain
+spam score) over time.
+
+In order to activate the SQL based reputation system you have to
+configure spamassassin and spamd to use the appropriate storage
+backend. This is done with the txrep_factory config variable,
+like so:
+
+txrep_factory Mail::SpamAssassin::SQLBasedAddrList
+
+SpamAssassin will check the global configuration file (i.e. any file
+matching /etc/mail/spamassassin/*.cf) for the following settings:
+
+user_awl_dsn                 DBI:driver:database:hostname[:port]
+user_awl_sql_username        dbusername
+user_awl_sql_password        dbpassword
+
+These settings are identical to those for the AWL plugin, so you
+do not need to change these if you are upgrading.
+
+The first option, user_awl_dsn, describes the data source name that
+will be used to create the connection to your SQL server.  It MUST be
+in the format as listed above.  <driver> should be the DBD driver that
+you have installed to access your database (the most common being
+MySQL (driver is 'mysql'), PostgreSQL ('Pg') and SQLite ('SQLite')).
+<database> must be the name of the database that you created to store
+the txrep table. <hostname> is the name of the host that contains
+the SQL database server.  <port> is the optional port number where your
+database server is listening.
+
+user_awl_dsn                DBI:mysql:spamassassin:localhost
+
+This tells SpamAssassin to connect to the database named spamassassin using
+MySQL on the local server; since <port> is omitted, the driver will use the
+default port number.  The other two required options tell SpamAssassin to use
+the defined username and password to establish the connection.
+
+If the user_awl_dsn option does not exist, SpamAssassin will not attempt
+to use SQL for tracking reputations.
+
+One additional configuration option exists that allows you to set the
+table name for the txrep table.
+
+user_awl_sql_table           txrep
+
+For an example of connecting to a PostgreSQL database, see the README file.
+
+Requirements
+------------
+
+In order for SpamAssassin to work with your SQL database, you must have
+the perl DBI module installed, AS WELL AS the DBD driver/module for your
+specific database.  For example, if using MySQL as your RDBMS, you must have
+the Msql-Mysql (DBD::mysql) module installed.  Check CPAN for the latest
+versions of DBI and your database driver/module.
+
+Database Schema
+---------------
+
+The database must contain a table named by 'user_awl_sql_table' (default
+setting: "txrep") with at least the fields specified in the accompanying
+SQL files.
+
+You can add as many other fields as you wish, as long as the required fields
+are contained in the table.
+
+To install the table, use the following command:
+
+mysql -h <hostname> -u <adminusername> -p <databasename> < txrep_mysql.sql
+Enter password: <adminpassword>
+
+For PostgreSQL, use the following:
+
+psql -U <username> -f txrep_pg.sql <databasename>
+
+Once you have created the database and added the table, just add the
+required lines to your global configuration file (local.cf).  Note that
+you must specify the proper storage backend in the config file in order
+for this to work and the current username must be passed to spamd.
+

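To make the DSN layout described in README.txrep concrete, the Perl sketch below splits a value of the form DBI:driver:database:hostname[:port] into its parts. The helper parse_awl_dsn is hypothetical and exists only for this illustration; it is not part of SpamAssassin or DBI, and it only understands the colon-separated layout shown in the README.

```perl
use strict;
use warnings;

# Split a DSN of the form DBI:driver:database:hostname[:port] into its parts.
# Returns undef (in scalar context) when the value does not have that shape.
sub parse_awl_dsn {
    my ($dsn) = @_;
    my ($scheme, $driver, $database, $hostname, $port) = split /:/, $dsn, 5;
    return unless defined $hostname && lc($scheme) eq 'dbi';
    return {
        driver   => $driver,
        database => $database,
        hostname => $hostname,
        port     => $port,    # undef when omitted; the driver uses its default
    };
}

my $parts = parse_awl_dsn('DBI:mysql:spamassassin:localhost');
printf "%s %s %s %s\n",
    @{$parts}{qw(driver database hostname)}, $parts->{port} // '(default)';
# prints "mysql spamassassin localhost (default)"
```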
Added: spamassassin/trunk/sql/txrep_mysql.sql
URL: http://svn.apache.org/viewvc/spamassassin/trunk/sql/txrep_mysql.sql?rev=1651114&view=auto
==============================================================================
--- spamassassin/trunk/sql/txrep_mysql.sql (added)
+++ spamassassin/trunk/sql/txrep_mysql.sql Mon Jan 12 15:17:46 2015
@@ -0,0 +1,9 @@
+CREATE TABLE txrep (
+  username varchar(100) NOT NULL default '',
+  email varchar(255) NOT NULL default '',
+  ip varchar(40) NOT NULL default '',
+  count int(11) NOT NULL default '0',
+  totscore float NOT NULL default '0',
+  signedby varchar(255) NOT NULL default '',
+  PRIMARY KEY (username,email,signedby,ip)
+) ENGINE=InnoDB;

Added: spamassassin/trunk/sql/txrep_pg.sql
URL: http://svn.apache.org/viewvc/spamassassin/trunk/sql/txrep_pg.sql?rev=1651114&view=auto
==============================================================================
--- spamassassin/trunk/sql/txrep_pg.sql (added)
+++ spamassassin/trunk/sql/txrep_pg.sql Mon Jan 12 15:17:46 2015
@@ -0,0 +1,12 @@
+CREATE TABLE txrep (
+  username varchar(100) NOT NULL default '',
+  email varchar(255) NOT NULL default '',
+  ip varchar(40) NOT NULL default '',
+  count integer NOT NULL default '0',
+  totscore float NOT NULL default '0',
+  signedby varchar(255) NOT NULL default '',
+  PRIMARY KEY (username,email,signedby,ip)
+);
+
+ALTER TABLE txrep SET (fillfactor=95);
+


