james-server-dev mailing list archives

From "Bernd Fondermann (JIRA)" <server-...@james.apache.org>
Subject [jira] Commented: (JAMES-387) Exception in BayesianAnalysis
Date Tue, 22 Nov 2005 21:50:42 GMT
    [ http://issues.apache.org/jira/browse/JAMES-387?page=comments#action_12358307 ] 

Bernd Fondermann commented on JAMES-387:
----------------------------------------

I looked at the Mailet code and found that in buildCorpus() the instance variable "corpus" is
first filled with all ham and spam tokens, apparently as a Map of (String, Integer) pairs. Afterwards
the map is iterated and every value is replaced by a Double. While this conversion is running (and it
takes longer each time as the corpus grows), the map can still contain a fair number of Integer-typed values.
If another thread reaches line 591 while the conversion is still in progress, the
error could very well occur, because "corpus" is read there.
Are new mails fed in a separate thread?
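
To make the suspected state concrete, here is a minimal sketch of a map that has only been partly converted. It assumes modern Java; the class name, the method readStrict and the 0.99 value are made up for illustration and are not taken from the James source. readStrict only stands in for the kind of Double cast that the stack trace points to at line 591.

```java
import java.util.HashMap;
import java.util.Map;

public class HalfConvertedCorpus {
    // Hypothetical stand-in for a reader that assumes every value is a Double.
    public static double readStrict(Map<String, Object> corpus, String token) {
        return ((Double) corpus.get(token)).doubleValue();
    }

    public static void main(String[] args) {
        Map<String, Object> corpus = new HashMap<>();
        // Raw occurrence count that the conversion loop has not reached yet.
        corpus.put("font", Integer.valueOf(31803));
        // Entry that has already been replaced by a Double (value is made up).
        corpus.put("http", Double.valueOf(0.99));

        // Reading a converted entry works.
        System.out.println(readStrict(corpus, "http"));

        // Reading an unconverted entry fails the same way as in the report.
        try {
            readStrict(corpus, "font");
        } catch (ClassCastException e) {
            System.out.println("ClassCastException on unconverted entry");
        }
    }
}
```

The sketch is single-threaded on purpose: it shows the mixed-type state directly instead of depending on thread timing.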

As a very simple solution, the class cast in line 591 could be changed to "Number". It may
also be appropriate to refactor buildCorpus() to work on a local map and only assign it to
"corpus" once all values have been replaced by Doubles.
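
Both ideas could look roughly like the sketch below. This is not the James code: it assumes modern Java, all class and method names are hypothetical, and placeholderProbability is a made-up formula that only stands in for whatever buildCorpus() really computes.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class CorpusSwapSketch {
    // Readers only ever see a fully converted map through this reference.
    private volatile Map<String, Double> corpus = new HashMap<>();

    // Builds into a local map and publishes it with one reference assignment.
    public void buildCorpus(Map<String, Integer> ham, Map<String, Integer> spam) {
        Set<String> tokens = new HashSet<>(ham.keySet());
        tokens.addAll(spam.keySet());

        Map<String, Double> local = new HashMap<>();
        for (String token : tokens) {
            int hamCount = ham.getOrDefault(token, 0);
            int spamCount = spam.getOrDefault(token, 0);
            local.put(token, placeholderProbability(hamCount, spamCount));
        }
        corpus = local; // the half-built map is never visible to readers
    }

    public Double lookup(String token) {
        return corpus.get(token);
    }

    // Placeholder formula, NOT the one used by James.
    public static double placeholderProbability(int hamCount, int spamCount) {
        return (double) spamCount / (hamCount + spamCount);
    }

    // The "very simple solution": cast to Number instead of Double.
    public static double readLenient(Map<String, ?> map, String token) {
        return ((Number) map.get(token)).doubleValue();
    }

    public static void main(String[] args) {
        Map<String, Integer> ham = new HashMap<>();
        ham.put("the", 10);
        Map<String, Integer> spam = new HashMap<>();
        spam.put("the", 30);
        spam.put("FONT", 5);

        CorpusSwapSketch sketch = new CorpusSwapSketch();
        sketch.buildCorpus(ham, spam);
        System.out.println(sketch.lookup("the"));  // 0.75
        System.out.println(sketch.lookup("FONT")); // 1.0
    }
}
```

One thing to keep in mind: the Number cast avoids the ClassCastException, but a reader that hits an unconverted entry would then get the raw occurrence count instead of a probability. The local-map variant avoids that as well, because readers never see an unfinished map.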

Hope this analysis makes some sense and I did not completely misread this whole case... :-)



> Exception in BayesianAnalysis
> -----------------------------
>
>          Key: JAMES-387
>          URL: http://issues.apache.org/jira/browse/JAMES-387
>      Project: James
>         Type: Bug
>   Components: Matchers/Mailets (bundled)
>     Versions: 3.0
>  Environment: James from svn-trunk 2005-08-01.
> MySQL 4.0
>     Reporter: Stefano Bagnara
>     Assignee: Vincenzo Gianferrari Pini
>     Priority: Minor
>
> Got this exception for every incoming mail:
> 02/08/05 00:39:25 INFO  James.Mailet: BayesianAnalysis: Exception: java.lang.Integer
> java.lang.ClassCastException: java.lang.Integer
>         at org.apache.james.util.BayesianAnalyzer.getTokenProbabilityStrengths(BayesianAnalyzer.java:591)
>         at org.apache.james.util.BayesianAnalyzer.computeSpamProbability(BayesianAnalyzer.java:340)
>         at org.apache.james.transport.mailets.BayesianAnalysis.service(BayesianAnalysis.java:289)
>         at org.apache.james.transport.LinearProcessor.service(LinearProcessor.java:407)
>         at org.apache.james.transport.JamesSpoolManager.process(JamesSpoolManager.java:460)
>         at org.apache.james.transport.JamesSpoolManager.run(JamesSpoolManager.java:369)
>         at java.lang.Thread.run(Unknown Source)
> If I clean my spam/ham db the exceptions disappear, but they start again when the spam/ham
> db becomes large.
> My bayesiananalysis_spam contains 200000 rows.
> The following are the spam tokens with higher "occurrences".
> +---------------------------+-------------+
> | token                     | occurrences |
> +---------------------------+-------------+
> | 3D                        |       82151 |
> | a                         |       59953 |
> | the                       |       45295 |
> | FONT                      |       42771 |
> | Content-Type              |       39058 |
> | to                        |       36626 |
> | com                       |       32902 |
> | http                      |       32886 |
> | of                        |       32504 |
> | font                      |       31803 |
> | and                       |       31577 |
> | Content-Transfer-Encoding |       31576 |
> | p                         |       29746 |
> | text                      |       29482 |
> | in                        |       29418 |
> | it                        |       28498 |
> | br                        |       28037 |
> | DIV                       |       27431 |

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


---------------------------------------------------------------------
To unsubscribe, e-mail: server-dev-unsubscribe@james.apache.org
For additional commands, e-mail: server-dev-help@james.apache.org

