Message-ID: <1575399289.1132696242423.JavaMail.jira@ajax.apache.org>
Date: Tue, 22 Nov 2005 22:50:42 +0100 (CET)
From: "Bernd Fondermann (JIRA)"
To: server-dev@james.apache.org
Subject: [jira] Commented: (JAMES-387) Exception in BayesianAnalysis
In-Reply-To: <1301046463.1123056168850.JavaMail.jira@ajax.apache.org>
List-Id: "James Developers List"

    [ http://issues.apache.org/jira/browse/JAMES-387?page=comments#action_12358307 ]

Bernd Fondermann commented on JAMES-387:
----------------------------------------

I looked at the Mailet code and found that in buildCorpus(), the instance variable "corpus" is filled with all ham and spam tokens, which appear to be stored as Maps of (String, Integer) pairs. Afterwards, the map is iterated and all values are replaced by Doubles. While this replacement is running (and it takes longer every time), the map can still hold a fair number of Integer-typed values. If another thread reaches line 591 while the replacement is still in progress, the error could very well occur, because "corpus" is read there. Are new mails fed in a separate thread?

The class cast in line 591 could be changed to "Number", as a very simple solution. It may also be appropriate to refactor buildCorpus() to work on a local map until it has finished refilling it with Doubles.

Hope this analysis makes some sense and I did not completely misread this whole case... :-)

> Exception in BayesianAnalysis
> -----------------------------
>
>          Key: JAMES-387
>          URL: http://issues.apache.org/jira/browse/JAMES-387
>      Project: James
>         Type: Bug
>   Components: Matchers/Mailets (bundled)
>     Versions: 3.0
>  Environment: James from svn-trunk 2005-08-01.
>               MySQL 4.0
>     Reporter: Stefano Bagnara
>     Assignee: Vincenzo Gianferrari Pini
>     Priority: Minor
>
> Got this exception for every incoming mail:
> 02/08/05 00:39:25 INFO James.Mailet: BayesianAnalysis: Exception: java.lang.Integer
> java.lang.ClassCastException: java.lang.Integer
>     at org.apache.james.util.BayesianAnalyzer.getTokenProbabilityStrengths(BayesianAnalyzer.java:591)
>     at org.apache.james.util.BayesianAnalyzer.computeSpamProbability(BayesianAnalyzer.java:340)
>     at org.apache.james.transport.mailets.BayesianAnalysis.service(BayesianAnalysis.java:289)
>     at org.apache.james.transport.LinearProcessor.service(LinearProcessor.java:407)
>     at org.apache.james.transport.JamesSpoolManager.process(JamesSpoolManager.java:460)
>     at org.apache.james.transport.JamesSpoolManager.run(JamesSpoolManager.java:369)
>     at java.lang.Thread.run(Unknown Source)
> If I clean my spam/ham db the exceptions disappear, but they start again when the spam/ham db becomes large.
> My bayesiananalysis_spam contains 200000 rows.
> The following are the spam tokens with the highest "occurrences".
> +---------------------------+-------------+
> | token                     | occurrences |
> +---------------------------+-------------+
> | 3D                        |       82151 |
> | a                         |       59953 |
> | the                       |       45295 |
> | FONT                      |       42771 |
> | Content-Type              |       39058 |
> | to                        |       36626 |
> | com                       |       32902 |
> | http                      |       32886 |
> | of                        |       32504 |
> | font                      |       31803 |
> | and                       |       31577 |
> | Content-Transfer-Encoding |       31576 |
> | p                         |       29746 |
> | text                      |       29482 |
> | in                        |       29418 |
> | it                        |       28498 |
> | br                        |       28037 |
> | DIV                       |       27431 |
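
The two remedies suggested in the comment on JAMES-387 (casting to "Number" and building the corpus in a local map before publishing it) could look roughly like the sketch below. This is not the actual BayesianAnalyzer code: the class name CorpusSketch, the method names, the spam/(ham+spam) ratio and the 0.5 default for unknown tokens are all assumptions made for illustration, and it uses generics that were not available to James in 2005.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; names and formula are NOT taken from James.
public class CorpusSketch {

    // Shared map read by the analysis threads. It is only ever replaced
    // as a whole, so readers see either the old or the new corpus,
    // never a half-converted one. volatile makes the swap visible.
    private volatile Map<String, Double> corpus = new HashMap<>();

    // Remedy 2: fill a local map completely, then publish it in one step.
    public void buildCorpus(Map<String, Integer> ham, Map<String, Integer> spam) {
        Set<String> tokens = new HashSet<>(ham.keySet());
        tokens.addAll(spam.keySet());

        Map<String, Double> local = new HashMap<>();
        for (String token : tokens) {
            int h = ham.getOrDefault(token, 0);
            int s = spam.getOrDefault(token, 0);
            if (h + s == 0) {
                continue; // no occurrences at all, nothing to score
            }
            // Placeholder ratio, assumed for illustration only.
            local.put(token, (double) s / (h + s));
        }
        corpus = local; // single reference assignment
    }

    // Reader side: 0.5 is an arbitrary neutral default (assumption).
    public double probabilityOf(String token) {
        return corpus.getOrDefault(token, 0.5);
    }

    // Remedy 1: a cast to Number accepts both Integer and Double values,
    // so a mixed map no longer triggers a ClassCastException.
    public static double asDouble(Object value) {
        return ((Number) value).doubleValue();
    }
}
```

The "Number" cast only avoids the exception; a reader could still pick up a raw Integer count where a probability is expected. Swapping in a finished local map avoids both problems, at the cost of holding two copies of the corpus in memory during a rebuild.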