incubator-jspwiki-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Janne Jalkanen <Janne.Jalka...@ecyrd.com>
Subject Re: Spam package redesign
Date Sun, 27 Sep 2009 20:12:04 GMT

Heya!

I don't quite understand - why would a floating point value be any  
more complicated than an integer?

Note that at the moment we mark a modification a spam if a single test  
fails, which essentially means doing logical OR more than any sort of  
an addition (=threshold is always 1).  The integer is there more or  
less as something-which-was-never-completed-and-I-am-almost-sure-is- 
the-wrong-design. In fact, I did lament the fact that you can't say  
that "X is a bit more efficient than Y" - with an integer-based design  
you essentially go for fixed point arithmetics (test A gives you 100,  
test B gives you 150, etc, where you just shifted decimal points to  
the right just for the heck of it).

If you take a look at SpamAssassin, it uses floats.

I'd hesitate changing it later, since it will create an API contract.

/Janne

On Sep 27, 2009, at 22:34 , Andrew Jaquith wrote:

> Janne --- good critique, and good suggestions. I've killed the limit- 
> setting methods on the Inspection class itself in favor of a  
> listener strategy. I adopteed your other suggestions too, except...
>
> ...at the moment I am inclined to leave the scoring scale as integer- 
> based for now. I like it because it is simple, and resembles what we  
> do now. We can change it later if need be.
>
> In answer to your first question: the inspector package does not  
> help graduation, although it does help the beta that comes after the  
> diploma. :)
>
> Andrew
>
> On Sep 27, 2009, at 4:37, Janne Jalkanen <janne.jalkanen@ecyrd.com>  
> wrote:
>
>>
>> My main concern: does this advance our graduation?
>>
>> Others inline.
>>
>> On 25 Sep 2009, at 20:54, Andrew Jaquith wrote:
>>
>>> The "inspect" package provides facilities that allow content to be
>>> inspected and scored based on various criteria such as whether a  
>>> wiki
>>> page change contains spam. Content may be scored by initializing an
>>> Inspection object and then calling the {@link Inspection#inspect()}
>>> method. The {@code inspect} method, in turn, iterates through a
>>> variety of previously-configured {@link Inspector} objects and calls
>>> the {@link Inspector#inspect(String,String)} method for each one.  
>>> Each
>>> of the configured inspectors is free to perform whatever evaluations
>>> it wishes, and can increase, decrease or reset the "scores" for any
>>> score category, for example the {@link Score#SPAM} category.
>>
>> Ok.
>>
>>> * Score objects supply instructions to the parent Inspection  
>>> object to
>>> increment, decrement or reset the score for a particular category.
>>> Each Score object is constructed with a category (for example,
>>> Score.SPAM), an integer indicating how much to change the score, and
>>> an optional String message that provides context for the change. For
>>> example, a Score that increments the spam score by 1 could be
>>> constructed by new Score( Score.SPAM, 1, "Bot detected." ). Negative
>>> numbers can be supplied also to decrease the score. For convenience,
>>> {@link Score#INCREMENT_SCORE} means "add 1", {@link
>>> Score#DECREMENT_SCORE} means "subtract 1", and {@link Score#RESET}
>>> means "reset to zero."
>>
>> Smells of overdesign to me.  Creating a new operation and  
>> methodology to support addition sounds like a bad idea. Not to  
>> mention the fact that you might actually want to use fractions.
>>
>> Something like inspection.changeScore( Score.SPAM, 0.5f ) sounds  
>> more appealing to me.
>>
>>> public String preSave( WikiContext context, String content ) throws
>>> RedirectException
>>> {
>>> Change change = getChange( context, content );
>>>
>>> // Run the Inspection
>>> Inspection inspection = new Inspection( context, m_config,  
>>> m_inspectors );
>>> inspection.addLimit( Score.SPAM, m_scoreLimit );
>>> inspection.inspect( content, change );
>>> int spamScore = inspection.getScore( Score.SPAM );
>>> context.setVariable( ATTR_SPAMFILTER_SCORE, spamScore );
>>>
>>> // Redirect user if score too high
>>> if( inspection.isFailed() )
>>> {
>>>  ...
>>> }
>>> ...
>>> }
>>
>> Yes, this looks unnecessarily complicated and limiting to me.  I  
>> would remove automatic limit-rating altogether, and would just  
>> concentrate on refactoring the individual methods from SpamFilter  
>> into the inspect-package with a light wrapper around them.  I think  
>> a callback design is better than adding quite limited limit  
>> thingies anyway, as this gives full programmatic control to the  
>> developer without us having to pre-guess the possible limitations.
>>
>> E.g. inspection.inspect( content, change, new  
>> InspectionResultListener() {
>>   public boolean visit(Inspection ins)
>>   {
>>        // Any possible decision code goes here
>>        return ins.getScore() > m_maxScore;
>>   }
>> });
>>
>> Obviously, these listeners would be optional, and you could just  
>> let the entire chain run without interference. And for simplicity,  
>> we can provide some default listeners, e.g.
>>
>> inspection.inspect( content, change, new  
>> StopAtScoreListener( Score.SPAM, 5 ) );
>>
>> (Yeah, I'm a big fan of this pattern 'cos it gives much more  
>> control to the developer and eases debugging immensely.)
>>
>>> ChangeRateInspector, LinkCountInspector, SpamInspector (for  
>>> Akismet),
>>
>> This should obviously be called AkismetInspector; all the others  
>> detect spammers too (and Akismet is probably the worst of the lot  
>> anyway).
>>
>> /Janne


Mime
View raw message