spamassassin-users mailing list archives

From "Kevin A. McGrail" <kmcgr...@apache.org>
Subject Re: GSOC 2018 SpamAssassin Statistical Classifier Plugin
Date Tue, 20 Mar 2018 11:47:59 GMT
+users

All we give is feedback.  The submission to GSoC is what matters.  So if
you mentioned Perl here, that's not going to carry over to the reviewers.

Can someone with fresh eyes take a look at this?  I read it too recently so
I will gloss over it too much.

Here are some posts the mentors list thought might be helpful.  The first, I
believe, covers the point of view of someone who did not get selected.

https://medium.freecodecamp.org/hacking-gsoc-how-to-gain-real-life-experience-and-support-open-source-b1e6a664f6e4?source=linkShare-53ba2bb84284-1521381334

https://sanatt.me/2017/12/30/cracking-google-summer-code-2018/

Regards, KAM

On Tue, Mar 20, 2018, 03:57 Saahil Sirowa <cs16btech11030@iith.ac.in> wrote:

> Hi Kevin and Apache SpamAssassin Dev Community,
>
> I have resolved all the changes you suggested in the previous draft.
> 1) I mentioned learning Perl a week before the community bonding
> period. It will not take much time. I can assure you that the language is
> not going to be an issue.
> 2) I updated the biography part a bit
> 3) Significant changes have been made in the Timeline.
> 4) I'm planning to use CMake/Travis CI for automated testing. If there is
> a better alternative, please suggest it.
> 5) I gave links to the research papers that I will be reading in the timeline.
> 6) I updated the timeline to include time for gaining deeper knowledge of
> email traffic and spam. I listed some links for that purpose.
> 7) I updated the credits
> 8) There are other changes in various parts of the proposal.
>
> Thanks for your previous detailed feedback.
>
> Here is the link to the updated proposal:
> GSoC 2018 proposal
> <https://docs.google.com/document/d/1-OCNv79sHvVViKwnrRYtlMiKWLCzz4xUW4tNOlmaTmw/edit#heading=h.q7h3lddabdvh>
> Please rigorously review it and suggest any changes that I should make.
>
> Awaiting a favorable response.
>
>
> Thanks...
> Saahil Sirowa
> B. Tech Computer Science and Engineering
> Indian Institute of Technology, Hyderabad
>
> On Mon, Mar 19, 2018 at 3:27 AM, Kevin A. McGrail <kmcgrail@apache.org>
> wrote:
>
>> Hi Saahil
>>
>> re: Perl. As the project is primarily in Perl and you do not list it, or
>> any similar language like PHP, in your Proficiencies, I would address
>> that.  The word Perl does not appear a single time.
>>
>> Your Biography is a little light on why this is something you feel you
>> can implement.  The mentors will likely NOT be able to help you with the
>> science, focusing instead on the community, processes, and open source in
>> general.
>>
>> re: Email and Spam, do you have any experience with email traffic or
>> spam?  If so, add it.  If not, explain what you plan to do to address that.
>>
>> Re: Deliverables, I think you'll need to propose the first draft of
>> that.  But your goal will likely be a plugin for Apache SpamAssassin that
>> can be installed and configured to provide multiple configurable
>> statistical analysis algorithms to better identify ham (good email) and/or
>> spam (bad email).
>>
>> Please use Apache SpamAssassin to properly brand the title.
>>
>> Re: I have no input on the scheduling/timelines except that past proposals
>> I have read have included more phases and did not add "optional" items.  I'd
>> prefer to see small increments to make sure you stay on schedule and don't
>> get overwhelmed and find yourself way behind as the time progresses.
>>
>> Re: Testing Methodology, this is likely the most critical missing part.
>> I am a fan of test-driven development where you set up tests that should
>> pass and fail and use continuous testing as you add code to confirm your
>> development is progressing well.
>>
>> This is especially important because spam analysis often doesn't work the
>> way people expect and tests w/statistics can help identify issues.
>>
>> For example, it is a hypothesis that these statistical algorithms will
>> be better than Bayes.  So you'll need a baseline for comparison.
>>
>> Additionally, even experts in the field are surprised when they think
>> something will prove the hamminess of an email but in fact shows the
>> opposite.  Real world example: SPF is a policy that, when introduced, was
>> supposed to allow an automated mechanism that says "this is an email from a
>> legitimate mail server for my domain".
>>
>> However, the FIRST wave of people to adopt it were all spammers.  So it
>> became a spam indicator more than a ham indicator.  It was a very
>> interesting outcome.
>>
>> Re: Corpora, you'll want a corpus of carefully hand-sorted ham and
>> spam.  Have you thought about how you'll get that?  I *might* be able to
>> help but it's 50/50.
>>
>> Re: You mention reading research papers on statistical algorithms from a
>> previous proposal.  You'll want to list them to show which ones you plan to
>> study.
>>
>> re: "Discussions with the SA community regarding the various types of
>> spams that the present SA can handle." is unclear.  What is a "type of
>> spam" to you?  Do you have a list of types of spam?
>>
>> re: "Brainstorming with the mentors and SA community about the various
>> input features and parameters that can have a huge impact on the overall
>> performance of the listed neural nets models." I think this is flawed.
>> There won't be a ton of people who can discuss this with you.  You'll
>> likely need to use the scientific process to show what has a performance impact.
>> This is not busy work or school work.  This is an experiment that has not
>> been tried at the SA project.
>>
>> re: "actively involved with the community." is a stretch.  A few emails
>> do not active involvement make.
>>
>> re: Bonding, you might consider raising that to 1-2 major bugs and 10-20
>> minor bugs.
>>
>> Re: Credits/references, I would add more clarity about where each of
>> those references is used.
>>
>> Regards,
>> KAM
>>
>
>
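
The quoted feedback on testing methodology asks for tests that should pass and fail, and for a baseline to compare any new statistical algorithm against. As a rough illustration only (not part of the original thread and not Apache SpamAssassin code), that idea might be sketched in Python as below. Every name here (error_rates, beats_baseline, the toy classifiers and the four-message corpus) is hypothetical; a real evaluation would use a large hand-sorted corpus and the existing Bayes results as the baseline.

```python
from typing import Callable, Iterable, List, Tuple

# A classifier takes message text and returns True for "spam".
Classifier = Callable[[str], bool]
# A labelled corpus entry: (message text, is_spam) from hand sorting.
Labelled = Tuple[str, bool]


def error_rates(classify: Classifier, corpus: Iterable[Labelled]) -> Tuple[float, float]:
    """Return (false_positive_rate, false_negative_rate) on a labelled corpus."""
    fp = fn = ham = spam = 0
    for text, is_spam in corpus:
        predicted_spam = classify(text)
        if is_spam:
            spam += 1
            if not predicted_spam:
                fn += 1  # spam that slipped through
        else:
            ham += 1
            if predicted_spam:
                fp += 1  # good mail wrongly flagged
    return (fp / ham if ham else 0.0, fn / spam if spam else 0.0)


def beats_baseline(candidate: Classifier, baseline: Classifier,
                   corpus: Iterable[Labelled]) -> bool:
    """True if candidate is no worse on both rates and strictly better on one."""
    data: List[Labelled] = list(corpus)
    c_fp, c_fn = error_rates(candidate, data)
    b_fp, b_fn = error_rates(baseline, data)
    no_worse = c_fp <= b_fp and c_fn <= b_fn
    strictly_better = c_fp < b_fp or c_fn < b_fn
    return no_worse and strictly_better


# Toy stand-ins, purely for illustration (assumptions, not real rules).
def toy_baseline(text: str) -> bool:
    return "free" in text


def toy_candidate(text: str) -> bool:
    return "free" in text and ("money" in text or "prize" in text)


TOY_CORPUS: List[Labelled] = [
    ("win free money now", True),
    ("free prize inside", True),
    ("meeting at noon", False),
    ("free lunch for the team on friday", False),
]

if __name__ == "__main__":
    print("baseline :", error_rates(toy_baseline, TOY_CORPUS))
    print("candidate:", error_rates(toy_candidate, TOY_CORPUS))
    print("candidate beats baseline:",
          beats_baseline(toy_candidate, toy_baseline, TOY_CORPUS))
```

Keeping the comparison in one function makes it easy to run on every change, so a new feature that unexpectedly hurts results (as in the SPF example) shows up as a failing check rather than a surprise later.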
