incubator-rat-dev mailing list archives

From Robert Burrell Donkin <robertburrelldon...@blueyonder.co.uk>
Subject Re: apache-rat-pd
Date Tue, 16 Jun 2009 21:15:23 GMT
Marija Šljivović wrote:
> Hi!
> I am working on a copy & paste (plagiarism) detector.

cool

> You can see information about the project and reports of my progress at these
> locations:
> http://wiki.apache.org/general/MarijaSljivovic/SoC2009ApacheRatProposal
> https://issues.apache.org/jira/browse/RAT-45
> or get the source code and binary distributions at:
> http://code.google.com/p/apache-rat-pd/
> I am now thinking of writing some misspelling heuristic checkers. These algorithms
> will be able to notice misspelled words in source code.
> The affected part of the code will then be sent to a code search
> engine (Google Code Search, for example) to check whether it can find any similar
> misspellings in public code bases.
> In that way we can check the possibility that a part of the code is plagiarised.
> I am now searching for an open source library which can be used for this task. I
> found one: jazzy ( http://jazzy.sourceforge.net/ ) and I think that it is
> good for this purpose.

probably best to make the API pluggable (jazzy is LGPL but this is good
advice in any case)
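
a minimal sketch of what "pluggable" could look like, assuming nothing about
the real apache-rat-pd code: the heuristic only sees a small SpellChecker
interface, so a jazzy-backed implementation (not shown here) or any other
library could be swapped in behind it. All names below (SpellChecker,
WordListChecker, findMisspellings) are hypothetical, and the word-list
checker is just a stand-in for a real dictionary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class MisspellingSketch {

    /** Hypothetical plug-in point: any spelling library can sit behind this. */
    interface SpellChecker {
        boolean isMisspelled(String word);
    }

    /** Stand-in implementation backed by a plain word list (an assumption). */
    static final class WordListChecker implements SpellChecker {
        private final Set<String> known = new HashSet<>();

        WordListChecker(Collection<String> words) {
            for (String w : words) {
                known.add(w.toLowerCase(Locale.ROOT));
            }
        }

        @Override
        public boolean isMisspelled(String word) {
            return !known.contains(word.toLowerCase(Locale.ROOT));
        }
    }

    /**
     * Splits source text into words (breaking camelCase identifiers apart)
     * and returns the distinct words the checker rejects, in first-seen order.
     * Words shorter than four letters are skipped to reduce noise.
     */
    static List<String> findMisspellings(String source, SpellChecker checker) {
        Set<String> hits = new LinkedHashSet<>();
        for (String token : source.split("[^A-Za-z]+")) {
            for (String part : token.split("(?<=[a-z])(?=[A-Z])")) {
                if (part.length() >= 4 && checker.isMisspelled(part)) {
                    hits.add(part.toLowerCase(Locale.ROOT));
                }
            }
        }
        return new ArrayList<>(hits);
    }

    public static void main(String[] args) {
        SpellChecker checker = new WordListChecker(
                Arrays.asList("int", "return", "length", "buffer", "count"));
        String code = "int bufferLenght = recieveCount(buffer);";
        // Candidate words that could then be passed to a code search engine.
        System.out.println(findMisspellings(code, checker));
    }
}
```

the words it returns are only candidates; sending them to a code search
engine and judging the matches would be a separate step behind its own
interface.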

> Any suggestions for another solution that is better than jazzy?

i'm not sure whether it would be better but an alternative approach
would be to use a semi-structured text analysis tool, for example UIMA
(http://incubator.apache.org/uima/) or Lucene

> Work on apache-rat-pd (plagiarism detector) is continuing.

great :-)

- robert

