incubator-rat-dev mailing list archives

From Robert Burrell Donkin <>
Subject Re: apache-rat-pd
Date Tue, 16 Jun 2009 21:15:23 GMT
Marija Šljivović wrote:
> Hi!
> I am working on a copy & paste (plagiarism) detector.


> You can see information about the project and reports on my progress at these
> locations:
> or get the source code and binary distributions at:
> I am now thinking of writing some misspelling-based heuristic checkers. These
> algorithms will be able to spot misspelled words in source code.
> The affected part of the code will then be sent to a code search
> engine (Google Code Search, for example) to check whether it can find any
> similar misspellings in public code bases.
> In that way we can assess the likelihood that a piece of code is plagiarised.
> I am now looking for an open source library which can be used for this task. I
> found one: jazzy ( ) and I think that it is
> good for this purpose.

probably best to make the API pluggable (jazzy is LGPL but this is good
advice in any case)
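
a minimal sketch of what a pluggable checker could look like, assuming nothing about the actual apache-rat-pd code: the names MisspellingChecker, WordListChecker and MisspellingQueryBuilder are hypothetical and only illustrate the idea. the detector depends on a small interface, and a jazzy-backed implementation could live in a separate, optional module. the word-list implementation here is only a stand-in, not a real spell checker.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical plug-in point: this interface is not taken from apache-rat-pd.
interface MisspellingChecker {
    /** Returns the distinct words in the text that the checker does not recognise. */
    List<String> findMisspellings(String text);
}

// Stand-in implementation backed by a plain set of known words.
// A Jazzy-backed checker would implement the same interface elsewhere.
final class WordListChecker implements MisspellingChecker {
    private final Set<String> knownWords = new HashSet<>();

    WordListChecker(Set<String> words) {
        for (String w : words) {
            knownWords.add(w.toLowerCase(Locale.ROOT));
        }
    }

    @Override
    public List<String> findMisspellings(String text) {
        // LinkedHashSet keeps first-seen order and drops duplicates.
        Set<String> unknown = new LinkedHashSet<>();
        for (String token : text.split("[^A-Za-z]+")) {
            if (token.length() < 4) {
                continue; // skip empty tokens and very short words
            }
            String word = token.toLowerCase(Locale.ROOT);
            if (!knownWords.contains(word)) {
                unknown.add(word);
            }
        }
        return new ArrayList<>(unknown);
    }
}

// Turns the suspicious words of a code fragment into a search phrase.
// Sending the phrase to a code search engine is deliberately left out.
final class MisspellingQueryBuilder {
    private final MisspellingChecker checker;

    MisspellingQueryBuilder(MisspellingChecker checker) {
        this.checker = checker;
    }

    /** Returns the misspelled words joined by spaces, or "" if there are none. */
    String buildQuery(String sourceFragment) {
        return String.join(" ", checker.findMisspellings(sourceFragment));
    }
}
```

with this shape the rest of the detector only ever sees MisspellingChecker, so swapping the spelling library (or dropping it) does not touch the search code.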

> Any suggestions for another solution that is better than jazzy?

i'm not sure whether it would be better, but an alternative approach
would be to use a semi-structured text analysis tool, for example UIMA
( ) or lucene

> Work on apache-rat-pd (plagiarism detector) is continuing.

great :-)

- robert
