incubator-rat-dev mailing list archives

From Robert Burrell Donkin <>
Subject Re: apache-rat-pd
Date Tue, 16 Jun 2009 21:51:52 GMT
Robert Burrell Donkin wrote:
> Marija Šljivović wrote:
>> Hi!
>> I am working on a copy-and-paste (plagiarism) detector.
> cool
>> You can see information about the project and reports of my progress at these
>> locations:
>> or get source code and binary distributions at:
>> I am now thinking of writing some misspelling heuristic checkers. These algorithms
>> will be able to notice misspelled words in source code.
>> That part of the code will then be sent to a code search
>> engine (GoogleCodeSearch, for example) to check whether it can find any similar
>> misspellings in public code bases.
>> In that way we can check whether a piece of code may have been plagiarised.
>> Now I am searching for an open source library which can be used for this task. I
>> found one: jazzy ( ) and I think that it is
>> good for this purpose.
> probably best to make the API pluggable (jazzy is LGPL but this is good
> advice in any case)
>> Any suggestion for another solution that is better than jazzy?
> i'm not sure whether it would be better but an alternative approach
> would be to use a semi-structured text analysis tool for example UIMA
> ( or lucene

for lucene, start by looking at
and then create a custom dictionary by tokenising a large number of
source files
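
A minimal sketch of the custom-dictionary idea, assuming plain Python in place of Lucene or jazzy (neither library is used here). The names tokenise, build_dictionary, edits1 and find_suspects, and the min_count and min_len thresholds, are all hypothetical and chosen only for illustration. It tokenises source files into words, keeps the words that occur often enough as the dictionary, and flags unknown tokens that are one edit away from a dictionary word as likely misspellings:

```python
import re
import string
from collections import Counter

# Matches one word inside an identifier: an acronym run ("HTTP") or a
# capitalised/lowercase word ("Server", "name"). Digits and '_' are skipped.
_WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+")


def tokenise(source):
    """Split source text into lowercase words, breaking camelCase and snake_case."""
    return [m.group(0).lower() for m in _WORD.finditer(source)]


def build_dictionary(sources, min_count=2):
    """Build the dictionary from tokens seen at least min_count times.

    min_count is an assumed threshold; rare tokens are left out because
    they may themselves be misspellings.
    """
    counts = Counter(tok for src in sources for tok in tokenise(src))
    return {tok for tok, n in counts.items() if n >= min_count}


def edits1(word):
    """Return all strings one edit (delete, transpose, replace, insert) away."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in letters]
    inserts = [l + c + r for l, r in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def find_suspects(source, dictionary, min_len=4):
    """Map each unknown token to the dictionary words one edit away from it."""
    suspects = {}
    for tok in set(tokenise(source)):
        if tok in dictionary or len(tok) < min_len:
            continue
        near = sorted(edits1(tok) & dictionary)
        if near:
            suspects[tok] = near
    return suspects


if __name__ == "__main__":
    corpus = [
        "int userName = getUserName();",
        "String user_name; setUserName(user_name);",
        "return userName.length();",
    ]
    dictionary = build_dictionary(corpus)
    print(find_suspects("String userNmae = lookup();", dictionary))
```

The flagged tokens (together with the surrounding code) would then be the candidates to send to a code search engine. A real dictionary would need a far larger corpus than the three lines used above.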

- robert
