incubator-rat-dev mailing list archives

From "Alexei Fedotov (JIRA)" <j...@apache.org>
Subject [jira] Commented: (RAT-45) Apache RAT copy&paste detector - tool for detecting copied(plagiarised) code by searching on web code search engines
Date Tue, 21 Apr 2009 08:46:47 GMT

    [ https://issues.apache.org/jira/browse/RAT-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12701108#action_12701108 ]

Alexei Fedotov commented on RAT-45:
-----------------------------------

When designing the search engine interface, please take a look at the Google Code Search API.
It is released under the compatible Apache License 2.0, so we can at least read through it. It
would be nice if we could reuse some of their code. The ground rule for reuse is to avoid mixing
Google code with ours.

http://code.google.com/intl/en/apis/codesearch/
http://gdata-java-client.googlecode.com/files/gdata-src.java-1.26.0.java.zip
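
One way to follow that ground rule is to keep every search engine behind a small
interface of our own, so that any third-party client code lives only inside an adapter
class. Below is a minimal sketch under that assumption. The names CodeSearchEngine,
SearchHit and InMemoryEngine are hypothetical; they are not part of RAT or of any Google
API, and the in-memory engine is only a stand-in for a real adapter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical result type: where a matching snippet was found.
final class SearchHit {
    final String location;
    final String matchedText;

    SearchHit(String location, String matchedText) {
        this.location = location;
        this.matchedText = matchedText;
    }
}

// Hypothetical interface owned by our code base. An adapter for a real
// engine would implement it and be the only class that touches the
// third-party client library.
interface CodeSearchEngine {
    List<SearchHit> search(String snippet);
}

// Stand-in engine for offline experiments: matches by plain substring
// search over sources registered by the caller.
final class InMemoryEngine implements CodeSearchEngine {
    private final List<String[]> corpus = new ArrayList<>(); // {location, source}

    void add(String location, String source) {
        corpus.add(new String[] {location, source});
    }

    @Override
    public List<SearchHit> search(String snippet) {
        List<SearchHit> hits = new ArrayList<>();
        for (String[] entry : corpus) {
            if (entry[1].contains(snippet)) {
                hits.add(new SearchHit(entry[0], snippet));
            }
        }
        return hits;
    }
}
```

With this split, swapping one engine for another should only require a new adapter, and
the detector itself never imports third-party classes.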

> Apache RAT copy&paste detector - tool for detecting copied(plagiarised) code by searching on web code search engines
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: RAT-45
>                 URL: https://issues.apache.org/jira/browse/RAT-45
>             Project: RAT
>          Issue Type: New Feature
>         Environment: This improvement to the Apache RAT tool will be written in Java.
> Requirements: an OS with a Java runtime environment (RE) already installed, and an Internet connection
>            Reporter: Marija Sljivovic
>         Attachments: copyandpaste.zip, copyandpastedetector-src-0.01.zip
>
>   Original Estimate: 2688h
>  Remaining Estimate: 2688h
>
> This document describes a new tool that will be included in the Apache RAT project.
> Original idea: http://wiki.apache.org/general/SummerOfCode2009#rat-project
> The aim is to create a working, modular, configurable command-line tool
> that searches web-based code search engines for possibly plagiarised code in our code bases.
> The tool will be heuristic in nature: it will make guesses about parts of the code.
> If it decides that a code part is a good candidate for having been copied and pasted,
> it will check whether matching code exists on code search engines.
> That code part will be stored in a report if any match is found.
> The person who reads the report will decide whether the code was really copied.
> The tool will be based on a variant of the sliding-window algorithm.
> The code parts that the algorithm generates will be checked by different heuristic methods
> and optionally sent to a code search engine for checking.
> More information and ideas about this project can be found here:
> http://wiki.apache.org/general/MarijaSljivovic/SoC2009ApacheRatProposal
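
The description above can be sketched roughly as follows. This is only an illustration of
the sliding-window idea, not the code in the attached archives: the class name
SlidingWindowDetector, the fixed window size, and the example heuristic hasEnoughContent
(with its threshold of 20 characters) are all assumptions, and the search engine is
represented by a plain predicate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: slide a fixed-size window over the source lines,
// filter each window with a heuristic, and ask a search engine only about
// the windows that pass.
final class SlidingWindowDetector {
    private final int windowSize;
    private final Predicate<String> worthChecking; // heuristic filter
    private final Predicate<String> foundOnline;   // stand-in for an engine query

    SlidingWindowDetector(int windowSize,
                          Predicate<String> worthChecking,
                          Predicate<String> foundOnline) {
        if (windowSize < 1) {
            throw new IllegalArgumentException("windowSize must be >= 1");
        }
        this.windowSize = windowSize;
        this.worthChecking = worthChecking;
        this.foundOnline = foundOnline;
    }

    // Returns the windows that passed the heuristic and had a match.
    // A human reviewer decides whether they were really copied.
    List<String> scan(List<String> lines) {
        List<String> report = new ArrayList<>();
        for (int start = 0; start + windowSize <= lines.size(); start++) {
            String window = String.join("\n", lines.subList(start, start + windowSize));
            if (worthChecking.test(window) && foundOnline.test(window)) {
                report.add(window);
            }
        }
        return report;
    }

    // Example heuristic (an assumption): skip windows that contain too
    // little non-whitespace text to be worth a query.
    static boolean hasEnoughContent(String window) {
        return window.replaceAll("\\s+", "").length() >= 20;
    }
}
```

Evaluating the heuristic before the engine query keeps the number of network requests
down, which matters because every window that survives costs one search.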

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

