From: Manuel Suárez Sánchez
Date: Sun, 8 Sep 2013 21:00:17 +0200
Subject: Re: [GSOC] Rat: Past, Present and Future
To: dev@creadur.apache.org

Hi Everyone.

About two months ago, when this topic was created, I was new to the project and did not know much about it. Since then I have been working on the project and learning more about it.

The task I am trying to complete is https://issues.apache.org/jira/browse/RAT-131 and I think I have made a lot of changes and improvements, and cleaned up bad code in the project. My fork of the project is at https://github.com/elnuma/creadur-rat/tree/gsoc for anyone who wants to look. Since this is an open source project, I would like the community to review it and give me feedback. (I know that I am new to this world, so I will do both good and bad things; for me the most important thing is learning from my mistakes.)

Apache-Rat-Core:

            Before:   After:
Coverage    75%       96%

[image: Inline images 1]

Changes from the refactoring:

-Deleted unused variables, classes and methods.
-Fixed incorrect use of Java.
-Improved performance.
-Added test classes and test methods.
-Applied PMD changes.
-Formatted code.
-Added Javadoc.

I still have two weeks left to work on the project in the GSoC timeline. In this time I would like to improve the project, so I would like to work on one task. (I need the whole community to help find the weak points of the project.) Until now I have been working alone because I thought I would not have time to finish, but I understand that this is open source and we need to work together. The community has made this project grow, and that is the great thing about open source projects.

Manuel.

2013/7/11 Robert Burrell Donkin <robertburrelldonkin@blueyonder.co.uk>

> On 07/10/13 23:49, Manuel Suárez Sánchez wrote:
>
>>> 1. scan the source, building a strongly-typed, immutable domain model
>>
>> This point is basic to improving the project, because at the moment there
>> is no good domain model and it is very confusing.
>
> I think that the question comes down to granularity.
>
> Here's one way that the two contrasting approaches might work...
>
> With the full model approach, the source would be scanned completely into a
> model before the document contents were analysed. Once the analysis was
> complete, then the reporting would start. The process flow would be
> coarse-grained. This would cut across the grain of the current Rat design.
>
> With a message oriented architecture, the scanner would send each document
> to enrichment as soon as it was created. The enricher would take a look at
> the contents and add document-level meta-data, then pass on the enriched
> object as soon as it was created. Aggregate analysers would then build up
> the report. This would be sympathetic to the current Rat design.
>
> Retaining a streaming/messaging architecture means modelling at the
> message level (rather than more complete structures)
>
> <snip>
>
>>> However, I think that the current streaming design isn't particularly
>>> intuitive or obvious. I would be happy to retain an improved streaming
>>> design.
>>
>> I think that Apache Rat is a release audit tool, focused on licenses. In
>> the project you analyse a file (audio) and you get the license of the
>> file. Why do you try to use a streaming/message-driven architecture?
>
> Performance at small memory footprint
>
> Robert
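
The two approaches contrasted in the quoted discussion can be sketched roughly as follows. This is only an illustrative Python sketch and not Rat code: the names `Document`, `scan`, `enrich`, `report_streaming` and `report_full_model`, the `AL2` label and the string-matching license check are all assumptions made up for the example.

```python
from collections import Counter
from dataclasses import dataclass, replace
from typing import Dict, Iterable, Iterator, Optional


@dataclass(frozen=True)
class Document:
    """Hypothetical immutable message passed between pipeline stages."""
    name: str
    text: str
    license_id: Optional[str] = None  # document-level meta-data set by enrich()


def scan(sources: Dict[str, str]) -> Iterator[Document]:
    """Scanner: emit one Document at a time instead of building a list."""
    for name, text in sources.items():
        yield Document(name, text)


def enrich(docs: Iterable[Document]) -> Iterator[Document]:
    """Enricher: attach a (made-up) license guess and pass the document on."""
    for doc in docs:
        found = "AL2" if "Apache License" in doc.text else "unknown"
        yield replace(doc, license_id=found)


def report_streaming(docs: Iterable[Document]) -> Counter:
    """Aggregate analyser: keeps only running totals, never all documents."""
    totals: Counter = Counter()
    for doc in docs:
        totals[doc.license_id] += 1
    return totals


def report_full_model(sources: Dict[str, str]) -> Counter:
    """Full model: materialise every enriched document before reporting."""
    model = list(enrich(scan(sources)))  # the whole model is in memory here
    return Counter(doc.license_id for doc in model)


if __name__ == "__main__":
    sources = {
        "A.java": "Licensed under the Apache License, Version 2.0",
        "B.java": "no header here",
    }
    print(report_streaming(enrich(scan(sources))))
    print(report_full_model(sources))
```

Both functions produce the same totals. The difference is that the streaming version only ever holds one document plus the running counts, which is the "performance at small memory footprint" argument, while the full model version keeps every document until reporting starts.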