creadur-dev mailing list archives

From Robert Burrell Donkin
Subject Re: [GSOC] Rat: Past, Present and Future
Date Thu, 11 Jul 2013 19:49:05 GMT
On 07/10/13 23:49, Manuel Suárez Sánchez wrote:
>> 1. scan the source, building a strongly-typed, immutable domain model
> This point is essential to improving the project, because right now there
> isn't a good domain model and it's very confusing.

I think that the question comes down to granularity.

Here's one way that the two contrasting approaches might work...

With the full model approach, the source would be scanned completely into
a model before the document contents were analysed. Once the analysis
was complete, then the reporting would start. The process flow would be
coarse-grained. This would cut across the grain of the current Rat design.
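
As a rough sketch of that coarse-grained flow: three phases, each finishing before the next starts. All names here are hypothetical and not taken from the Rat code base, and the license check is a stand-in substring match rather than a real header matcher.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these types exist in Rat.
final class FullModelSketch {

    /** Immutable analysed document: a name plus its license family. */
    static final class Doc {
        final String name;
        final String license;

        Doc(String name, String license) {
            this.name = name;
            this.license = license;
        }
    }

    /** Phase 1: scan every source into an in-memory model (name -> contents). */
    static Map<String, String> scan(Map<String, String> sources) {
        return Collections.unmodifiableMap(new LinkedHashMap<>(sources));
    }

    /** Phase 2: analyse the complete model, producing a second complete model. */
    static List<Doc> analyse(Map<String, String> scanned) {
        List<Doc> docs = new ArrayList<>();
        for (Map.Entry<String, String> e : scanned.entrySet()) {
            // Assumption: a plain substring match stands in for header matching.
            String license = e.getValue().contains("Apache License") ? "AL" : "?";
            docs.add(new Doc(e.getKey(), license));
        }
        return Collections.unmodifiableList(docs);
    }

    /** Phase 3: report only once the analysis is complete. */
    static String report(List<Doc> docs) {
        StringBuilder out = new StringBuilder();
        for (Doc d : docs) {
            out.append(d.license).append(' ').append(d.name).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> sources = new LinkedHashMap<>();
        sources.put("A.java", "Licensed under the Apache License, Version 2.0");
        sources.put("B.java", "no header here");
        System.out.print(report(analyse(scan(sources))));
    }
}
```

Note that every document's contents stay in memory until phase 2 has finished, which is the cost of having the whole model available at once.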

With a message-oriented architecture, the scanner would send each
document to enrichment as soon as it was created. The enricher would 
take a look at the contents and add document-level meta-data, then pass 
on the enriched object as soon as it was created. Aggregate analysers 
would then build up the report. This would be sympathetic to the current 
Rat design.
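
A minimal sketch of that message flow, assuming each stage is just a `java.util.function.Consumer` handing documents downstream. The names (`Doc`, `enricher`, `LicenseCounter`) are hypothetical and not Rat's actual classes, and the license check is again a stand-in substring match.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Consumer;

// Hypothetical sketch; none of these types exist in Rat.
final class StreamingSketch {

    /** The message passed between stages: one immutable document. */
    static final class Doc {
        final String name;
        final String content;
        final String license; // null until the enricher has seen the document

        Doc(String name, String content, String license) {
            this.name = name;
            this.content = content;
            this.license = license;
        }

        Doc withLicense(String newLicense) {
            return new Doc(name, content, newLicense);
        }
    }

    /** Scanner: sends each document downstream as soon as it is created. */
    static void scan(Map<String, String> sources, Consumer<Doc> next) {
        for (Map.Entry<String, String> e : sources.entrySet()) {
            next.accept(new Doc(e.getKey(), e.getValue(), null));
        }
    }

    /** Enricher: adds document-level meta-data, then passes the message on. */
    static Consumer<Doc> enricher(Consumer<Doc> next) {
        // Assumption: a plain substring match stands in for header matching.
        return d -> next.accept(
                d.withLicense(d.content.contains("Apache License") ? "AL" : "?"));
    }

    /** Aggregate analyser: keeps running totals only, never the documents. */
    static final class LicenseCounter implements Consumer<Doc> {
        final Map<String, Integer> counts = new TreeMap<>();

        @Override
        public void accept(Doc d) {
            counts.merge(d.license, 1, Integer::sum);
        }
    }

    public static void main(String[] args) {
        Map<String, String> sources = new LinkedHashMap<>();
        sources.put("A.java", "Licensed under the Apache License, Version 2.0");
        sources.put("B.java", "no header here");
        sources.put("C.java", "Apache License 2.0");

        LicenseCounter counter = new LicenseCounter();
        scan(sources, enricher(counter));
        System.out.println(counter.counts);
    }
}
```

Once a document has passed through the last stage nothing in the pipeline refers to it any more, so only the aggregate counts accumulate. (The `sources` map in `main` is only test input; a real scanner would read files one at a time.)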

Retaining a streaming/messaging architecture means modelling at the
message level (rather than building more complete structures).


>> However, I think that the current streaming design isn't particularly
>> intuitive or obvious. I would be happy to retain an improved streaming
>> design.
> I think that apache rat is a release audit tool, focused on licenses. In
> the project you analyse a file(audio) and you get the license of the file. Why
> do you try to use streaming/message driven architecture?

Performance with a small memory footprint: each document can be processed
and passed on without holding the whole source tree in memory.

