Reply-To: dev@cocoon.apache.org
Message-ID: <3F293A74.9070007@apache.org>
Date: Thu, 31 Jul 2003 17:49:08 +0200
From: Gianugo Rabellino <gianugo@apache.org>
MIME-Version: 1.0
To: dev@cocoon.apache.org
Subject: Re: Flow's processPipelineTo() and FileSource
References: <3F292924.10908@apache.org> <3F2937E3.3040408@anyware-tech.com>
In-Reply-To: <3F2937E3.3040408@anyware-tech.com>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit

Sylvain Wallez wrote:

>> I'm having a hell of a time using flow with processPipelineTo() and 
>> OutputStreams coming out from FileSource(s).
>>
>> The problem is that FileSource#getOutputStream() creates a temporary 
>> file (... to be discussed later ...) and that file gets renamed to the 
>> original name only upon OutputStream.close(). Now, AbstractInterpreter, 
>> line 201, actually calls flush() but *never* close(). As a result, the 
>> original file is never updated and everything is kinda ... well... 
>> screwed up.
>>
>> The patch is trivial, but I'm wondering if adding out.close() in 
>> AbstractInterpreter.java might break something: any flow experts around? 
> 
> I don't see why there should be some consequences on the flow itself... 
> Just replace flush() by close() !

Just did it, but I didn't replace flush(), just added close() 
afterwards: it's better to be sure that there are no leftovers...
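
To make the change concrete, here is a minimal sketch of the flush()-then-close() sequence. It is not the actual AbstractInterpreter code: PipelineWriter, RecordingStream and process() are hypothetical names made up for illustration, and the lambda assumes Java 8 or later.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class FlushThenClose {

    /** Hypothetical stand-in for the code that serializes the pipeline. */
    public interface PipelineWriter {
        void writeTo(OutputStream out) throws IOException;
    }

    /** Hypothetical stream that records close(), standing in for a FileSource stream. */
    public static class RecordingStream extends ByteArrayOutputStream {
        public boolean closed = false;

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    public static void process(PipelineWriter writer, OutputStream out) throws IOException {
        writer.writeTo(out);
        out.flush(); // the call that was already there
        out.close(); // the added call: gives the stream a chance to commit its content
    }

    public static void main(String[] args) throws IOException {
        RecordingStream out = new RecordingStream();
        process(o -> o.write("<page/>".getBytes("UTF-8")), out);
        System.out.println("closed = " + out.closed + ", bytes = " + out.size());
    }
}
```

Note that in this sketch close() is only reached when writing succeeds, so a stream that commits on close() would not commit after a failed run. Whether the stream should still be released in some other way on failure is left open here.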

>> Now for the FileSource: I do understand *some* of the reasoning behind 
>> using a temporary file, but I have to disagree on the implementation: 
>> naming it [filename].tmp is a bit of a bet, since someone might 
>> legitimately have such a filename around. While I understand that 
>> there might be memory issues with large files, I guess that:
>>
>> 1. keeping a ByteArrayOutputStream;
>> 2. forgetting about it and just writing the file;
>> 3. using a more "clever" name that doesn't risk conflicts this much 
> 
> I would avoid 2. The reason why I used a temporary file is because of 
> the streamed nature of Cocoon pipelines. If an error occurs within the 
> processing, the original content is not partially overwritten. My 
> preference would go to 3.

I see and understand. Yet temporary files, besides being somewhat 
inconvenient, can be a major security hole in general. I'd rather go for 
1, then, accumulating bytes as they come in a ByteArrayOutputStream and 
writing them upon close() (and maybe flush() too?). True, this is in 
turn a possible security hole, since someone might DoS the machine by 
processing gigabyte-sized files, but all in all I tend to think that 
it's a better solution... and yes, doing transactions on a filesystem is 
a PITA. :-)
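
A rough sketch of what option 1 could look like, just to illustrate the idea. BufferedFileOutputStream is a hypothetical class, not something that exists in Excalibur, and this version writes only upon close(), leaving the flush() question aside.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical class: buffers everything in memory, touches the file only on close(). */
public class BufferedFileOutputStream extends ByteArrayOutputStream {

    private final File target;
    private boolean closed = false;

    public BufferedFileOutputStream(File target) {
        this.target = target;
    }

    @Override
    public void close() throws IOException {
        if (closed) {
            return; // a second close() must not rewrite the file
        }
        closed = true;
        // Only now is the target opened and the buffered content written out.
        try (OutputStream fileOut = new FileOutputStream(target)) {
            writeTo(fileOut);
        }
    }
}
```

If processing fails before close() is called, the target file is never opened, so the original content stays as it was. The final write is still not atomic, though: a failure during close() itself could leave a partial file, and the whole content sits in memory until then.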

Ciao,

>> are all better options.
>>
>> Is that OK to you if I work on it? I don't know if I have access to 
>> the Excalibur CVS though... 
> 
> As a Cocoon committer, you should.

I understand that I am authorized in principle, I just don't know 
if I need to be explicitly enabled. Anyway, I'll check it out. :-)

Thanks for everything,

-- 
Gianugo Rabellino
Pro-netics s.r.l. -  http://www.pro-netics.com
Orixo, the XML business alliance - http://www.orixo.com
     (Now blogging at: http://blogs.cocoondev.org/gianugo/)