Mailing-List: contact axis-c-dev-help@ws.apache.org; run by ezmlm
List-Id: "Apache AXIS C Developers List"
Message-ID: <52182.124.43.206.110.1205577205.squirrel@webmail4.pair.com>
In-Reply-To: <1205572232.6089.27.camel@manjula>
Date: Sat, 15 Mar 2008 16:03:25 +0530 (IST)
Subject: Re: Caching support for large attachments
From: "Senaka Fernando"
To: axis-c-dev@ws.apache.org
Reply-To: senaka@wso2.com

Hi Manjula,

Please read my reply inline.

> Hi Senaka,
>
> I am confused here. I think you are taking the discussion to the
> beginning. Because in the receiving side we read till the end of the
> stream. Please see my first mail.

No, I'm not taking the discussion back to the starting point. I'm proposing
an alternative implementation. With what I describe here, we will still read
till the end of the stream, but we will not buffer everything we read in
memory. We will flush the buffer to a file once it exceeds a threshold.

However, when we read beyond the buffer size, we will not copy the content
directly to the file without parsing it. Instead, we will use our fixed-size
buffer to hold the content temporarily, parse it there, and then write it to
the file. Thus, the file will contain only the binary part. It will not
contain the "--MIMEBoundary" lines etc. These, along with the file name(s),
can be stored in the parsed attachment object that is created.

Thus, memory consumption will be limited to the size of the fixed buffer,
and we will use the file for storage. This mechanism gives us the added plus
of not having to worry about re-parsing what is written to the file, as it
has already been parsed once. Please note that MIME parsing DOES NOT require
us to store the entire content in memory.

>
> When sending writing part by part to the stream is same as chunking.
> Because when sending either you should specify a content-length or
> specified it as chunked.

No, it is not the same as chunking.
What I meant here is that you need not read the entire content into memory
at once and write it to the stream in a single step. Rather, we can read
part by part, write each part to the stream, and repeat the process until
the whole large file is written. Here you will still be using the
Content-Length. Chunking is a whole different story, where you transmit data
as blocks. Using chunking, we can send data of arbitrary length, where the
length is not pre-calculated.

Now you might wonder how we calculate the Content-Length without reading the
entire content into memory. Well, you can seek to the end of the file and
find out the size of the content to be written. Add to it the lengths of the
standard header block and the MIME boundary demarcation strings, and you
will get the Content-Length. This is not at all an expensive operation, as
the seek does not read the file's content into memory. The OS will manage
its efficiency.

>
> -Manjula.

Regards,
Senaka

>
> On Sat, 2008-03-15 at 13:39 +0530, Senaka Fernando wrote:
>> >>> BTW, this whole discussion is about in path, that is reading an
>> >>> incoming message. How about the out path? We have the same problems
>> >>> when sending attachments. Right now, we read the whole file into
>> >>> memory and then only we send over the wire.
>> >> hmm... Why not write it in chunks.. Read a chunk from the file, then
>> >> write it to the outstream.. Use size of the file for content-length
>> >> calculation in case of non-chunking.. But mostly people will use
>> >> chunking when using MTOM..
>> >
>> > No, chunking is not required. You also don't need to write the entire
>> > data to be sent, to the stream at once. Because any HTTP Receiver will
>> > pull from the stream until it sees a valid ending character sequence.
>>
>> It should rather read a length equal to content length. And the
>> terminating sequence is for headers. Sorry for the confusion.
>> Therefore, the HTTP Receiver will pull from the stream until it reads a
>> content length or until an error occurs.
>>
>> >
>> > I believe that you should be able to write part by part to the stream,
>> > and send it, then reuse the buffer and write part 2, and send and so on.
>> > This argument can be justified, because on the receiving end, we must
>> > read the multi-part data until we encounter the mime boundary, unlike an
>> > ordinary payload where it can be terminated by a valid terminating
>> > character
>>
>> Same here. We'll be reading a length equal to content length.
>>
>> > sequence. We'll only have issues if we are to write large soap payloads
>> > which of course can be dealt with once we've implemented Session in
>> > Axis2/C.
>> >
>> > Regards,
>> > Senaka
>> >
>> >>
>> >> thanks,
>> >> Thilina
>> >>
>> >>>
>> >>> Samisa...
>> >>>
>> >>> > Regards,
>> >>> > Senaka
>> >>> >
>> >>> >> Hi,
>> >>> >>
>> >>> >>> > In Axis2/Java case we do write the attachment content directly
>> >>> >>> > from the InputStream to the File when the attachment size is
>> >>> >>> > larger than the threshold. This avoids loading the whole
>> >>> >>> > attachment to the memory at all.
>> >>> >>>
>> >>> >>> In this case to find out the attachment size don't you need to do
>> >>> >>> any mime parsing? How do you find the attachment size without
>> >>> >>> searching for the mime boundaries?
>> >>> >>>
>> >>> >> Yes.. MIME is a boundary based packaging mechanism and you do not
>> >>> >> need to specify the length for each of the parts...Even the HTTP
>> >>> >> content length is not there if the message is chunked.
>> >>> >>
>> >>> >> What we did in Axis2/Java to overcome this is to read the data to a
>> >>> >> byte[] buffer of up to a certain size (the size threshold).
>> >>> >> If there are more data available in the mime part (if we have not
>> >>> >> encountered the boundary yet) then we know this attachment is bigger
>> >>> >> than the threshold. So we create the temp file, pump the content in
>> >>> >> the buffer to the file, then pump the rest of the stream to the
>> >>> >> file.. In this way we do not need to know the size of the attachment
>> >>> >> upfront.. BTW we do all of the above while we are parsing the MIME
>> >>> >> message at the MIME parser level..
>> >>> >>
>> >>> >>> > This has the plus point that the attachment size will be
>> >>> >>> > limited only by the available free space in the Temp Directory..
>> >>> >>> > Will that be possible in Axis2/C.. Or is that what you have in
>> >>> >>> > mind :)..
>> >>> >>>
>> >>> >>> Yes this is possible.
>> >>> >>>
>> >>> >> But in Axis2/JAVA we will get an OutOfMemory if we parse a large
>> >>> >> MIME part upfront, since it reads the attachment to memory. May be
>> >>> >> you can have a larger limit with C than in Java, but ultimately
>> >>> >> you'll come to a situation where you will not have enough memory to
>> >>> >> store that MIME part in memory in the parsing time, unless you write
>> >>> >> in to a File while parsing..
>> >>> >>
>> >>> >> thanks,
>> >>> >> Thilina
>> >>> >>
>> >>> >>> >
>> >>> >>> > thanks,
>> >>> >>> > Thilina
>> >>> >>> >
>> >>> >>> > >and keeping the file name inside
>> >>> >>> > > data_handler instead of the whole buffer. So the service or the
>> >>> >>> > > client will get the file name instead of the buffered stream,
>> >>> >>> > > when it receives an attachment.
>> >>> >>> > > This will not prevent buffering the attachment at the
>> >>> >>> > > transport but will prevent keeping it inside the om_tree till
>> >>> >>> > > it reaches the receiver.
>> >>> >>> > >
>> >>> >>> > > Before implementing this I would like to know your suggestions
>> >>> >>> > > regarding this.
>> >>> >>> > >
>> >>> >>> > > [1] https://issues.apache.org/jira/browse/AXIS2C-672
>> >>> >>> > >
>> >>> >>> > > Thanks,
>> >>> >>> > > -Manjula
>> >>> >>> > >
>> >>> >>> > > --
>> >>> >>> > > Manjula Peiris: http://manjula-peiris.blogspot.com/
>> >>> >>> > >
>> >>> >>> > > ---------------------------------------------------------------------
>> >>> >>> > > To unsubscribe, e-mail: axis-c-dev-unsubscribe@ws.apache.org
>> >>> >>> > > For additional commands, e-mail: axis-c-dev-help@ws.apache.org
>> >>> >>> > >
>> >>> >>
>> >>> >> --
>> >>> >> Thilina Gunarathne - http://thilinag.blogspot.com
>> >>> >>
>> >>>
>> >>> --
>> >>> Samisa Abeysinghe
>> >>> Software Architect; WSO2 Inc.
>> >>>
>> >>> http://www.wso2.com/ - "Oxygenating the Web Service Platform."
>> >>>
>> >>
>> >> --
>> >> Thilina Gunarathne - http://thilinag.blogspot.com
>> >>
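
For illustration, the receive-side scheme described in the reply above
(parse through a fixed-size buffer, keep small parts in memory, flush to a
file once a threshold is exceeded) could be sketched roughly as follows.
This is only a sketch in Python, not Axis2/C code: read_part, its parameters
and its return shape are hypothetical names chosen for the example. It
assumes the part headers have already been consumed and that the part ends
at "\r\n--" followed by the boundary string.

```python
import io
import os
import tempfile


def read_part(stream, boundary, threshold=64 * 1024, chunk_size=8 * 1024):
    """Read one MIME part body from `stream` up to the next boundary.

    Hypothetical helper. Returns (kind, value, rest): kind is "memory"
    (value is bytes) or "file" (value is a temp file path); rest is whatever
    was already read past the delimiter.
    """
    delim = b"\r\n--" + boundary
    keep = len(delim) - 1      # tail that could be the start of a delimiter
    pending = b""              # bytes read but not yet known to be payload
    held = bytearray()         # payload kept in memory while under threshold
    out = None                 # temp file, created only past the threshold

    def emit(data):
        nonlocal out
        if out is None and len(held) + len(data) <= threshold:
            held.extend(data)
            return
        if out is None:
            out = tempfile.NamedTemporaryFile(delete=False)
            out.write(held)    # flush what was buffered so far
            held.clear()
        out.write(data)        # only payload reaches the file, no boundary

    while True:
        chunk = stream.read(chunk_size)
        pending += chunk
        pos = pending.find(delim)
        if pos >= 0:
            emit(pending[:pos])
            rest = pending[pos + len(delim):]
            break
        if not chunk:
            raise ValueError("stream ended before the MIME boundary")
        # Everything except the last `keep` bytes is definitely payload.
        if len(pending) > keep:
            emit(pending[:-keep])
            pending = pending[-keep:]

    if out is None:
        return "memory", bytes(held), rest
    out.close()
    return "file", out.name, rest


if __name__ == "__main__":
    body = b"x" * 100_000
    wire = io.BytesIO(body + b"\r\n--MIMEBoundary--\r\n")
    kind, value, _ = read_part(wire, b"MIMEBoundary")
    print(kind, value)
    if kind == "file":
        os.remove(value)
```

Memory use stays bounded by the threshold plus one chunk and a
delimiter-sized tail, and the file receives only the binary part, so nothing
written to it needs to be parsed again.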
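
For the out path, the Content-Length calculation and the part-by-part
writing described in the reply above can be sketched in the same hedged way.
Again this is Python for brevity and not Axis2/C code: file_size,
multipart_length and send_multipart are hypothetical names, `out` is assumed
to be any binary file-like object, and the preamble and closing byte strings
stand in for the header block and MIME boundary lines.

```python
import io
import os
import tempfile


def file_size(path):
    """Find the size by seeking to the end; the content is never read."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        return f.tell()


def multipart_length(preamble, attachment_path, closing):
    """Content-Length = header/boundary bytes + file size + closing bytes."""
    return len(preamble) + file_size(attachment_path) + len(closing)


def send_multipart(out, preamble, attachment_path, closing, block_size=8 * 1024):
    """Write the body to `out` block by block; returns the bytes written."""
    out.write(preamble)
    written = len(preamble)
    with open(attachment_path, "rb") as src:
        while True:
            block = src.read(block_size)   # one fixed-size block at a time
            if not block:
                break
            out.write(block)
            written += len(block)
    out.write(closing)
    return written + len(closing)


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\0" * 50_000)
    pre = b"--MIMEBoundary\r\nContent-Type: application/octet-stream\r\n\r\n"
    end = b"\r\n--MIMEBoundary--\r\n"
    sink = io.BytesIO()
    declared = multipart_length(pre, tmp.name, end)
    sent = send_multipart(sink, pre, tmp.name, end)
    print(declared, sent)
    os.remove(tmp.name)
```

The declared length is known before the first block is written, so the
message can go out with a Content-Length header and without chunked transfer
encoding, while only one block of the file is held in memory at a time.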