tomcat-dev mailing list archives

From Fastupload <fastupl...@outlook.com>
Subject Re: to improve the performance of form-based upload for Tomcat 7
Date Wed, 26 Sep 2012 06:57:07 GMT
Chris,


Here are my brief opinions.

> Committers are invited by the current group of active participants. The
> best way to be invited is to become active in the community (i.e. this
> mailing list and/or the users@tomcat.apache.org mailing list), and
> submit patches.
> 
Thanks for providing the right info.


> If you have a specific patch you think would be useful, file an
> enhancement request in Bugzilla and attach your patch to it. If it's
> useful, someone will apply it and give you credit.
> 
> I'm interested in how you are able to obtain a "5x speed improvement
> over commons file-upload": the slowest link in the chain is the network
> which you can't fix with software (other than compression). I'm unclear
> as to why you think Boyer Moore string searching will be measurably
> faster than simple String.indexOf because the search strings (the
> multipart boundaries, usually only about 64 bytes) are so small.
> 
For why the Boyer-Moore algorithm is faster than a simple String.indexOf search, you can refer
to the Wikipedia page:
http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm

In fastupload, the architecture is simpler than that of commons file upload. Also, fast upload
requires Java 5 or higher, so commons file upload cannot be fixed in the same way.

In fact, the Boyer-Moore string search algorithm is openly published. I researched the
algorithm and found that it is the right algorithm for finding an arbitrary pattern in text.
An unnamed author wrote the Java implementation of it on Wikipedia. I made a small enhancement
to that implementation so that it can search content in Java byte arrays. The source file in
the fast upload project is named "BoyerMoore.java" to give credit to Boyer and Moore.
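
To illustrate what searching raw bytes rather than characters looks like, here is a
minimal sketch using the Horspool simplification of Boyer-Moore (bad-character shifts
only). It is not the BoyerMoore.java from the fast upload project; the class name
ByteSearch and its method are hypothetical and exist only for this example.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; not the fast upload source. */
final class ByteSearch {

    /** Returns the index of the first occurrence of pattern in text, or -1. */
    static int indexOf(byte[] text, byte[] pattern) {
        int m = pattern.length;
        if (m == 0) {
            return 0;
        }
        // Bad-character table: how far the window may move for each byte value.
        int[] shift = new int[256];
        Arrays.fill(shift, m);
        for (int i = 0; i < m - 1; i++) {
            // "& 0xFF" maps Java's signed bytes onto the table indexes 0..255.
            shift[pattern[i] & 0xFF] = m - 1 - i;
        }
        int pos = 0;
        while (pos <= text.length - m) {
            int j = m - 1;
            // Compare from the end of the window backwards.
            while (j >= 0 && text[pos + j] == pattern[j]) {
                j--;
            }
            if (j < 0) {
                return pos;
            }
            // Shift according to the last byte currently under the window.
            pos += shift[text[pos + m - 1] & 0xFF];
        }
        return -1;
    }
}
```

The only byte-specific detail is the "& 0xFF" mask; the rest is the same as the
character version of the algorithm.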

However, BoyerMoore can search any Java byte array. Reading the whole ServletInputStream
into a buffer is not required. When some bytes are read from the ServletInputStream and the
boundary is searched for within those bytes, it does the job well. If you are interested,
please refer to the source file StreamUploaderParser.java in the fast upload source.

Compared with commons file upload and the Cosz upload component, only the fast upload component
provides a solution that parses the portion of the multipart data that represents an uploaded
file and writes that data to a file. This solution reduces the memory cost of parsing a large
file.
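
The chunk-by-chunk approach described above can be sketched as follows. This is a minimal
illustration under stated assumptions, not the code in StreamUploaderParser.java: the
class ChunkedPartCopier, the method copyUntil and the chunkSize parameter are hypothetical
names, and the inner search is a plain byte comparison that could be swapped for
Boyer-Moore. The point it shows is that only chunkSize + boundary.length - 1 bytes are
held in memory, because the last boundary.length - 1 bytes of each chunk are carried over
in case the boundary straddles two reads.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical helper for illustration; not the fast upload source. */
final class ChunkedPartCopier {

    /**
     * Copies bytes from in to out until boundary is seen.
     * Returns true if the boundary was found, false if the stream ended first.
     */
    static boolean copyUntil(InputStream in, OutputStream out,
                             byte[] boundary, int chunkSize) throws IOException {
        int m = boundary.length;
        if (m == 0 || chunkSize < 1) {
            throw new IllegalArgumentException("boundary and chunkSize must be non-empty");
        }
        byte[] buf = new byte[chunkSize + m - 1];
        int kept = 0; // bytes carried over from the previous read
        int n;
        while ((n = in.read(buf, kept, buf.length - kept)) != -1) {
            int len = kept + n;
            int hit = indexOf(buf, len, boundary);
            if (hit >= 0) {
                out.write(buf, 0, hit); // part data ends where the boundary starts
                return true;
            }
            // No match yet: everything except the last m-1 bytes is safe to flush.
            int keep = Math.min(m - 1, len);
            out.write(buf, 0, len - keep);
            System.arraycopy(buf, len - keep, buf, 0, keep);
            kept = keep;
        }
        out.write(buf, 0, kept); // stream ended without a boundary
        return false;
    }

    /** Plain search over the first len bytes of text; a stand-in for Boyer-Moore. */
    private static int indexOf(byte[] text, int len, byte[] pattern) {
        for (int i = 0; i + pattern.length <= len; i++) {
            int j = 0;
            while (j < pattern.length && text[i + j] == pattern[j]) {
                j++;
            }
            if (j == pattern.length) {
                return i;
            }
        }
        return -1;
    }
}
```

Passing a java.io.FileOutputStream as out would write the part straight to disk, so the
memory used stays the same no matter how large the uploaded file is.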

> Also, I think the use of Boyer Moore is naïve, as it will require you to
> read a whole multipart part into memory before searching for the
> boundary and disassembling the parts.
> 
> Finally, you ignore an opportunity to further improve your algorithm
> because the multipart boundary does not change from part to part: you
> can cache the charset and offset tables for the multipart boundary for
> the entire request instead of re-creating them each time you search.
Exactly! Since the fast upload 0.3.5 release, the plan has included this enhancement.
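
A minimal sketch of that caching idea, assuming a Horspool-style shift table: the table
is built once in the constructor and reused for every search in the request. The class
BoundaryMatcher is a hypothetical name for this example and is not part of any fast
upload release.

```java
import java.util.Arrays;

/** Hypothetical matcher for illustration: one instance per multipart request. */
final class BoundaryMatcher {
    private final byte[] boundary;
    private final int[] shift = new int[256]; // computed once, reused for each search

    BoundaryMatcher(byte[] boundary) {
        if (boundary.length == 0) {
            throw new IllegalArgumentException("empty boundary");
        }
        this.boundary = boundary.clone();
        int m = boundary.length;
        Arrays.fill(shift, m);
        for (int i = 0; i < m - 1; i++) {
            shift[boundary[i] & 0xFF] = m - 1 - i;
        }
    }

    /** Returns the index of the first boundary at or after from, or -1. */
    int indexOf(byte[] text, int from) {
        int m = boundary.length;
        int pos = Math.max(from, 0);
        while (pos <= text.length - m) {
            int j = m - 1;
            while (j >= 0 && text[pos + j] == boundary[j]) {
                j--;
            }
            if (j < 0) {
                return pos;
            }
            pos += shift[text[pos + m - 1] & 0xFF];
        }
        return -1;
    }
}
```

Calling indexOf repeatedly, each time starting one byte past the previous hit, walks
through every part of the request without rebuilding the table.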

> But
> then you'd have to understand the algorithm instead of just copy/pasting
> from Wikipedia. At least change some of the Javadoc formatting if you
> are going to steal other people's work. Otherwise, give them credit.





On Sep 25, 2012, at 11:40 PM, Christopher Schultz <chris@christopherschultz.net> wrote:

> Link,
> 
> On 9/25/12 10:14 AM, Fastupload wrote:
>> What's the right org where I can apply for a committer account for an Apache
>> open source project?
> 
> Committers are invited by the current group of active participants. The
> best way to be invited is to become active in the community (i.e. this
> mailing list and/or the users@tomcat.apache.org mailing list), and
> submit patches.
> 
> If you have a specific patch you think would be useful, file an
> enhancement request in Bugzilla and attach your patch to it. If it's
> useful, someone will apply it and give you credit.
> 
> I'm interested in how you are able to obtain a "5x speed improvement
> over commons file-upload": the slowest link in the chain is the network
> which you can't fix with software (other than compression). I'm unclear
> as to why you think Boyer Moore string searching will be measurably
> faster than simple String.indexOf because the search strings (the
> multipart boundaries, usually only about 64 bytes) are so small.
> 
> Also, I think the use of Boyer Moore is naïve, as it will require you to
> read a whole multipart part into memory before searching for the
> boundary and disassembling the parts.
> 
> Finally, you ignore an opportunity to further improve your algorithm
> because the multipart boundary does not change from part to part: you
> can cache the charset and offset tables for the multipart boundary for
> the entire request instead of re-creating them each time you search. But
> then you'd have to understand the algorithm instead of just copy/pasting
> from Wikipedia. At least change some of the Javadoc formatting if you
> are going to steal other people's work. Otherwise, give them credit.
> 
> -chris
> 

