airavata-dev mailing list archives

From "Schwartz, Terri" <te...@sdsc.edu>
Subject RE: [#Spring17-Airavata-Courses] Reliable file uploads for Science Gateway Portals
Date Mon, 08 May 2017 15:58:53 GMT
I made a Java servlet implementation of the server side of the tus protocol: https://github.com/terrischwartz/tus_servlet

I did this for ease of integration with the CIPRES portal. In particular, I wanted users
who were logged into the portal to be able to upload via the tus protocol, and didn't want
to have to use anything other than the existing login session for authentication. I experimented
with using it for upload from CIPRES, but we don't have a production-ready UI in place yet.

I wanted to extend it to function as a standalone server that could be used by multiple applications,
but never worked out the authentication for that.

I looked at the same JavaScript libraries you did before coming across tus.io, and wasn't
too impressed with them. The tus.io core developers are very enthusiastic, friendly and professional.
They know what they're doing and are great to work with. As you've probably seen, the protocol
is super simple and does the trick.

Terri
________________________________
From: Miller, Mark [mmiller@sdsc.edu]
Sent: Monday, May 08, 2017 6:08 AM
To: dev@airavata.apache.org
Subject: RE: [#Spring17-Airavata-Courses] Reliable file uploads for Science Gateway Portals

Hi Ameya,

We have thought about this at CIPRES as well.
Terri Schwartz in our group has implemented a Java version of resumable file transfer in tus.io,
and I have mentioned this as a possible SciGaP/Airavata service. I would recommend you
talk with Terri and see if there is synergy between your thinking and what she has done.

Mark

From: Ameya Advankar [mailto:aadvanka@umail.iu.edu]
Sent: Sunday, May 07, 2017 9:23 PM
To: dev@airavata.apache.org
Subject: [#Spring17-Airavata-Courses] Reliable file uploads for Science Gateway Portals

Hi Airavata Developers,

I have been exploring and evaluating tus.io<http://tus.io/>  as a solution to the following
problems encountered in Science Gateway Portals related to the file upload functionality:

1. Unreliable HTTP Connection

Because file uploads in Science Gateway Portals are plain HTTP uploads, they depend on
a continuous internet connection on the client machine. If there is a network disruption
or connectivity issue, a traditional file upload fails, and the user has to retry the
upload manually and wait for it to succeed. If the files are large, e.g. a few hundred
megabytes or several gigabytes, this wastes both bandwidth and time.

2. Space constraints on the Server

The file being uploaded is usually staged somewhere on the server until it is picked up
for further processing. It may stay there for a considerable amount of time, depending on
the process queuing time. If multiple large files are uploaded at the same time by users
of a Portal, the host machine may run out of space, and this could have adverse effects
on the performance of the Portal.

Also, there could be cases in which multiple Science Gateway Portals are hosted on the same
web server. In such a setting, if a particular Portal fills up server space with multiple
large files, it may affect the performance of other Portals residing on that web server as
well. In short, the file-upload functionality should not affect the Portal performance.


We could use some JavaScript libraries such as Fine Uploader<https://fineuploader.com/>,
Resumable.js<http://www.resumablejs.com>  or flow.js<https://github.com/flowjs/flow.js>
which provide a simple client side library for resumable file uploads in case of disruptions.
Fine Uploader seems to be the best among these libraries based on the community usage and
contributions on GitHub. For each of these client side libraries, we have to incorporate the
corresponding server side code to handle resumable uploads.

However, since each JavaScript library uses its own set of parameters and request headers
to achieve resumable functionality, the server implementation which we adopt will be tightly
coupled with the library we choose. This will introduce a dependency on the library.
To remove the library-based dependency, we can use a client-server implementation of the
tus.io<http://tus.io> protocol. Using a protocol reduces the library-level dependency to a
protocol-level dependency.

Also, since a tus.io<http://tus.io> client could be implemented in any language, we could
have multiple types of Gateway Portals, such as web, desktop and native, which connect to the
same tus.io<http://tus.io> based server.
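
To illustrate why the dependency is only at the protocol level, here is a minimal sketch
of the tus 1.0.0 core resume flow in Python. It is simulated in memory, with no network
involved. The header names (Tus-Resumable, Upload-Offset) and the PATCH content type come
from the tus core protocol; the class InMemoryTusServer, the function resume_upload and
the fail_after_chunks parameter are hypothetical names made up for this illustration and
are not part of any tus library.

```python
"""Sketch of the tus 1.0.0 core resume flow, simulated in memory (no network)."""

TUS_VERSION = "1.0.0"
PATCH_CONTENT_TYPE = "application/offset+octet-stream"


class InMemoryTusServer:
    """Hypothetical stand-in for a tus server that holds a single upload."""

    def __init__(self, upload_length):
        self.upload_length = upload_length
        self.data = bytearray()

    def head(self):
        # HEAD tells the client how many bytes the server has stored so far.
        return {"Tus-Resumable": TUS_VERSION, "Upload-Offset": str(len(self.data))}

    def patch(self, headers, body):
        # PATCH appends bytes, but only if the client's offset matches the server's.
        if headers.get("Content-Type") != PATCH_CONTENT_TYPE:
            return 415, {}
        if int(headers["Upload-Offset"]) != len(self.data):
            return 409, {}
        self.data.extend(body)
        return 204, {"Tus-Resumable": TUS_VERSION, "Upload-Offset": str(len(self.data))}


def resume_upload(server, payload, chunk_size, fail_after_chunks=None):
    """Ask the server for its offset, then send the remaining bytes in chunks.

    fail_after_chunks simulates a dropped connection and exists only for this demo.
    Returns the final offset reported by the server.
    """
    offset = int(server.head()["Upload-Offset"])
    chunks_sent = 0
    while offset < len(payload):
        if fail_after_chunks is not None and chunks_sent == fail_after_chunks:
            raise ConnectionError("simulated network drop")
        chunk = payload[offset:offset + chunk_size]
        headers = {
            "Tus-Resumable": TUS_VERSION,
            "Upload-Offset": str(offset),
            "Content-Type": PATCH_CONTENT_TYPE,
        }
        status, response = server.patch(headers, chunk)
        if status != 204:
            raise RuntimeError(f"unexpected status {status}")
        offset = int(response["Upload-Offset"])
        chunks_sent += 1
    return offset
```

If a first call to resume_upload is interrupted after one chunk, a second call on the same
server object starts from the offset the server reports rather than from byte zero. Any
client that speaks these few headers can resume against the same server, whichever language
it is written in.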

The second problem, the space constraint, can be addressed by separating the file-upload process
into a micro-service located on a separate host. Each Portal could have its own micro-service
instance, so that the file-upload functionality does not hamper the Portal performance.
Further, the micro-service will have to be secured, and tus.io<http://tus.io> allows
us to do this via the tus.io hooks<https://github.com/tus/tusd/blob/master/docs/hooks.md>
feature by implementing the auth code in the pre-create hook.
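
As a rough sketch of what such auth code might look like, the Python below checks a token
carried in the upload metadata. It assumes a tusd file hook, which receives a JSON document
on stdin and rejects the upload when the pre-create hook exits with a non-zero code. The
exact shape of the JSON depends on the tusd version, so the field names used here
("MetaData", "Event", "Upload"), the metadata key "auth_token" and the VALID_TOKENS set are
assumptions for illustration only; please check the hooks documentation for the version in use.

```python
"""Sketch of an auth check for a tusd pre-create hook (payload shape is assumed)."""
import json
import sys

# Hypothetical token store; a real portal would ask its own session service instead.
VALID_TOKENS = {"example-session-token"}


def extract_metadata(payload):
    """Return the upload metadata dict from the hook payload, or {} if absent.

    ASSUMPTION: metadata sits under "MetaData", either at the top level or
    nested under Event.Upload, depending on the tusd version.
    """
    if not isinstance(payload, dict):
        return {}
    event = payload.get("Event")
    upload = event.get("Upload", {}) if isinstance(event, dict) else payload
    meta = upload.get("MetaData") if isinstance(upload, dict) else None
    return meta if isinstance(meta, dict) else {}


def is_authorized(payload, valid_tokens=VALID_TOKENS):
    """Accept the upload only if it carries a known token (hypothetical key name)."""
    token = extract_metadata(payload).get("auth_token")
    return token in valid_tokens


def main(stdin=sys.stdin):
    """Return the exit code for the hook: 0 accepts the upload, 1 rejects it."""
    try:
        payload = json.load(stdin)
    except json.JSONDecodeError:
        return 1
    return 0 if is_authorized(payload) else 1
```

An executable hook script would end with sys.exit(main()). The check is kept in a separate
function so that it can be tested without running tusd, and so that the token lookup can
later be swapped for a call to the Portal's existing login session.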

Thus, tus.io<http://tus.io> seems to be a good, flexible and maintainable solution for
implementing file-upload functionality in Science Gateway Portals from a long-term perspective.

Thanks & Regards,
Ameya Advankar
Masters in Computer Science,
Indiana University Bloomington
