From Steve Loughran <ste...@apache.org>
Subject Re: web-based file transfer
Date Wed, 03 Nov 2010 11:35:21 GMT
On 02/11/10 18:25, Mark Laffoon wrote:
> We want to enable our web-based client (i.e. browser client, java applet,
> whatever?) to transfer files into a system backed by hdfs. The obvious
> simple solution is to do http file uploads, then copy the file to hdfs. I
> was wondering if there is a way to do it with an hdfs-enabled applet where
> the server gives the client the necessary hadoop configuration
> information, and the client applet pushes the data directly into hdfs.

I recall some work being done with WebDAV, but I don't think it has
progressed.

I've done things like this in the past with servlets and forms: the
webapp you deploy has the Hadoop configuration (and the network rights
to talk to HDFS in the datacentre), and the clients PUT/POST their
content up to it.


However, you are limited to 2GB worth of upload/download in most web
clients; some (Chrome) go up to 4GB, but you are pushing the limit there.
Even the Java servlet APIs assume that the Content-Length header
fits into a signed 32-bit integer, and get unhappy once you go over 2GB
(something I worry about in
http://jira.smartfrog.org/jira/browse/SFOS-1476 )
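
To make the 32-bit limit concrete, here is a minimal sketch (not code from
this thread; the class and method names are made up for illustration) of
what happens when a length above 2GB is squeezed into a signed int, and how
parsing the raw header value as a long avoids it:

```java
public class ContentLengthWidth {

    /** Narrow a length to a signed 32-bit int, as an int-typed API would have to. */
    static int asSignedInt(long contentLength) {
        // Silently wraps around for values above Integer.MAX_VALUE.
        return (int) contentLength;
    }

    /** True if the length can be represented in a signed 32-bit int. */
    static boolean fitsInSignedInt(long contentLength) {
        return contentLength >= 0 && contentLength <= Integer.MAX_VALUE;
    }

    /** Parse a Content-Length header value as a long; -1 if missing or malformed. */
    static long parseAsLong(String headerValue) {
        if (headerValue == null) {
            return -1L;
        }
        try {
            return Long.parseLong(headerValue.trim());
        } catch (NumberFormatException e) {
            return -1L;
        }
    }

    public static void main(String[] args) {
        long threeGB = 3_000_000_000L;
        System.out.println(fitsInSignedInt(threeGB));   // false
        System.out.println(asSignedInt(threeGB));       // -1294967296
        System.out.println(parseAsLong("3000000000"));  // 3000000000
    }
}
```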

Because Hadoop really likes large files -tens to hundreds of GB in a big 
cluster- I don't think the current web infrastructure is up to working 
with it.

That said, looking at Hudson for the nightly runs of my bulk IO tests,
Jetty will serve up 4GB in 5 minutes (loopback interface), and I can POST
or PUT up 4GB, but I have to get/set the Content-Length headers myself
rather than rely on the java.net client and servlet implementations to
handle it.
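
A rough sketch of handling the length yourself, under stated assumptions:
it is not the bulk IO test code referred to above, it uses the JDK's
built-in com.sun.net.httpserver server as a stand-in for Jetty and a
servlet, and the class name, method names and /upload path are
hypothetical. The client declares the length through the long overload of
HttpURLConnection.setFixedLengthStreamingMode (Java 7 and later), and the
server reads the raw Content-Length header as a long instead of asking an
int-typed accessor:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URL;
import java.util.concurrent.atomic.AtomicLong;

public class LargeUploadSketch {

    /** Client side: PUT `length` zero bytes, declaring the length up front as a long. */
    static int putZeros(URL url, long length) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("PUT");
        conn.setDoOutput(true);
        conn.setFixedLengthStreamingMode(length); // streams without buffering the body
        byte[] buf = new byte[64 * 1024];
        try (OutputStream out = conn.getOutputStream()) {
            long remaining = length;
            while (remaining > 0) {
                int n = (int) Math.min(buf.length, remaining);
                out.write(buf, 0, n);
                remaining -= n;
            }
        }
        int status = conn.getResponseCode();
        conn.disconnect();
        return status;
    }

    /** Start a loopback server, upload `length` bytes, return {status, declared, received}. */
    static long[] roundTrip(long length) throws Exception {
        AtomicLong declared = new AtomicLong(-1);
        AtomicLong received = new AtomicLong(-1);
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/upload", exchange -> {
            // Server side: parse the raw header as a long rather than an int.
            String header = exchange.getRequestHeaders().getFirst("Content-Length");
            declared.set(header == null ? -1L : Long.parseLong(header.trim()));
            long count = 0;
            byte[] buf = new byte[64 * 1024];
            try (InputStream in = exchange.getRequestBody()) {
                int n;
                while ((n = in.read(buf)) != -1) {
                    count += n; // a real webapp would write these bytes on to HDFS
                }
            }
            received.set(count);
            exchange.sendResponseHeaders(204, -1); // -1: no response body
            exchange.close();
        });
        server.start();
        try {
            int port = server.getAddress().getPort();
            URL url = URI.create("http://127.0.0.1:" + port + "/upload").toURL();
            int status = putZeros(url, length);
            return new long[] {status, declared.get(), received.get()};
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws Exception {
        // Small by default; pass a byte count above 2^31 to exercise the >2GB path.
        long length = args.length > 0 ? Long.parseLong(args[0]) : 1_000_000L;
        long[] r = roundTrip(length);
        System.out.println("status=" + r[0] + " declared=" + r[1] + " received=" + r[2]);
    }
}
```

Whether a given servlet container or intermediary proxy copes with the
larger value is a separate question that this sketch does not answer.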


If you can control the client, then maybe you would be able to do >4GB 
uploads, but otherwise you are stuck with data <2GB in size, which is, 
-what- 4-8 blocks in a production cluster?
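
The 4-8 block figure is plain division. As a quick check, assuming block
sizes of 256MB and 512MB (values inferred from the figure above, not stated
in the thread) and a hypothetical helper name:

```java
public class BlockCount {

    static final long MIB = 1024L * 1024L;

    /** Number of blocks needed to hold fileBytes, rounding up. */
    static long blocksFor(long fileBytes, long blockBytes) {
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    public static void main(String[] args) {
        long limit = Integer.MAX_VALUE; // largest length a signed 32-bit int can hold
        System.out.println(blocksFor(limit, 256 * MIB)); // 8
        System.out.println(blocksFor(limit, 512 * MIB)); // 4
    }
}
```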


