hadoop-general mailing list archives

From "Gibbon, Robert, VF-Group" <Robert.Gib...@vodafone.com>
Subject RE: web-based file transfer
Date Tue, 09 Nov 2010 16:48:07 GMT
>Even the Java servlet APIs assume that the content-length header
>fits into a signed 32-bit integer and get unhappy once you go over 2GB
>(something I worry about in
>http://jira.smartfrog.org/jira/browse/SFOS-1476 )

I built my HDFS WebDAV implementation against the JackRabbit 1.6.4 library - AFAIK it
has used a long for the content-length header since release 1.5.5, not a 32-bit int:

https://issues.apache.org/jira/browse/JCR-2009

That means any large-file limitations are going to be on the client side, especially on 32-bit
OSs. So yes, it might be worth thinking about leveraging HAR archives to keep the file count
down if you do choose to go down the same route.
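
For illustration, here is a minimal sketch of reading files back out of a HAR
archive through the standard Hadoop FileSystem API; the namenode host, archive
path and entry name are placeholders, not taken from my code.

import java.io.InputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

// Sketch: read files back out of a Hadoop Archive through the normal
// FileSystem API, so many small uploads can be packed into one archive
// without changing the reading code.
public class HarListing {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // har://<underlying-scheme>-<host>:<port>/<path-to-archive>  (placeholder values)
    URI har = URI.create("har://hdfs-namenode.example.org:8020/user/uploads/batch.har");

    FileSystem harFs = FileSystem.get(har, conf);
    for (FileStatus status : harFs.listStatus(new Path(har))) {
      System.out.println(status.getPath() + "\t" + status.getLen() + " bytes");
    }

    // individual entries open like ordinary HDFS files
    InputStream in = harFs.open(new Path(har.toString() + "/dir1/file1.bin"));
    try {
      IOUtils.copyBytes(in, System.out, conf, false);
    } finally {
      in.close();
    }
  }
}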

R

On 02/11/10 18:25, Mark Laffoon wrote:
> We want to enable our web-based client (i.e. browser client, java applet,
> whatever) to transfer files into a system backed by hdfs. The obvious
> simple solution is to do http file uploads, then copy the file to hdfs. I
> was wondering if there is a way to do it with an hdfs-enabled applet where
> the server gives the client the necessary hadoop configuration
> information, and the client applet pushes the data directly into hdfs.


I recall some work done with WebDAV:
   https://issues.apache.org/jira/browse/HDFS-225
but I don't think it has progressed.

I've done things like this in the past with servlets and forms; the 
webapp you deploy has the Hadoop configuration (and the network rights 
to talk to HDFS in the datacentre), and the clients PUT/POST content up to it:

http://www.slideshare.net/steve_l/long-haul-hadoop
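
The shape of such a servlet is roughly the sketch below - not the code behind
those slides, just an illustration; the class name, the upload path and the
lack of error handling and authentication are all simplifications.

import java.io.IOException;
import java.io.InputStream;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

// Sketch: accept a raw PUT body and stream it straight into HDFS.
// The webapp holds the Hadoop configuration and the network access;
// the browser only ever talks HTTP to the servlet.
public class HdfsUploadServlet extends HttpServlet {

  @Override
  protected void doPut(HttpServletRequest req, HttpServletResponse resp)
      throws ServletException, IOException {
    // e.g. PUT /upload/some/file.bin -> /uploads/some/file.bin in HDFS
    String dest = "/uploads" + req.getPathInfo();

    Configuration conf = new Configuration(); // picks up core-site.xml etc. on the server
    FileSystem fs = FileSystem.get(conf);

    InputStream in = req.getInputStream();
    FSDataOutputStream out = fs.create(new Path(dest), true);
    try {
      IOUtils.copyBytes(in, out, conf, false); // stream; don't buffer in memory
    } finally {
      out.close();
      in.close();
    }
    resp.setStatus(HttpServletResponse.SC_CREATED);
  }
}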

However, you are limited to 2GB of upload/download in most web 
clients; some (Chrome) go up to 4GB, but you are pushing the limit there. 
Even the Java servlet APIs assume that the content-length header 
fits into a signed 32-bit integer and get unhappy once you go over 2GB 
(something I worry about in 
http://jira.smartfrog.org/jira/browse/SFOS-1476 )
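
To make that concrete: ServletRequest.getContentLength() returns an int, so
above Integer.MAX_VALUE you get a truncated or negative value. One workaround
is to parse the raw header yourself - a sketch (the helper class name is made
up) follows:

import javax.servlet.http.HttpServletRequest;

// Sketch of working around ServletRequest.getContentLength() returning
// an int: read the raw Content-Length header and parse it as a long.
public final class ContentLength {

  private ContentLength() {
  }

  public static long of(HttpServletRequest request) {
    String header = request.getHeader("Content-Length");
    if (header == null) {
      return -1L; // chunked transfer encoding or no body
    }
    try {
      return Long.parseLong(header.trim());
    } catch (NumberFormatException e) {
      return -1L; // malformed header
    }
  }
}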

Because Hadoop really likes large files -tens to hundreds of GB in a big 
cluster- I don't think the current web infrastructure is up to working 
with it.


That said, looking at Hudson for the nightly runs of my bulk IO tests, 
Jetty will serve up 4GB in 5 minutes (loopback interface), and I can POST or 
PUT up 4GB, but I have to get/set content-length headers myself rather 
than rely on the java.net client and servlet implementations to handle it:

http://smartfrog.svn.sourceforge.net/viewvc/smartfrog/trunk/core/components/www/src/org/smartfrog/services/www/bulkio/client/SunJavaBulkIOClient.java?revision=8430&view=markup
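
In the same spirit (not a copy of that client), a stripped-down java.net PUT
looks something like this; the URL is a placeholder, and the long overload of
setFixedLengthStreamingMode needs Java 7+, otherwise chunked streaming mode is
the way around the int limit:

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Sketch of a large PUT with java.net. Streaming mode stops
// HttpURLConnection buffering the whole body in memory, and the long
// overload keeps the declared length out of int territory.
public class BigPut {
  public static void main(String[] args) throws Exception {
    File src = new File(args[0]);
    URL url = new URL("http://upload.example.org/bulkio/data.bin"); // placeholder

    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("PUT");
    conn.setDoOutput(true);
    // Java 7+; on older JVMs use conn.setChunkedStreamingMode(64 * 1024)
    conn.setFixedLengthStreamingMode(src.length());

    InputStream in = new FileInputStream(src);
    OutputStream out = conn.getOutputStream();
    try {
      byte[] buffer = new byte[64 * 1024];
      int read;
      while ((read = in.read(buffer)) != -1) {
        out.write(buffer, 0, read);
      }
    } finally {
      out.close();
      in.close();
    }
    // reading the response code completes the request
    System.out.println("HTTP " + conn.getResponseCode());
  }
}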

If you can control the client, then maybe you would be able to do >4GB 
uploads, but otherwise you are stuck with data <2GB in size, which is, 
what, 4-8 blocks in a production cluster?

-steve

