From: "Gibbon, Robert, VF-Group"
To: general@hadoop.apache.org
Subject: RE: web-based file transfer
Date: Thu, 4 Nov 2010 00:19:32 +0100

Check out HDFS over WebDAV - http://www.hadoop.iponweb.net/Home/hdfs-over-webdav

WebDAV is an HTTP-based protocol for accessing remote filesystems. I'm running an adapted version of this. It runs under Jetty, which is pretty industry standard, and is built on Apache Jackrabbit, which is pretty production-stable too. I lashed together a custom JAAS authentication module to authenticate it against our user database.

You can mount WebDAV on Linux using FUSE and WDFS, or script sessions with cadaver on Solaris/Unix without mounting WebDAV at all. It works pretty sweet on Windows and Apple, too.

Recent versions of Jetty have built-in traffic shaping and QoS features, although you might get more mileage from HAProxy or a hardware load balancer.

Another plus is that it enforces HDFS permissions (if you have them enabled).
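If you want a feel for what the JAAS piece involves, here is a stripped-down LoginModule sketch - NOT our production module; the UserDbLoginModule name and the checkCredentials() lookup are placeholders you would swap for your own JDBC/LDAP call:

import java.io.IOException;
import java.security.Principal;
import java.util.Map;

import javax.security.auth.Subject;
import javax.security.auth.callback.Callback;
import javax.security.auth.callback.CallbackHandler;
import javax.security.auth.callback.NameCallback;
import javax.security.auth.callback.PasswordCallback;
import javax.security.auth.callback.UnsupportedCallbackException;
import javax.security.auth.login.LoginException;
import javax.security.auth.spi.LoginModule;

public class UserDbLoginModule implements LoginModule {

    private Subject subject;
    private CallbackHandler callbackHandler;
    private String username;
    private boolean authenticated;

    public void initialize(Subject subject, CallbackHandler callbackHandler,
                           Map<String, ?> sharedState, Map<String, ?> options) {
        this.subject = subject;
        this.callbackHandler = callbackHandler;
    }

    public boolean login() throws LoginException {
        NameCallback name = new NameCallback("username: ");
        PasswordCallback pass = new PasswordCallback("password: ", false);
        try {
            callbackHandler.handle(new Callback[] { name, pass });
        } catch (IOException e) {
            throw new LoginException("callback failed: " + e.getMessage());
        } catch (UnsupportedCallbackException e) {
            throw new LoginException("callback failed: " + e.getMessage());
        }
        username = name.getName();
        // placeholder: look the user up in your own database/directory here
        authenticated = checkCredentials(username, new String(pass.getPassword()));
        if (!authenticated) {
            throw new LoginException("bad credentials for " + username);
        }
        return true;
    }

    public boolean commit() throws LoginException {
        if (authenticated) {
            // attach a principal so the container can enforce authorization
            subject.getPrincipals().add(new Principal() {
                public String getName() { return username; }
            });
        }
        return authenticated;
    }

    public boolean abort() throws LoginException {
        authenticated = false;
        return true;
    }

    public boolean logout() throws LoginException {
        subject.getPrincipals().clear();
        authenticated = false;
        return true;
    }

    private boolean checkCredentials(String user, String password) {
        // placeholder only - replace with a real JDBC/LDAP check
        return user != null && user.length() > 0;
    }
}

You then declare it in a JAAS login configuration file and point Jetty's JAAS realm at that entry.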
To get Hadoop permission integrity enforced on MapReduce jobs, check out Oozie - it's a job submission proxy which runs under Tomcat (might work with Jetty too - haven't tried) and can use a custom ServletFilter for authentication, which you can also patch onto your own user database/directory.

Then you just need to seal the perimeter of your cluster with firewall rules and you're good to go.

No more Kerberos!

R

-----Original Message-----
From: Eric Sammer [mailto:esammer@cloudera.com]
Sent: Wed 11/3/2010 5:05 PM
To: general@hadoop.apache.org
Subject: Re: web-based file transfer

Something like it, but Chukwa is more similar to Flume. For *files* one
may want something slightly different. For a stream of (data) events,
Chukwa, Flume, or Scribe are appropriate.

On Wed, Nov 3, 2010 at 1:22 AM, Ian Holsman wrote:
> Doesn't Chukwa do something like this?
>
> ---
> Ian Holsman - 703 879-3128
>
> I saw the angel in the marble and carved until I set him free -- Michelangelo
>
> On 03/11/2010, at 5:44 AM, Eric Sammer wrote:
>
>> I would recommend against clients pushing data directly to HDFS like
>> this for a few reasons.
>>
>> 1. The HDFS cluster would need to be directly exposed to a public
>> network; you don't want to do this.
>> 2. You'd be applying (presumably) a high concurrent load to HDFS, which
>> isn't its strong point.
>>
>> From an architecture point of view, it's much nicer to have a queuing
>> system between the upload and ingestion into HDFS that you can
>> throttle and control, if necessary. This also allows you to isolate
>> the cluster from the outside world. So as not to bottleneck on a single
>> writer, you can have uploaded files land in a queue and have multiple
>> competing consumers popping files (or file names upon which to
>> operate) out of the queue and handling the writing in parallel while
>> being able to control the number of workers. If the initial upload is
>> to a shared device like NFS, you can have writers live on multiple
>> boxes and distribute the work.
>>
>> Another option is to consider Flume, but only if you can deal with the
>> fact that it effectively throws away the notion of files and treats
>> their contents as individual events, etc.
>> http://github.com/cloudera/flume
>>
>> Hope that helps.
>>
>> On Tue, Nov 2, 2010 at 2:25 PM, Mark Laffoon wrote:
>>> We want to enable our web-based client (i.e. browser client, Java applet,
>>> whatever) to transfer files into a system backed by HDFS. The obvious
>>> simple solution is to do HTTP file uploads, then copy the file to HDFS. I
>>> was wondering if there is a way to do it with an HDFS-enabled applet where
>>> the server gives the client the necessary Hadoop configuration
>>> information, and the client applet pushes the data directly into HDFS.
>>>
>>> Has anybody done this or something similar? Can you give me a starting
>>> point (I'm about to go wander through the Hadoop CLI code to get ideas).
>>>
>>> Thanks,
>>>
>>> Mark
>>
>> --
>> Eric Sammer
>> twitter: esammer
>> data: www.cloudera.com
>

--
Eric Sammer
twitter: esammer
data: www.cloudera.com
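PS: a very rough sketch of the queue-plus-competing-consumers ingest pattern Eric describes in the quoted thread, just to show the shape of it - the class name, the /ingest target path and the in-memory queue are placeholders of mine, nothing official:

import java.io.File;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UploadIngester {

    // the web tier drops uploaded files (e.g. landed on local disk or NFS) here
    private final BlockingQueue<File> uploads = new LinkedBlockingQueue<File>();

    public void start(int workers) {
        final Configuration conf = new Configuration(); // picks up core-site.xml etc.
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.submit(new Runnable() {
                public void run() {
                    try {
                        FileSystem fs = FileSystem.get(conf);
                        while (!Thread.currentThread().isInterrupted()) {
                            File f = uploads.take();        // competing consumers
                            Path src = new Path(f.getAbsolutePath());
                            Path dst = new Path("/ingest/" + f.getName());
                            fs.copyFromLocalFile(src, dst); // write into HDFS
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            });
        }
    }

    public void enqueue(File uploaded) {
        uploads.add(uploaded);
    }
}

The pool size is your throttle; if the web tier and the writers sit on separate boxes (e.g. uploads landing on shared NFS, as Eric mentions), you'd swap the in-JVM queue for something both sides can see.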