From: "Gibbon, Robert, VF-Group"
To: general@hadoop.apache.org
Subject: RE: web-based file transfer
Date: Thu, 4 Nov 2010 00:19:32 +0100

Check out HDFS over WebDAV - http://www.hadoop.iponweb.net/Home/hdfs-over-webdav

WebDAV is an HTTP-based protocol for accessing remote filesystems. I'm running an adapted version of this. It runs under Jetty, which is pretty industry standard, and is built on Apache Jackrabbit, which is pretty production-stable too. I lashed together a custom JAAS authentication module to authenticate it against our user database.

You can mount WebDAV on Linux using FUSE and WDFS, or script sessions with cadaver on Solaris/Unix without mounting WebDAV at all. It works pretty sweet on Windows and Apple, too.

Recent versions of Jetty have built-in traffic shaping and QoS features, although you might get more mileage from HAProxy or a hardware load balancer.

Another plus is that it enforces HDFS permissions (if you have them enabled).
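If you want a feel for what the JAAS piece involves, here is a stripped-down LoginModule sketch - NOT our production module; the UserDbLoginModule name and the checkCredentials() lookup are placeholders you would swap for your own JDBC/LDAP call:

import java.io.IOException;
import java.security.Principal;
import java.util.Map;

import javax.security.auth.Subject;
import javax.security.auth.callback.Callback;
import javax.security.auth.callback.CallbackHandler;
import javax.security.auth.callback.NameCallback;
import javax.security.auth.callback.PasswordCallback;
import javax.security.auth.callback.UnsupportedCallbackException;
import javax.security.auth.login.LoginException;
import javax.security.auth.spi.LoginModule;

public class UserDbLoginModule implements LoginModule {

    private Subject subject;
    private CallbackHandler callbackHandler;
    private String username;
    private boolean authenticated;

    public void initialize(Subject subject, CallbackHandler callbackHandler,
                           Map<String, ?> sharedState, Map<String, ?> options) {
        this.subject = subject;
        this.callbackHandler = callbackHandler;
    }

    public boolean login() throws LoginException {
        NameCallback name = new NameCallback("username: ");
        PasswordCallback pass = new PasswordCallback("password: ", false);
        try {
            callbackHandler.handle(new Callback[] { name, pass });
        } catch (IOException e) {
            throw new LoginException("callback failed: " + e.getMessage());
        } catch (UnsupportedCallbackException e) {
            throw new LoginException("callback failed: " + e.getMessage());
        }
        username = name.getName();
        // placeholder: look the user up in your own database/directory here
        authenticated = checkCredentials(username, new String(pass.getPassword()));
        if (!authenticated) {
            throw new LoginException("bad credentials for " + username);
        }
        return true;
    }

    public boolean commit() throws LoginException {
        if (authenticated) {
            // attach a principal so the container can enforce authorization
            subject.getPrincipals().add(new Principal() {
                public String getName() { return username; }
            });
        }
        return authenticated;
    }

    public boolean abort() throws LoginException {
        authenticated = false;
        return true;
    }

    public boolean logout() throws LoginException {
        subject.getPrincipals().clear();
        authenticated = false;
        return true;
    }

    private boolean checkCredentials(String user, String password) {
        // placeholder only - replace with a real JDBC/LDAP check
        return user != null && user.length() > 0;
    }
}

You then declare it in a JAAS login configuration file and point Jetty's JAAS realm at that entry.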
To get Hadoop permission integrity enforced on MapReduce jobs, check out Oozie - it's a job submission proxy which runs under Tomcat (might work with Jetty too - haven't tried) and can use a custom ServletFilter for authentication, which you can also patch onto your own user database/directory.

Then you just need to seal the perimeter of your cluster with firewall rules and you're good to go.

No more Kerberos!

R

-----Original Message-----
From: Eric Sammer [mailto:esammer@cloudera.com]
Sent: Wed 11/3/2010 5:05 PM
To: general@hadoop.apache.org
Subject: Re: web-based file transfer

Something like it, but Chukwa is more similar to Flume. For *files* one
may want something slightly different. For a stream of (data) events,
Chukwa, Flume, or Scribe are appropriate.

On Wed, Nov 3, 2010 at 1:22 AM, Ian Holsman wrote:
> Doesn't Chukwa do something like this?
>
> ---
> Ian Holsman - 703 879-3128
>
> I saw the angel in the marble and carved until I set him free -- Michelangelo
>
> On 03/11/2010, at 5:44 AM, Eric Sammer wrote:
>
>> I would recommend against clients pushing data directly to HDFS like
>> this for a few reasons.
>>
>> 1. The HDFS cluster would need to be directly exposed to a public
>> network; you don't want to do this.
>> 2. You'd be applying (presumably) a high concurrent load to HDFS, which
>> isn't its strong point.
>>
>> From an architecture point of view, it's much nicer to have a queuing
>> system between the upload and ingestion into HDFS that you can
>> throttle and control, if necessary. This also allows you to isolate
>> the cluster from the outside world. So as not to bottleneck on a single
>> writer, you can have uploaded files land in a queue and have multiple
>> competing consumers popping files (or file names upon which to
>> operate) out of the queue and handling the writing in parallel while
>> being able to control the number of workers. If the initial upload is
>> to a shared device like NFS, you can have writers live on multiple
>> boxes and distribute the work.
>>
>> Another option is to consider Flume, but only if you can deal with the
>> fact that it effectively throws away the notion of files and treats
>> their contents as individual events, etc.
>> http://github.com/cloudera/flume
>>
>> Hope that helps.
>>
>> On Tue, Nov 2, 2010 at 2:25 PM, Mark Laffoon wrote:
>>> We want to enable our web-based client (i.e. browser client, Java applet,
>>> whatever) to transfer files into a system backed by HDFS. The obvious
>>> simple solution is to do HTTP file uploads, then copy the file to HDFS. I
>>> was wondering if there is a way to do it with an HDFS-enabled applet where
>>> the server gives the client the necessary Hadoop configuration
>>> information, and the client applet pushes the data directly into HDFS.
>>>
>>> Has anybody done this or something similar? Can you give me a starting
>>> point (I'm about to go wander through the Hadoop CLI code to get ideas).
>>>
>>> Thanks,
>>>
>>> Mark
>>
>> --
>> Eric Sammer
>> twitter: esammer
>> data: www.cloudera.com
>

--
Eric Sammer
twitter: esammer
data: www.cloudera.com
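PS: a very rough sketch of the queue-plus-competing-consumers ingest pattern Eric describes in the quoted thread, just to show the shape of it - the class name, the /ingest target path and the in-memory queue are placeholders of mine, nothing official:

import java.io.File;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UploadIngester {

    // the web tier drops uploaded files (e.g. landed on local disk or NFS) here
    private final BlockingQueue<File> uploads = new LinkedBlockingQueue<File>();

    public void start(int workers) {
        final Configuration conf = new Configuration(); // picks up core-site.xml etc.
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.submit(new Runnable() {
                public void run() {
                    try {
                        FileSystem fs = FileSystem.get(conf);
                        while (!Thread.currentThread().isInterrupted()) {
                            File f = uploads.take();        // competing consumers
                            Path src = new Path(f.getAbsolutePath());
                            Path dst = new Path("/ingest/" + f.getName());
                            fs.copyFromLocalFile(src, dst); // write into HDFS
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            });
        }
    }

    public void enqueue(File uploaded) {
        uploads.add(uploaded);
    }
}

The pool size is your throttle; if the web tier and the writers sit on separate boxes (e.g. uploads landing on shared NFS, as Eric mentions), you'd swap the in-JVM queue for something both sides can see.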