hadoop-general mailing list archives

From Ian Holsman <had...@holsman.net>
Subject Re: web-based file transfer
Date Wed, 03 Nov 2010 05:22:02 GMT
Doesn't Chukwa do something like this?

---
Ian Holsman - 703 879-3128

I saw the angel in the marble and carved until I set him free -- Michelangelo

On 03/11/2010, at 5:44 AM, Eric Sammer <esammer@cloudera.com> wrote:

> I would recommend against clients pushing data directly to hdfs like
> this for a few reasons.
> 
> 1. The HDFS cluster would need to be directly exposed to a public
> network; you don't want to do this.
> 2. You'd be applying (presumably) a high concurrent load to HDFS,
> which isn't its strong point.
> 
> From an architecture point of view, it's much nicer to have a queuing
> system between the upload and ingestion into HDFS that you can
> throttle and control, if necessary. This also allows you to isolate
> the cluster from the outside world. As to not bottleneck on a single
> writer, you can have uploaded files land in a queue and have multiple
> competing consumers popping files (or file names upon which to
> operate) out of the queue and handling the writing in parallel while
> being able to control the number of workers. If the initial upload is
> to a shared device like NFS, you can have writers live on multiple
> boxes and distribute the work.
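[Editor's note: the queue-and-competing-consumers pattern described above can be sketched with plain JDK concurrency primitives. This is a minimal illustration, not Hadoop code: the `writeToHdfs` stand-in, the poison-pill shutdown, and the worker count are all assumptions; in a real deployment the write step would use the Hadoop `FileSystem` API (e.g. `FileSystem.create(new Path(...))`) and the queue would be an external system rather than in-process.]

```java
import java.util.List;
import java.util.concurrent.*;

public class UploadConsumers {
    static final String POISON = "__DONE__"; // shutdown sentinel, one per worker

    // Stand-in for the real write step; with Hadoop this would open
    // FileSystem.get(conf).create(new Path(...)) and stream the bytes.
    static void writeToHdfs(String fileName, List<String> sink) {
        sink.add(fileName);
    }

    // Competing consumers: `workers` threads pop file names from a shared
    // queue and write them in parallel; returns how many files were written.
    static int runIngest(int files, int workers) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> written = new CopyOnWriteArrayList<>();
        ExecutorService pool = Executors.newFixedThreadPool(workers);

        for (int i = 0; i < workers; i++) {
            pool.submit(() -> {
                try {
                    while (true) {
                        String name = queue.take(); // blocks until work arrives
                        if (POISON.equals(name)) break;
                        writeToHdfs(name, written);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        // The upload side lands file names in the queue.
        for (int i = 0; i < files; i++) queue.put("upload-" + i + ".dat");
        for (int i = 0; i < workers; i++) queue.put(POISON);

        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return written.size();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("written=" + runIngest(100, 4)); // prints written=100
    }
}
```

Throttling here is just the size of the thread pool; the same shape works when the workers live on multiple boxes reading from a shared NFS spool, as suggested above.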
> 
> Another option is to consider Flume, but only if you can deal with the
> fact that it effectively throws away the notion of files and treats
> their contents as individual events, etc.
> http://github.com/cloudera/flume.
> 
> Hope that helps.
> 
> On Tue, Nov 2, 2010 at 2:25 PM, Mark Laffoon
> <mlaffoon@semanticresearch.com> wrote:
>> We want to enable our web-based client (e.g. browser client, Java applet,
>> whatever) to transfer files into a system backed by HDFS. The obvious
>> simple solution is to do HTTP file uploads, then copy the file to HDFS. I
>> was wondering if there is a way to do it with an HDFS-enabled applet, where
>> the server gives the client the necessary Hadoop configuration
>> information and the client applet pushes the data directly into HDFS.
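[Editor's note: the "obvious simple solution" mentioned above — receive an HTTP upload server-side, then copy into HDFS — can be sketched with the JDK's built-in `com.sun.net.httpserver`. Everything here is illustrative: the `/upload` endpoint, the query-string naming convention, and the local spool directory are assumptions, and the final copy into HDFS (e.g. `hadoop fs -put` or `FileSystem.copyFromLocalFile`) is only noted in a comment.]

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.nio.file.*;

public class UploadSpool {
    // Start a tiny upload endpoint that stages request bodies into a local
    // spool directory. A separate worker (or `hadoop fs -put`) would later
    // move spooled files into HDFS, keeping the cluster off the public net.
    public static HttpServer start(Path spoolDir, int port) throws IOException {
        Files.createDirectories(spoolDir);
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/upload", exchange -> {
            // Hypothetical convention: file name in the query string
            // (no sanitization shown; a real server must validate it).
            String name = exchange.getRequestURI().getQuery();
            Path target = spoolDir.resolve(name);
            try (InputStream in = exchange.getRequestBody()) {
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            }
            // In a real system: enqueue `target` for the HDFS writers here.
            byte[] ok = "spooled\n".getBytes();
            exchange.sendResponseHeaders(200, ok.length);
            exchange.getResponseBody().write(ok);
            exchange.close();
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws Exception {
        start(Paths.get("spool"), 8000);
        System.out.println("listening on :8000");
    }
}
```

This keeps the browser side a plain HTTP upload — no Hadoop configuration ever leaves the server, which also addresses the exposure concern raised in the reply above.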
>> 
>> 
>> 
>> Has anybody done this or something similar? Can you give me a starting
>> point? (I'm about to go wander through the Hadoop CLI code to get ideas.)
>> 
>> 
>> 
>> Thanks,
>> 
>> Mark
>> 
>> 
> 
> 
> 
> -- 
> Eric Sammer
> twitter: esammer
> data: www.cloudera.com
