hadoop-general mailing list archives

From Eric Sammer <esam...@cloudera.com>
Subject Re: web-based file transfer
Date Wed, 03 Nov 2010 16:05:19 GMT
Something like it, but Chukwa is more similar to Flume. For *files*
one may want something slightly different. For a stream of (data)
events, Chukwa, Flume, or Scribe are appropriate.

On Wed, Nov 3, 2010 at 1:22 AM, Ian Holsman <hadoop@holsman.net> wrote:
> Doesn't Chukwa do something like this?
>
> ---
> Ian Holsman - 703 879-3128
>
> I saw the angel in the marble and carved until I set him free -- Michelangelo
>
> On 03/11/2010, at 5:44 AM, Eric Sammer <esammer@cloudera.com> wrote:
>
>> I would recommend against clients pushing data directly to HDFS like
>> this for a few reasons.
>>
>> 1. The HDFS cluster would need to be directly exposed to a public
>> network; you don't want to do this.
>> 2. You'd be applying (presumably) a high concurrent load to HDFS,
>> which isn't its strong point.
>>
>> From an architecture point of view, it's much nicer to have a queuing
>> system between the upload and ingestion into HDFS that you can
>> throttle and control, if necessary. This also allows you to isolate
>> the cluster from the outside world. So as not to bottleneck on a single
>> writer, you can have uploaded files land in a queue and have multiple
>> competing consumers popping files (or file names upon which to
>> operate) out of the queue and handling the writing in parallel while
>> being able to control the number of workers (see the sketch at the
>> end of this message). If the initial upload is to a shared device like
>> NFS, you can have writers live on multiple boxes and distribute the work.
>>
>> Another option is to consider Flume, but only if you can deal with the
>> fact that it effectively throws away the notion of files and treats
>> their contents as individual events, etc.
>> http://github.com/cloudera/flume.
>>
>> Hope that helps.
>>
>> On Tue, Nov 2, 2010 at 2:25 PM, Mark Laffoon
>> <mlaffoon@semanticresearch.com> wrote:
>>> We want to enable our web-based client (i.e. browser client, Java applet,
>>> whatever) to transfer files into a system backed by HDFS. The obvious
>>> simple solution is to do HTTP file uploads, then copy each file to HDFS. I
>>> was wondering if there is a way to do it with an HDFS-enabled applet where
>>> the server gives the client the necessary Hadoop configuration
>>> information, and the client applet pushes the data directly into HDFS.
>>>
>>>
>>>
>>> Has anybody done this or something similar? Can you give me a starting
>>> point? (I'm about to go wander through the Hadoop CLI code to get ideas.)
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Mark
>>>
>>>
>>
>>
>>
>> --
>> Eric Sammer
>> twitter: esammer
>> data: www.cloudera.com
>
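
Below is a minimal sketch of the competing-consumers idea described in the
quoted message, written against the Hadoop FileSystem API. It is illustrative
only: the HdfsUploadWorker class name, the in-memory queue, the worker count,
and the /ingest target directory are assumptions rather than anything from
the original thread, and the upload handler (HTTP servlet, NFS watcher, etc.)
is assumed to enqueue each file as it lands.

// One "competing consumer": takes local file paths from a shared queue
// and copies each one into HDFS. Throttle by changing the worker count.
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsUploadWorker implements Runnable {
  private final BlockingQueue<Path> uploads;   // local files that landed via HTTP/NFS
  private final FileSystem fs;
  private final Path targetDir;

  public HdfsUploadWorker(BlockingQueue<Path> uploads, FileSystem fs, Path targetDir) {
    this.uploads = uploads;
    this.fs = fs;
    this.targetDir = targetDir;
  }

  @Override
  public void run() {
    try {
      while (true) {
        Path local = uploads.take();           // block until a file name is available
        fs.copyFromLocalFile(local, new Path(targetDir, local.getName()));
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();      // shut down cleanly
    } catch (Exception e) {
      e.printStackTrace();                     // real code would retry or dead-letter
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();  // picks up core-site.xml / hdfs-site.xml
    FileSystem fs = FileSystem.get(conf);
    BlockingQueue<Path> uploads = new LinkedBlockingQueue<Path>();

    int workers = 4;                           // throttle ingestion by changing this
    ExecutorService pool = Executors.newFixedThreadPool(workers);
    for (int i = 0; i < workers; i++) {
      pool.submit(new HdfsUploadWorker(uploads, fs, new Path("/ingest")));
    }
    // elsewhere: the upload handler enqueues each received file, e.g.
    // uploads.put(new Path("/mnt/nfs/uploads/some-file.dat"));
  }
}

Because only these workers talk to HDFS, the cluster stays isolated from the
public network, and ingestion is throttled simply by the number of workers.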



-- 
Eric Sammer
twitter: esammer
data: www.cloudera.com
