hadoop-common-user mailing list archives

From: Russell Jurney <russell.jur...@gmail.com>
Subject: Re: streaming data ingest into HDFS
Date: Fri, 16 Dec 2011 01:05:55 GMT
Just curious: what situation are you in where no collectors are possible?
Sounds interesting.

Russell Jurney

On Dec 15, 2011, at 5:01 PM, "Periya.Data" <periya.data@gmail.com> wrote:

> Hi all,
>     I would like to know what options I have for ingesting terabytes of data
> that are being generated very quickly by a small set of sources. I have
> thought about:
>   1. Flume
>   2. An intermediate staging server (or servers) where the data is offloaded
>   and then loaded into HDFS with dfs -put.
>   3. Anything else?
> Suppose I am unable to use Flume (since it cannot be installed on the
> sources) and suppose that I do not have the luxury of an intermediate
> staging area. What options do I have? In this case, I might have to
> ingest the data directly (preferably in parallel) into HDFS.
> I have read about a technique that uses MapReduce, where each map task
> reads data and uses the Java API to store it in HDFS. We could run multiple
> map tasks to get parallel ingestion. It would be nice to know about ways to
> ingest data "directly" into HDFS under these assumptions.
> Suggestions are appreciated,
> /PD.
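A minimal sketch of the direct, parallel ingestion idea, assuming each source can be read as a Java InputStream (for example from a socket). It swaps the map tasks for a plain thread pool: each worker streams one source into its own destination file, which suits HDFS because a file accepts only one writer at a time. The destination is hidden behind a hypothetical SinkFactory interface; on a real cluster it could wrap Hadoop's FileSystem.create(Path), as the comment in the code shows. The class and interface names, the namenode address and the /ingest/ path are illustrative assumptions, not something from this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: parallel "direct" ingestion, one destination file per source.
// All names here are hypothetical; this is not an existing Hadoop API.
final class ParallelIngest {

    // Opens a producing source, e.g. a socket or pipe (assumption).
    interface SourceFactory {
        InputStream open(String name) throws IOException;
    }

    // Opens the destination file for one source. Against HDFS it could be:
    //   FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), new Configuration());
    //   SinkFactory hdfs = name -> fs.create(new Path("/ingest/" + name));
    interface SinkFactory {
        OutputStream open(String name) throws IOException;
    }

    // Copies each named source into its own sink using `threads` workers
    // and returns the total number of bytes written.
    static long ingest(List<String> names, SourceFactory sources, SinkFactory sinks, int threads)
            throws IOException, InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> pending = new ArrayList<>();
            for (String name : names) {
                Callable<Long> task = () -> copyOne(name, sources, sinks);
                pending.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : pending) {
                try {
                    total += f.get();
                } catch (ExecutionException e) {
                    // Surface the first failed copy as an IOException.
                    throw new IOException("ingest failed", e.getCause());
                }
            }
            return total;
        } finally {
            pool.shutdownNow();
        }
    }

    // Streams one source to one sink; both are closed when the copy ends.
    private static long copyOne(String name, SourceFactory sources, SinkFactory sinks)
            throws IOException {
        try (InputStream in = sources.open(name); OutputStream out = sinks.open(name)) {
            return in.transferTo(out); // Java 9+
        }
    }

    // Demo with in-memory streams standing in for real sources and HDFS.
    public static void main(String[] args) throws Exception {
        Map<String, byte[]> data = Map.of(
                "a.log", "alpha\n".getBytes(StandardCharsets.UTF_8),
                "b.log", "beta\n".getBytes(StandardCharsets.UTF_8));
        Map<String, ByteArrayOutputStream> written = new ConcurrentHashMap<>();
        long n = ingest(List.of("a.log", "b.log"),
                name -> new ByteArrayInputStream(data.get(name)),
                name -> written.computeIfAbsent(name, k -> new ByteArrayOutputStream()),
                2);
        System.out.println(n + " bytes in " + written.size() + " files"); // 11 bytes in 2 files
    }
}
```

Inside a MapReduce job the same copyOne logic would instead run in each map task, with parallelism coming from the number of map tasks rather than from the thread count.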
