hadoop-common-user mailing list archives

From "Periya.Data" <periya.d...@gmail.com>
Subject streaming data ingest into HDFS
Date Fri, 16 Dec 2011 01:00:59 GMT
Hi all,
     I would like to know what options I have for ingesting terabytes of data
that are being generated very rapidly by a small set of sources. So far I have
thought about:

   1. Flume
   2. One or more intermediate staging servers where the data can be offloaded,
   and from there using dfs -put to load it into HDFS.
   3. Anything else?
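For option 2, the staging-server step can be sketched with the HDFS Java API. This is only a sketch: the paths are hypothetical, and it assumes a Hadoop client configuration (core-site.xml) is on the classpath.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: copy a file that was offloaded onto a staging server into HDFS.
// Programmatic equivalent of running "hadoop dfs -put" from the staging box.
// Both paths are hypothetical.
public class StagingUpload {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // reads core-site.xml
        FileSystem fs = FileSystem.get(conf);
        fs.copyFromLocalFile(new Path("/staging/batch-001.dat"),
                             new Path("/ingest/batch-001.dat"));
        fs.close();
    }
}
```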

Suppose I am unable to use Flume (since the sources do not support installing
Flume agents), and suppose I do not have the luxury of an intermediate staging
area. What options do I have? In that case, I might have to ingest data
directly (preferably in parallel) into HDFS.

I have read about a technique that uses MapReduce, where each map task reads
data from a source and uses the Java API to write it into HDFS. Running
multiple map tasks would give parallel ingestion. It would be nice to know
about ways to ingest data "directly" into HDFS given my constraints.
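As a sketch of that direct-write idea, each map task (or any JVM client) can open an HDFS output stream and push bytes into it as they arrive, with no staging file in between. The source URL and HDFS path below are hypothetical, and as above a Hadoop client configuration is assumed to be present.

```java
import java.io.InputStream;
import java.net.URL;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: stream data from a (hypothetical) HTTP source straight into HDFS.
// Each map task could run one such copy against a different source to get
// parallel ingestion across the cluster.
public class DirectIngest {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        try (InputStream in =
                 new URL("http://source.example.com/feed").openStream();
             FSDataOutputStream out =
                 fs.create(new Path("/ingest/feed-part-0"))) {
            byte[] buf = new byte[64 * 1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n); // bytes go to HDFS as they arrive
            }
        }
        fs.close();
    }
}
```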

Suggestions are appreciated,

