hadoop-user mailing list archives

From Hardik Pandya <smarty.ju...@gmail.com>
Subject Re: Realtime sensor's tcpip data to hadoop
Date Fri, 09 May 2014 01:25:08 GMT
If I were you, I would ask the following questions to get to the answer:

> Forget about Hadoop for a minute and ask yourself how the TCP/IP data is
currently being stored: in a file system or an RDBMS?
> Hadoop is for offline batch processing. If you are looking for a real-time
streaming solution, there is Storm (from Twitter), which goes well
with Kafka (a message queue from LinkedIn), or Spark Streaming (which is
in-memory map-reduce), which takes real-time streams. It has a built-in
Twitter API, but you need to write your own service to poll the data every
few seconds and send it in RDD format.
> Storm is complementary to Hadoop. Spark in conjunction with Hadoop will
allow you to do both offline and real-time data analytics.
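
To make the "write your own service" point concrete, here is a minimal sketch in Python (standard library only) of a TCP listener that accepts sensor connections, parses each line, and hands records to a downstream sink in small batches. Everything specific here is an assumption for illustration: the line format "sensor_id,timestamp,value", the names Batcher, parse_reading and make_server, and the sink callable, which stands in for whatever you actually use (a Kafka producer, an HBase client, a Spark Streaming receiver). None of those clients are shown.

```python
import socketserver
import threading


class Batcher:
    """Collects records and hands them to a sink in fixed-size batches."""

    def __init__(self, sink, batch_size=100):
        # `sink` is a hypothetical callable that accepts a list of records,
        # e.g. a thin wrapper around a message-queue producer.
        self.sink = sink
        self.batch_size = batch_size
        self._buf = []
        self._lock = threading.Lock()

    def add(self, record):
        # Append under the lock; call the sink outside it so a slow sink
        # does not block other sensor connections from buffering.
        with self._lock:
            self._buf.append(record)
            if len(self._buf) < self.batch_size:
                return
            batch, self._buf = self._buf, []
        self.sink(batch)

    def flush(self):
        # Send whatever is buffered, even if the batch is not full.
        with self._lock:
            batch, self._buf = self._buf, []
        if batch:
            self.sink(batch)


def parse_reading(line):
    """Parse an assumed 'sensor_id,timestamp,value' line into a dict."""
    sensor_id, ts, value = line.strip().split(",")
    return {"sensor": sensor_id, "ts": int(ts), "value": float(value)}


def make_server(host, port, batcher):
    """Build a threaded TCP server that feeds parsed lines to `batcher`."""

    class SensorHandler(socketserver.StreamRequestHandler):
        def handle(self):
            # One handler instance (and thread) per sensor connection.
            for raw in self.rfile:
                try:
                    batcher.add(parse_reading(raw.decode("utf-8")))
                except ValueError:
                    continue  # skip malformed lines

    server = socketserver.ThreadingTCPServer((host, port), SensorHandler)
    server.daemon_threads = True
    return server
```

To run it, something like make_server("0.0.0.0", 9999, Batcher(my_sink)).serve_forever() would do, where port 9999 and my_sink are placeholders. Batching before the sink is a design choice to avoid one downstream write per reading when the data rate is high; a real service would also flush on a timer so that quiet periods do not leave records sitting in the buffer.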

On Tue, May 6, 2014 at 10:48 PM, Alex Lee <eliyart@hotmail.com> wrote:

> Sensors may send TCP/IP data to a server. Each sensor may send TCP/IP data
> as a stream to the server; the quantity of sensors and the data rate
> are both high.
> Firstly, how can the TCP/IP data be put into Hadoop? It needs
> some processing and then storage in HBase. Does it need to be saved to data
> files and then put into Hadoop, or can it be done in some direct way from
> TCP/IP? Is there any software module that can take care of this? I found that
> Ganglia, Nagios, and Flume may do it. But looking into the details, Ganglia
> and Nagios are more for monitoring the Hadoop cluster itself. Flume is for
> log files.
> Secondly, if the total network traffic from the sensors is over the limit of
> one LAN port, how can the load be shared? Is there any component in Hadoop
> that does this automatically?
> Any suggestions? Thanks.
