hadoop-mapreduce-user mailing list archives

From samir das mohapatra <samir.help...@gmail.com>
Subject How to solve one Scenario in hadoop ?
Date Tue, 05 Mar 2013 05:27:57 GMT
Hi All,
   I have a scenario in which our organization is trying to implement
Hadoop.

Scenario Statement:

---------------------------------------

    Suppose we have various data sources, for example RDBMS, HDFS, and
streaming.


*Source Dataset Types:*

1. Single source
2. Joining sources
3. Filtered data set
4. Specific columns


We need to pull data from one source into another, for example from HDFS to
RDBMS or vice versa, based on a condition. That is, out of the whole data in
the source, the destination may need the whole data, only specific data, or
joined data. Which approach should we take for each of the dataset types
above?
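
To make the four dataset types concrete, here is a minimal Python sketch on toy in-memory rows. The table names, column names, and values are all made up for illustration; a real job would read from HDFS or an RDBMS rather than from Python lists.

```python
# Toy rows standing in for two sources (all names and values are hypothetical).
orders = [
    {"order_id": 1, "cust_id": 10, "amount": 250},
    {"order_id": 2, "cust_id": 11, "amount": 40},
    {"order_id": 3, "cust_id": 10, "amount": 90},
]
customers = [
    {"cust_id": 10, "name": "A"},
    {"cust_id": 11, "name": "B"},
]


def single_source(rows):
    """Type 1: copy every row of one source unchanged."""
    return [dict(r) for r in rows]


def join_sources(left, right, key):
    """Type 2: inner join of two sources on a shared key."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def filtered(rows, predicate):
    """Type 3: keep only the rows that satisfy a condition."""
    return [dict(r) for r in rows if predicate(r)]


def specific_columns(rows, columns):
    """Type 4: keep only the named columns of each row."""
    return [{c: r[c] for c in columns} for r in rows]


if __name__ == "__main__":
    # Filter first, then project: orders of at least 90, two columns only.
    big = filtered(orders, lambda r: r["amount"] >= 90)
    print(specific_columns(big, ["order_id", "amount"]))
    # Join the two sources, then keep only the columns the destination needs.
    joined = join_sources(orders, customers, "cust_id")
    print(specific_columns(joined, ["order_id", "name"]))
```

The types compose: a destination may need a join that is then filtered and projected, which is why the choice of tool depends on where each step can run.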


This is what I am thinking:

CASE-1: Data from HDFS to HDFS (different cluster), whole data:
we will use *distcp*

CASE-2: Data from HDFS to HDFS (different cluster), conditional (filtered)
data: we will use a *custom MapReduce program that does the filter operation
and then loads the result*

CASE-3: Data from HDFS to RDBMS, whole data: *Sqoop*

CASE-4: Data from HDFS to RDBMS, conditional data: *Sqoop*

CASE-5: Some data from RDBMS and some data from HDFS, then filter and
load into HDFS: *JDBC with a MapReduce program*
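
The five cases can be written down as a small decision function. The following is only a sketch of the proposal in this message, not a recommendation. The function names are hypothetical, and the command builders only assemble argument lists for `hadoop distcp` and `sqoop export` without running anything. For CASE-4 the sketch assumes the data in the export directory has already been filtered.

```python
def choose_tool(source, dest, conditional, mixed_sources=False):
    """Return the tool proposed in the cases above for a given transfer."""
    if mixed_sources:
        return "JDBC with MapReduce"  # CASE-5: RDBMS + HDFS into HDFS
    if source == "HDFS" and dest == "HDFS":
        # CASE-2 when a filter is needed, CASE-1 for a whole copy.
        return "custom MapReduce" if conditional else "distcp"
    if source == "HDFS" and dest == "RDBMS":
        return "Sqoop"  # CASE-3 and CASE-4
    raise ValueError("no case covers %s -> %s" % (source, dest))


def distcp_command(src_uri, dst_uri):
    """Argument list for a whole-data copy between clusters (CASE-1)."""
    return ["hadoop", "distcp", src_uri, dst_uri]


def sqoop_export_command(jdbc_url, table, export_dir):
    """Argument list for an HDFS-to-RDBMS export (CASE-3 and CASE-4).

    Assumption: for CASE-4, export_dir already holds the filtered data.
    """
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--table", table,
        "--export-dir", export_dir,
    ]


if __name__ == "__main__":
    # Host names, paths, and the table name below are placeholders.
    print(choose_tool("HDFS", "HDFS", conditional=False))
    print(" ".join(distcp_command(
        "hdfs://nn1.example.com:8020/data/in",
        "hdfs://nn2.example.com:8020/data/in",
    )))
    print(" ".join(sqoop_export_command(
        "jdbc:mysql://db.example.com/sales", "orders", "/data/out/orders",
    )))
```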


Note: Can anyone tell me if I am wrong and we need to do something other
than this that would be easier to do?


Regards,

samir.
